Unexpected result {".a", "co", ".m"} from Sort[{".m", ".a", "co"}]

According to The Chicago Manual of Style, para. 18.57/18.58, punctuation marks are ignored.

18.57

The letter-by-letter system. In the letter-by-letter system, alphabetizing continues up to the first parenthesis or comma; it then starts again after the punctuation point. Word spaces and all other punctuation marks are ignored. Both open and hyphenated compounds such as New York or self-pity are treated as single words. The order of precedence is one word, word followed by a parenthesis, and word followed by a comma, number, or letters. The index to this manual, in accordance with Chicago’s traditional preference, is arranged letter by letter.

I won't say it's a definitive answer, but it supports Mathematica's behavior to a certain extent.
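To see what "punctuation marks are ignored" would mean for the list in the title, here is a minimal sketch that strips punctuation before comparing. It only mimics the rule quoted above and is not a claim about how Sort is implemented; the name ignorePunctuationSort is made up for this illustration.

```mathematica
(* Hypothetical helper: order strings by their text with punctuation removed *)
ignorePunctuationSort[strings_List] :=
  SortBy[strings, StringDelete[#, PunctuationCharacter] &]

(* The keys being compared are "m", "a" and "co" *)
ignorePunctuationSort[{".m", ".a", "co"}]
(* {".a", "co", ".m"} *)
```

This is the same order that the question reports for Sort on this list.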


Sort orders strings as in a dictionary, with uppercase versions of letters coming after lowercase ones. Sort places ordinary letters first, followed in order by script, Gothic, double-struck, Greek, and Hebrew. Mathematical operators appear in order of decreasing precedence.

Sort[list, p] applies the function p to pairs of elements in list to determine whether they are in order. The default function p is OrderedQ[{#1, #2}] &.

OrderedQ[h[e1, e2, ...]] gives True if the ei are in canonical order, and False otherwise.
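As a small illustration of the documentation quoted above, the default call and the explicit OrderedQ form can be compared directly on the list from the title. This is only a sketch; the name list is introduced here for illustration.

```mathematica
list = {".m", ".a", "co"};

(* Default canonical order; the question title reports {".a", "co", ".m"} *)
Sort[list]

(* Spelling out the documented default ordering function should give the same result *)
Sort[list, OrderedQ[{#1, #2}] &]

(* Check each neighbouring pair of the sorted list against OrderedQ *)
OrderedQ /@ Partition[Sort[list], 2, 1]
```

If the last line returns only True values, the sorted list is in canonical order in the sense of OrderedQ, whatever one thinks of that order.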

Some tests


Plain Sort, without an ordering function


Let's look at the character codes of sampleList1

  1. (*Input 1 ==< *)
    sampleList1 = {"compute", "Tes", ".", "", "Etz", ".a", ".m", "a", "z", "T", ".T", "wha", "{"};
    ToCharacterCode[sampleList1]
    
  2. (*
    Output 1 ==>
    {{99,111,109,112,117,116,101},{84,101,115},{46},{},{69,116,122},{46,97},{46,109},{97},{122},{84},{46,84},{119,104,97},{123}}
    *)
    
  3. (*Input 2 ==< *)
    Sort[sampleList1]
    
  4. (*
    Output 2 ==>
    {,{,.,.a,a,compute,Etz,.m,.T,T,Tes,wha,z}
    *)
    
  5. (*Input 3 ==< *)
    ToCharacterCode[%]
    
  6. (*
    Output 3 ==>
    {{},{123},{46},{46,97},{97},{99,111,109,112,117,116,101},{69,116,122},{46,109},{46,84},{84},{84,101,115},{119,104,97},{122}}
    *)
    
  7. (*Input 4 ==< *)
    Total /@ %
    
  8. (*
    Output 4 ==>
    {0,123,46,143,97,765,307,155,130,84,300,320,122}
    *)
    

My thoughts


I do not know whether the default order used by Sort is related to character codes.

My guess is that Mathematica treats some characters as having no order of their own and skips them when sorting strings with Sort.

I think some characters are treated as trivial elements and are placed before the alphabet.

There may, however, be a single ordering table for all characters that I am not aware of.

My opinion


First, you should show what the correct answer is for a variety of lists, as @JonathanShock mentioned in the comments.

Otherwise, each time you come across a result that differs from Python, you will ask why Mathematica does not work like Python.

I think that is ....

The important thing is how to get the results you expect.

Sort with a function as the second argument


  1. (*Input 5 ==< *)
    Sort[sampleList1, ToCharacterCode[#1] & ]
    
  2. (*
    Output 5 ==>
    {compute,Tes,.,,Etz,.a,.m,a,z,T,.T,wha,{}
    *)
    
  3. (*Input 6 ==< *)
    ToCharacterCode[%]
    
  4. (*
    Output 6 ==>
    {{99,111,109,112,117,116,101},{84,101,115},{46},{},{69,116,122},{46,97},{46,109},{97},{122},{84},{46,84},{119,104,97},{123}}
    *)
    
  5. (*Input 7 ==< *)
    Total /@ %
    
  6. (*
    Output 7 ==>
    {765,300,46,0,307,143,155,97,122,84,130,320,123}
    *)
    
  7. (*Input 8 ==< *)
    First /@ %%
    
  8. (*
    Output 8 ==>
    {99,84,46,First[{}],69,46,46,97,122,84,46,119,123}
    *)
    

This still does not work well. Of course, the check itself may be wrong, because Total is not a correct ordering criterion.
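According to the documentation quoted earlier, the second argument of Sort is applied to pairs of elements to decide whether they are in order. A minimal sketch of such a pairwise comparison by character code follows. The name codeOrderedQ is invented here, and padding with 0 assumes that none of the strings contains a character with code 0.

```mathematica
(* Hypothetical comparator: True when a does not come after b by character code *)
codeOrderedQ[a_String, b_String] :=
  Module[{ca = ToCharacterCode[a], cb = ToCharacterCode[b], n},
    n = Max[Length[ca], Length[cb]];
    (* Pad to a common length so a shorter list is not ranked first merely for being shorter *)
    OrderedQ[{PadRight[ca, n], PadRight[cb, n]}]
  ]

Sort[{".m", ".a", "co"}, codeOrderedQ]
(* {".a", ".m", "co"} *)
```

Both ".a" and ".m" start with code 46, which is smaller than the 99 of "c", so "co" ends up last.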

Is the following what you expected? Note the StringLength of the strings.

  1. (*Input 9 ==< *)
    FromCharacterCode[Sort[ToCharacterCode[sampleList1]]]
    
  2. (*
    Output 9 ==>
    {,.,T,a,z,{,.T,.a,.m,Etz,Tes,wha,compute}
    *)
    
  3. (*Input 10 ==< *)
    Sort[ToCharacterCode[sampleList1]]
    
  4. (*
    Output 10 ==>
    {{},{46},{84},{97},{122},{123},{46,84},{46,97},{46,109},{69,116,122},{84,101,115},{119,104,97},{99,111,109,112,117,116,101}}
    *)
    
  5. (*Input 11 ==< *)
    First /@ %
    
  6. (*
    Output 11 ==>
    {First[{}],46,84,97,122,123,46,46,46,69,84,119,99}
    *)
    

update

Note: totalList here is neither ascending nor descending.

  1. (*Input 12 ==< *)
    totalList = Total /@ %%
    
  2. (*
    Output 12 ==>
    {0,46,84,97,122,123,130,143,155,307,300,320,765}
    *)
    
  3. (*Input 13 ==< *)
    Transpose[{totalList, Sort[totalList]}]
    
  4. (*
    Output 13 ==>
    {{0,0},{46,46},{84,84},{97,97},{122,122},{123,123},{130,130},{143,143},{155,155},{307,300},{300,307},{320,320},{765,765}}
    *)
    

The result above is the same as the one obtained with SortBy:

  1. (*Input 14 ==< *)
    SortBy[sampleList1, ToCharacterCode]
    
  2. (*
    Output 14 ==>
    {,.,T,a,z,{,.T,.a,.m,Etz,Tes,wha,compute}
    *)
    

update

As @MichaelE2 mentioned in the comments:

Note that SortBy[sampleList1, ToCharacterCode] effectively orders them by length first. - Michael E2
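The length-first effect can be seen directly on lists of character codes, since canonical order puts shorter lists before longer ones. A small sketch, using three short strings chosen only for illustration (the name codes is introduced here):

```mathematica
codes = ToCharacterCode[{".ast", ".m", "co"}]
(* {{46, 97, 115, 116}, {46, 109}, {99, 111}} *)

(* The two-element lists come first, the four-element list last *)
Sort[codes]
(* {{46, 109}, {99, 111}, {46, 97, 115, 116}} *)

(* Back to strings: ".ast" ends up last even though it starts with "." *)
FromCharacterCode[Sort[codes]]
```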

One method, from Rojo's comment:

  1. (*Input 15 ==< *)
    yourSort = Max[StringLength[#1]] /. len_ :> SortBy[#1, PadRight[ToCharacterCode[#1], len] & ] & ; 
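A usage sketch for the definition above, repeated here so that the snippet runs on its own. Padding every code list to the length of the longest string is what removes the length-first effect; the test lists are chosen only for illustration.

```mathematica
(* Definition from Rojo's comment, repeated so this runs standalone *)
yourSort = Max[StringLength[#1]] /. len_ :> SortBy[#1, PadRight[ToCharacterCode[#1], len] &] &;

(* The list from the title *)
yourSort[{".m", ".a", "co"}]
(* {".a", ".m", "co"} *)

(* A list with strings of different lengths *)
yourSort[{".ast", ".m", "co"}]
```

Note that the padding value 0 sorts before every ordinary character code, so a string that is a prefix of another one is placed first.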
    

My conclusion


So the conclusion may be that the canonical order in Mathematica is different from the one in Python.


Since there is perhaps an implicit question of how to get a sort more along the lines you expect, as I proposed here you might use:

asciisort = #[[Ordering @ PadRight @ ToCharacterCode @ #]] &;

asciisort @ {".m", ".ast", "co"}
{".ast", ".m", "co"}

Or with the default character-wise order:

charsort = #[[Ordering @ PadRight @ Characters @ #]] &;

charsort @ {".m", ".ast", "co"}
{".ast", ".m", "co"}

If you are comfortable with shorter strings being ordered first you can also use:

SortBy[{".ast", ".m", "co"}, Characters]
{".m", "co", ".ast"}