Unexpected result {".a", "co", ".m"} from Sort[{".m", ".a", "co"}]
According to The Chicago Manual of Style, para. 18.57/18.58, punctuation marks are ignored.
18.57
The letter-by-letter system. In the letter-by-letter system, alphabetizing continues up to the first parenthesis or comma; it then starts again after the punctuation point. Word spaces and all other punctuation marks are ignored. Both open and hyphenated compounds such as New York or self-pity are treated as single words. The order of precedence is one word, word followed by a parenthesis, and word followed by a comma, number, or letters. The index to this manual, in accordance with Chicago’s traditional preference, is arranged letter by letter.
I won't say it's a definitive answer, but it supports Mathematica's behavior to a certain extent.
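One way to see what the letter-by-letter rule would predict is to drop punctuation before comparing. This is only a minimal sketch: the helper name letterByLetterSort is hypothetical, and using StringDelete with PunctuationCharacter as a stand-in for "punctuation marks are ignored" is an assumption. It illustrates the Chicago rule and is not a claim about how Sort is implemented.

```mathematica
(* Hypothetical helper: sort strings by a key with punctuation removed. *)
(* Assumption: PunctuationCharacter approximates "punctuation marks are ignored". *)
letterByLetterSort[list_List] :=
  SortBy[list, StringDelete[#, PunctuationCharacter] &]

(* The keys are "m", "a", "co", so the order is a, co, m. *)
letterByLetterSort[{".m", ".a", "co"}]
(* {".a", "co", ".m"} *)
```

For this input the sketch gives the same order that Sort returns in the question.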
Sort orders strings as in a dictionary, with uppercase versions of letters coming after lowercase ones. Sort places ordinary letters first, followed in order by script, Gothic, double-struck, Greek, and Hebrew. Mathematical operators appear in order of decreasing precedence.
Sort[list, p] applies the function p to pairs of elements in list to determine whether they are in order. The default function p is OrderedQ[{#1, #2}] &.
OrderedQ[h[e1, e2, ...]] gives True if the ei are in canonical order, and False otherwise.
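The pairwise test described above can be tried directly on the strings from the question. A small sketch; the values in the comments are the ones implied by the result quoted in the question's title.

```mathematica
(* Plain Sort, as in the question. *)
Sort[{".m", ".a", "co"}]
(* {".a", "co", ".m"} *)

(* The default comparison, applied to neighbouring pairs of that result. *)
OrderedQ[{".a", "co"}]   (* True *)
OrderedQ[{"co", ".m"}]   (* True *)

(* Spelling out the default ordering function gives the same list. *)
Sort[{".m", ".a", "co"}, OrderedQ[{#1, #2}] &]
(* {".a", "co", ".m"} *)
```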
Some tests
Just Sort without an ordering function
Let's look at the character codes of sampleList1.
(*Input 1 ==< *) sampleList1 = {"compute", "Tes", ".", "", "Etz", ".a", ".m", "a", "z", "T", ".T", "wha", "{"}; ToCharacterCode[sampleList1]
(* Output 1 ==> {{99,111,109,112,117,116,101},{84,101,115},{46},{},{69,116,122},{46,97},{46,109},{97},{122},{84},{46,84},{119,104,97},{123}} *)
(*Input 2 ==< *) Sort[sampleList1]
(* Output 2 ==> {,{,.,.a,a,compute,Etz,.m,.T,T,Tes,wha,z} *)
(*Input 3 ==< *) ToCharacterCode[%]
(* Output 3 ==> {{},{123},{46},{46,97},{97},{99,111,109,112,117,116,101},{69,116,122},{46,109},{46,84},{84},{84,101,115},{119,104,97},{122}} *)
(*Input 4 ==< *) Total /@ %
(* Output 4 ==> {0,123,46,143,97,765,307,155,130,84,300,320,122} *)
My thoughts
I do not know whether the default order of Sort is related to the character codes.
I guess Mathematica treats some characters as orderless and skips them when sorting strings with Sort.
I think some characters are treated as trivial elements and are put before the alphabet.
There may, though, be a single ordering table for all characters that I don't know about.
My opinion
First, you should show what the correct answer is for a variety of lists, as @JonathanShock mentioned in the comments.
Otherwise, each time you come across a result that differs from Python, you will ask why Mathematica does not work like Python.
I think that is ....
The important thing is how to get the results you expect.
Sort with an ordering function
(*Input 5 ==< *) Sort[sampleList1, ToCharacterCode[#1] & ]
(* Output 5 ==> {compute,Tes,.,,Etz,.a,.m,a,z,T,.T,wha,{} *)
(*Input 6 ==< *) ToCharacterCode[%]
(* Output 6 ==> {{99,111,109,112,117,116,101},{84,101,115},{46},{},{69,116,122},{46,97},{46,109},{97},{122},{84},{46,84},{119,104,97},{123}} *)
(*Input 7 ==< *) Total /@ %
(* Output 7 ==> {765,300,46,0,307,143,155,97,122,84,130,320,123} *)
(*Input 8 ==< *) First /@ %%
(* Output 8 ==> {99,84,46,First[{}],69,46,46,97,122,84,46,119,123} *)
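This still does not work. Sort expects its second argument to compare a pair of elements, which ToCharacterCode[#1] & does not do, so the list comes back in its original order. My reading may also be off, because Total is not the correct ordering criterion.
Is the following what you expected? Note the StringLength of the strings.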
(*Input 9 ==< *) FromCharacterCode[Sort[ToCharacterCode[sampleList1]]]
(* Output 9 ==> {,.,T,a,z,{,.T,.a,.m,Etz,Tes,wha,compute} *)
(*Input 10 ==< *) Sort[ToCharacterCode[sampleList1]]
(* Output 10 ==> {{},{46},{84},{97},{122},{123},{46,84},{46,97},{46,109},{69,116,122},{84,101,115},{119,104,97},{99,111,109,112,117,116,101}} *)
(*Input 11 ==< *) First /@ %
(* Output 11 ==> {First[{}],46,84,97,122,123,46,46,46,69,84,119,99} *)
update
Note: totalList here is neither ascending nor descending.
(*Input 12 ==< *) totalList = Total /@ %%
(* Output 12 ==> {0,46,84,97,122,123,130,143,155,307,300,320,765} *)
(*Input 13 ==< *) Transpose[{totalList, Sort[totalList]}]
(* Output 13 ==> {{0,0},{46,46},{84,84},{97,97},{122,122},{123,123},{130,130},{143,143},{155,155},{307,300},{300,307},{320,320},{765,765}} *)
The result above is the same as the one SortBy gives:
(*Input 14 ==< *) SortBy[sampleList1, ToCharacterCode]
(* Output 14 ==> {,.,T,a,z,{,.T,.a,.m,Etz,Tes,wha,compute} *)
update
As @MichaelE2 mentioned in the comments:
"Note that SortBy[sampleList1, ToCharacterCode] effectively orders them by length first." - Michael E2
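The length-first behaviour can be seen on bare lists of character codes, without any strings involved. A short sketch using made-up code lists: in canonical order a shorter list comes before a longer one, and padding every list to the same length with PadRight removes that effect.

```mathematica
(* Lists of unequal length: the shortest comes first, whatever its contents. *)
Sort[{{99, 111}, {46, 97, 115, 116}, {122}}]
(* {{122}, {99, 111}, {46, 97, 115, 116}} *)

(* After zero-padding to equal length, elements are compared one by one. *)
Sort[PadRight[{{99, 111}, {46, 97, 115, 116}, {122}}]]
(* {{46, 97, 115, 116}, {99, 111, 0, 0}, {122, 0, 0, 0}} *)
```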
One method, from Rojo's comment:
(*Input 15 ==< *) yourSort = Max[StringLength[#1]] /. len_ :> SortBy[#1, PadRight[ToCharacterCode[#1], len] & ] & ;
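The same idea can be written a little more explicitly, which may make it easier to follow. This is a sketch, not Rojo's original code: the name paddedCodeSort is hypothetical, and With is used here in place of the replacement rule to inject the maximum length.

```mathematica
(* Hypothetical helper: pad each string's character codes with zeros to the *)
(* length of the longest string, then sort by the padded code lists.        *)
paddedCodeSort[list_List] :=
  With[{len = Max[StringLength[list]]},
    SortBy[list, PadRight[ToCharacterCode[#], len] &]]

(* Padded keys: {46,109}, {46,97}, {99,111}, {84,0}, {97,0} *)
paddedCodeSort[{".m", ".a", "co", "T", "a"}]
(* {".a", ".m", "T", "a", "co"} *)
```

Because every key has the same length, the comparison is purely by character code, so "." (46) sorts before uppercase letters, which sort before lowercase ones.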
My conclusion
So the conclusion may be that the canonical order in Mathematica is different from the one in Python.
Since there is perhaps an implicit question of how to get a sort more along the lines you expect, as I proposed here you might use:
asciisort = #[[Ordering @ PadRight @ ToCharacterCode @ #]] &;
asciisort @ {".m", ".ast", "co"}
{".ast", ".m", "co"}
Or with the default character-wise order:
charsort = #[[Ordering @ PadRight @ Characters @ #]] &;
charsort @ {".m", ".ast", "co"}
{".ast", ".m", "co"}
If you are comfortable with shorter strings being ordered first you can also use:
SortBy[{".ast", ".m", "co"}, Characters]
{".m", "co", ".ast"}