Issues with using sort and comm
Per the comm manual: "Before `comm' can be used, the input files must be sorted using the collating sequence specified by the `LC_COLLATE' locale."
And the sort manual: "Unless otherwise specified, all comparisons use the character collating sequence specified by the `LC_COLLATE' locale."
Therefore, and a quick test confirms this, the LC_COLLATE order that comm expects is the order sort produces by default, that is, with no ordering option given.
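A minimal sketch of that workflow, assuming GNU coreutils (for comm's --check-order) and two hypothetical files a.txt and b.txt. The point is to sort both inputs under the same locale that comm will run in, so both tools use one LC_COLLATE order.

```shell
#!/bin/sh
# Sketch: compare two unsorted files with comm.
# Assumes GNU coreutils; a.txt and b.txt are hypothetical example files.
set -e

# Pin one locale for every step so sort and comm share the same LC_COLLATE order.
export LC_ALL=C

printf '%s\n' pear apple fig > a.txt
printf '%s\n' fig kiwi apple > b.txt

sort a.txt > a.sorted
sort b.txt > b.sorted

# --check-order makes comm report an error if either input is out of order.
# -3 hides lines common to both files: "pear" (only in a.txt) is printed as is,
# "kiwi" (only in b.txt) is printed after a tab.
comm --check-order -3 a.sorted b.sorted
```

In bash, process substitution avoids the temporary files: comm -3 <(sort a.txt) <(sort b.txt). LC_ALL=C is just one choice; any locale works as long as sort and comm share it.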
sort can order its input in a variety of ways:
-d: Dictionary order; considers only blanks and alphanumerics, ignoring everything else.
-g: General numeric; alpha first, then negative numbers, then positive.
-h: Human-readable; negative, alpha, positive, with SI suffixes honored: n < nk = nK < nM < nG.
-n: Numeric; negative, alpha, positive. k, M, G, etc. are not special.
-V: Version; positive, caps, lowercase, negative. 1 < 1.2 < 1.10.
-f: Case-insensitive.
-R: Random; shuffles the input, although identical lines still end up next to each other.
-r: Reverse; usually combined with one of -d, -g, -h, -n, or -V.
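To see the difference between -h and -n described above, here is a minimal sketch assuming GNU sort (sizes.txt is a hypothetical scratch file). LC_ALL=C is pinned only so that ties are broken in a repeatable byte order.

```shell
#!/bin/sh
# Sketch: -h honors SI suffixes, -n does not.
# Assumes GNU sort; sizes.txt is a hypothetical scratch file.
export LC_ALL=C   # repeatable tie-breaking

printf '%s\n' 1G 2 1k 1M > sizes.txt

echo "sort -h:"; sort -h sizes.txt   # 2, 1k, 1M, 1G (no suffix < k < M < G)
echo "sort -n:"; sort -n sizes.txt   # 1G, 1M, 1k, 2 (suffixes ignored, ties by bytes)
```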
There are other options, of course, but these are the ones you're likely to see or need.
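Your test suggests that the default order behaves like -d (dictionary order), at least in your locale. Strictly speaking, the default is the plain LC_COLLATE order; many non-C locales give little weight to punctuation such as - and ., so the result can look like -d.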
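A quick way to see that the two are not identical, assuming GNU sort and a hypothetical scratch file in.txt: in the C locale the default order compares raw bytes, so '-' (0x2D) sorts before letters, while -d skips the '-' entirely.

```shell
#!/bin/sh
# Sketch: the default order and -d (dictionary order) are different options.
# Assumes GNU sort; in.txt is a hypothetical scratch file.
export LC_ALL=C   # byte order, so '-' (0x2D) sorts before 'a' (0x61)

printf '%s\n' a -b > in.txt

echo "default:";  sort in.txt      # -b, then a
echo "with -d:";  sort -d in.txt   # a, then -b ('-' is ignored, so "a" < "b")
```

Running the same two commands under a UTF-8 locale such as en_US.UTF-8 may well produce matching output, which is what makes the default look like -d.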
-d    | -g    | -h    | -n    | -V
------+-------+-------+-------+-------
1     | a     | -1G   | -10   | 1
-1    | A     | -1k   | -5    | 1G
10    | z     | -10   | -1    | 1g
-10   | Z     | -5    | -1g   | 1k
1.10  | -10   | -1    | -1G   | 1.2
1.2   | -5    | -1g   | -1k   | 1.10
1g    | -1    | a     | a     | 5
1G    | -1g   | A     | A     | 10
-1g   | -1G   | z     | z     | A
-1G   | -1k   | Z     | Z     | Z
1k    | 1     | 1     | 1     | a
-1k   | 1g    | 1g    | 1g    | z
5     | 1G    | 1.10  | 1G    | -1
-5    | 1k    | 1.2   | 1k    | -1G
a     | 1.10  | 5     | 1.10  | -1g
A     | 1.2   | 10    | 1.2   | -1k
z     | 5     | 1k    | 5     | -5
Z     | 10    | 1G    | 10    | -10
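The table can be regenerated with a short loop. This is a sketch assuming GNU sort; sample.txt is a hypothetical file holding the 18 values from the table. LC_ALL=C is pinned only for repeatability, so some columns (notably -d) may order ties differently than the table does if it was produced under another locale.

```shell
#!/bin/sh
# Sketch: sort the table's sample values with each option in turn.
# Assumes GNU sort; sample.txt is a hypothetical scratch file.
# Output depends on LC_COLLATE; change LC_ALL to test your own locale.
export LC_ALL=C

printf '%s\n' 1 -1 10 -10 1.10 1.2 1g 1G -1g -1G 1k -1k 5 -5 a A z Z > sample.txt

for opt in d g h n V; do
    echo "== sort -$opt =="
    sort "-$opt" sample.txt
done
```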