Default ordering in C# vs. F#
Different libraries make different choices of the default comparison operation on strings. F# is strict defaulting to case sensitivity, while LINQ to Objects is case insensitive.
Both List.sortWith
and Array.sortWith
allow the comparison to be specified. As does an overload of Enumerable.OrderBy
.
However the Seq
module does not appear to have an equivalent (and one isn't being added in 4.6).
For the specific questions:
Is there an underlying reason for the differences in ordering logic?
Both orderings are valid. In English cases insensitivity seems more natural, because that's what we're used to. But this does not make it more correct.
What is the recommended way to overcome this "problem" in my situation?
Be explicit about the kind of comparison.
Is this phenomenon specific to strings, or does it apply to other .NET types too?
char
will also be affected. And any other type where there is more than one possible ordering (eg. a People
type: you could order by name or date of birth depending on the specific requirements).
See section 8.15.6 of the language spec.
Strings, arrays, and native integers have special comparison semantics, everything else just goes to IComparable
if that's implemented (modulo various optimizations that yield the same result).
In particular, F# strings use ordinal comparison by default, in contrast to most of .NET which uses culture-aware comparison by default.
This is obviously a confusing incompatibility between F# and other .NET languages, however it does have some benefits:
- OCAML compat
- String and char comparisons are consistent
- C#
Comparer<string>.Default.Compare("a", "A") // -1
- C#
Comparer<char>.Default.Compare('a', 'A') // 32
- F#
compare "a" "A" // 1
- F#
compare 'a' 'A' // 32
- C#
Edit:
Note that it's misleading (though not incorrect) to state that "F# uses case-sensitive string comparison". F# uses ordinal comparison, which is stricter than just case-sensitive.
// case-sensitive comparison
StringComparer.InvariantCulture.Compare("[", "A") // -1
StringComparer.InvariantCulture.Compare("[", "a") // -1
// ordinal comparison
// (recall, '[' lands between upper- and lower-case chars in the ASCII table)
compare "[" "A" // 26
compare "[" "a" // -6
This has nothing to do with C# vs F#, or even IComparable
, but is just due to the different sort implementations in the libraries.
The TL;DR; version is that sorting strings can give different results:
"tv" < "TV" // false
"tv".CompareTo("TV") // -1 => implies "tv" *is* smaller than "TV"
Or even clearer:
"a" < "A" // false
"a".CompareTo("A") // -1 => implies "a" is smaller than "A"
This is because CompareTo
uses the current culture (see MSDN).
We can see how this plays out in practice with some different examples.
If we use the standard F# sort we get the uppercase-first result:
let strings = [ "UV"; "Uv"; "uV"; "uv"; "Tv"; "TV"; "tv"; "tV" ]
strings |> List.sort
// ["TV"; "Tv"; "UV"; "Uv"; "tV"; "tv"; "uV"; "uv"]
Even if we cast to IComparable
we get the same result:
strings |> Seq.cast<IComparable> |> Seq.sort |> Seq.toList
// ["TV"; "Tv"; "UV"; "Uv"; "tV"; "tv"; "uV"; "uv"]
On the other hand if we use Linq from F#, we get the same result as the C# code:
open System.Linq
strings.OrderBy(fun s -> s).ToArray()
// [|"tv"; "tV"; "Tv"; "TV"; "uv"; "uV"; "Uv"; "UV"|]
According to MSDN,
the OrderBy
method "compares keys by using the default comparer Default."
The F# libraries do not use Comparer
by default, but we can use sortWith
:
open System.Collections.Generic
let comparer = Comparer<string>.Default
Now when we do this sort, we get the same result as the LINQ OrderBy
:
strings |> List.sortWith (fun x y -> comparer.Compare(x,y))
// ["tv"; "tV"; "Tv"; "TV"; "uv"; "uV"; "Uv"; "UV"]
Alternatively, we can use the built-in CompareTo
function, which gives the same result:
strings |> List.sortWith (fun x y -> x.CompareTo(y))
// ["tv"; "tV"; "Tv"; "TV"; "uv"; "uV"; "Uv"; "UV"]
Moral of the tale: If you care about sorting, always specify the specific comparison to use!