What .NET StringComparer is equivalent SQL's Latin1_General_CI_AS

I've recently faced with the same problem: I need an IEqualityComparer<string> that behaves in SQL-like style. I've tried CollationInfo and its EqualityComparer. If your DB is always _AS (accent sensitive) then your solution will work, but in case if you change the collation that is AI or WI or whatever "insensitive" else the hashing will break.
Why? If you decompile Microsoft.SqlServer.Management.SqlParser.dll and look inside you'll find out that CollationInfo internally uses CultureAwareComparer.GetHashCode (it's internal class of mscorlib.dll) and finally it does the following:

public override int GetHashCode(string obj)
{
  if (obj == null)
    throw new ArgumentNullException("obj");
  CompareOptions options = CompareOptions.None;
  if (this._ignoreCase)
    options |= CompareOptions.IgnoreCase;
  return this._compareInfo.GetHashCodeOfString(obj, options);
}

As you can see it can produce the same hashcode for "aa" and "AA", but not for "äå" and "aa" (which are the same, if you ignore diacritics (AI) in majority of cultures, so they should have the same hashcode). I don't know why the .NET API is limited by this, but you should understand where the problem can come from. To get the same hashcode for strings with diacritics you can do the following: create implementation of IEqualityComparer<T> implementing the GetHashCode that will call appropriate CompareInfo's object's GetHashCodeOfString via reflection because this method is internal and can't be used directly. But calling it directly with correct CompareOptions will produce the desired result: See this example:

    static void Main(string[] args)
    {
        const string outputPath = "output.txt";
        const string latin1GeneralCiAiKsWs = "Latin1_General_100_CI_AI_KS_WS";
        using (FileStream fileStream = File.Open(outputPath, FileMode.Create, FileAccess.Write))
        {
            using (var streamWriter = new StreamWriter(fileStream, Encoding.UTF8))
            {
                string[] strings = { "aa", "AA", "äå", "ÄÅ" };
                CompareInfo compareInfo = CultureInfo.GetCultureInfo(1033).CompareInfo;
                MethodInfo GetHashCodeOfString = compareInfo.GetType()
                    .GetMethod("GetHashCodeOfString",
                    BindingFlags.Instance | BindingFlags.NonPublic,
                    null,
                    new[] { typeof(string), typeof(CompareOptions), typeof(bool), typeof(long) },
                    null);

                Func<string, int> correctHackGetHashCode = s => (int)GetHashCodeOfString.Invoke(compareInfo,
                    new object[] { s, CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace, false, 0L });

                Func<string, int> incorrectCollationInfoGetHashCode =
                    s => CollationInfo.GetCollationInfo(latin1GeneralCiAiKsWs).EqualityComparer.GetHashCode(s);

                PrintHashCodes(latin1GeneralCiAiKsWs, incorrectCollationInfoGetHashCode, streamWriter, strings);
                PrintHashCodes("----", correctHackGetHashCode, streamWriter, strings);
            }
        }
        Process.Start(outputPath);
    }
    private static void PrintHashCodes(string collation, Func<string, int> getHashCode, TextWriter writer, params string[] strings)
    {
        writer.WriteLine(Environment.NewLine + "Used collation: {0}", collation + Environment.NewLine);
        foreach (string s in strings)
        {
            WriteStringHashcode(writer, s, getHashCode(s));
        }
    }

The output is:

Used collation: Latin1_General_100_CI_AI_KS_WS
aa, hashcode: 2053722942
AA, hashcode: 2053722942
äå, hashcode: -266555795
ÄÅ, hashcode: -266555795

Used collation: ----
aa, hashcode: 2053722942
AA, hashcode: 2053722942
äå, hashcode: 2053722942
ÄÅ, hashcode: 2053722942

I know it looks like the hack, but after inspecting decompiled .NET code I'm not sure if there any other option in case the generic functionality is needed. So be sure that you'll not fall into trap using this not fully correct API.
UPDATE:
I've also created the gist with potential implementation of "SQL-like comparer" using CollationInfo. Also there should be paid enough attention where to search for "string pitfalls" in your code base, so if the string comparison, hashcode, equality should be changed to "SQL collation-like" those places are 100% will be broken, so you'll have to find out and inspect all the places that can be broken.
UPDATE #2:
There is better and cleaner way to make GetHashCode() treat CompareOptions. There is the class SortKey that works correctly with CompareOptions and it can be retrieved using

CompareInfo.GetSortKey(yourString, yourCompareOptions).GetHashCode()

Here is the link to .NET source code and implementation.

UPDATE #3:
If you're on .NET Framework 4.7.1+ you should use new GlobalizationExtensions class as proposed by this recent answer.


Take a look at the CollationInfo class. It is located in an assembly called Microsoft.SqlServer.Management.SqlParser.dll although I am not totally sure where to get this. There is a static list of Collations (names) and a static method GetCollationInfo (by name).

Each CollationInfo has a Comparer. It is not exactly the same as a StringComparer but has similar functionality.

EDIT: Microsoft.SqlServer.Management.SqlParser.dll is a part of the Shared Management Objects (SMO) package. This feature can be downloaded for SQL Server 2008 R2 here:

http://www.microsoft.com/download/en/details.aspx?id=16978#SMO

EDIT: CollationInfo does have a property named EqualityComparer which is an IEqualityComparer<string>.


SQL Server's Server.GetStringComparer may be of some use.