.NET library for text algorithms?

I managed to find implementations of most algorithms i need using combination of WikiPedia + Google Code search.

http://en.wikipedia.org/wiki/Category:Algorithms_on_strings
http://www.google.com/codesearch

Though it's strange that no one has created project on this subject, where interested people could collaborate on this.


You may be interested in checking out the google-diff-match-patch library on Google Code. They have an implementation of Myer's diff algorithm and it claims to also implement a Bitap algorithm "at the heart".

It has the C# source that you're looking for as well as implementations in Java, C++, Lua & Python. Although I don't have the best understanding of how to use Bitap in practice (there are demos in the Google Code project) I think you'll be most interested in the match functions starting around line 1476 of the current version.

UPDATE:

A little digging found an implementation of Levenshtein in C# on CodeProject.

Also, this C# class file contains an implementation of Levenshtein on SourceForge. The implementation is part of the Corsis (aka Tenka Text) project. Author claims that the YetiLevenshtein method (around line 741) is 2x to 10x faster than the implementation used in the CodeProject version of the algorithm referenced above.

UPDATE #2:

I just discovered the wikibook Algorithm implementation with it's C# version of Levenshtein Distance and had to include it because it looks pretty straight and to the point. This wikibook looks like a great reference to keep on hand in general.

Levenshtein Distance in C# (courtesy of Wikibooks)

    private Int32 levenshtein(String a, String b)
    {

        if (string.IsNullOrEmpty(a))
        {
            if (!string.IsNullOrEmpty(b))
            {
                return b.Length;
            }
            return 0;
        }

        if (string.IsNullOrEmpty(b))
        {
            if (!string.IsNullOrEmpty(a))
            {
                return a.Length;
            }
            return 0;
        }

        Int32 cost;
        Int32[,] d = new int[a.Length + 1, b.Length + 1];
        Int32 min1;
        Int32 min2;
        Int32 min3;

        for (Int32 i = 0; i <= d.GetUpperBound(0); i += 1)
        {
            d[i, 0] = i;
        }

        for (Int32 i = 0; i <= d.GetUpperBound(1); i += 1)
        {
            d[0, i] = i;
        }

        for (Int32 i = 1; i <= d.GetUpperBound(0); i += 1)
        {
            for (Int32 j = 1; j <= d.GetUpperBound(1); j += 1)
            {
                cost = Convert.ToInt32(!(a[i-1] == b[j - 1]));

                min1 = d[i - 1, j] + 1;
                min2 = d[i, j - 1] + 1;
                min3 = d[i - 1, j - 1] + cost;
                d[i, j] = Math.Min(Math.Min(min1, min2), min3);
            }
        }

        return d[d.GetUpperBound(0), d.GetUpperBound(1)];

    }