How can you remove duplicate characters in a string?

This will do the job:

string removedupes(string s)
{
    string newString = string.Empty;
    List<char> found = new List<char>();
    foreach(char c in s)
    {
       if(found.Contains(c))
          continue;

       newString+=c.ToString();
       found.Add(c);
    }
    return newString;
}

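I should note this is criminally inefficient: found.Contains is a linear scan of the list, and += on a string in a loop copies the whole string on every iteration.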

I think I was delirious on first revision.
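
For comparison, here is a minimal sketch of the same loop without those two costs. It assumes a HashSet<char> for the membership test and a StringBuilder for the output; the class name DedupeSketch is made up for illustration and is not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Text;

public static class DedupeSketch
{
    // Keeps the first occurrence of each character, in original order.
    public static string RemoveDuplicates(string s)
    {
        var seen = new HashSet<char>();
        var result = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            // HashSet<T>.Add returns false when the item is already present.
            if (seen.Add(c))
                result.Append(c);
        }
        return result.ToString();
    }
}
```

With this sketch, DedupeSketch.RemoveDuplicates("banana") gives "ban".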


A LINQ approach:

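// Requires "using System.Linq;" for Distinct() and ToArray().
public static string RemoveDuplicates(string input)
{
    // string already implements IEnumerable<char>, so ToCharArray() is not needed.
    return new string(input.Distinct().ToArray());
}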

For arbitrary-length strings of byte-sized characters (not for wide characters or other encodings), I would use a lookup table, one bit per character (32 bytes for a 256-bit table). Loop through the string and output only the characters whose bits are not yet set, then set the bit for that character.

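string removedupes(string s)
{
    // Requires "using System.Text;" for StringBuilder.
    StringBuilder t = new StringBuilder();
    byte[] found = new byte[256];   // one flag per byte-sized character
    foreach(char c in s)
    {
        // Assumes c <= 255; a wider character would index past the table.
        if(found[c] == 0) {
            t.Append(c);
            found[c]=1;
        }
    }
    return t.ToString();
}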

I am not good with C#, so I don't know the right way to use a bitfield instead of a byte array.
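
One possible way to get a real bit table in C# is System.Collections.BitArray, which packs its flags into bits rather than using a byte per entry. The following is only a sketch under that assumption; the class name BitTableDedupe and the choice to throw on characters above 255 are mine, not part of the answer above.

```csharp
using System;
using System.Collections;
using System.Text;

public static class BitTableDedupe
{
    // Removes repeated characters from a string of byte-sized characters.
    public static string RemoveDuplicates(string s)
    {
        var found = new BitArray(256);            // 256 flags, all initially false
        var result = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            // Assumption: reject wide characters rather than index past the table.
            if (c > 255)
                throw new ArgumentException("Only byte-sized characters are supported.", nameof(s));

            if (!found[c])
            {
                result.Append(c);
                found[c] = true;                  // mark this character as seen
            }
        }
        return result.ToString();
    }
}
```

With this sketch, BitTableDedupe.RemoveDuplicates("hello world") gives "helo wrd".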

If you know that your strings are going to be very short, then other approaches would offer better memory usage and/or speed.
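
As one guess at what such an approach could look like for very short strings: skip the lookup table entirely and scan the output built so far. This is quadratic in the worst case but allocates nothing beyond the result. The class name ShortStringDedupe is hypothetical, and this is a sketch of one option, not necessarily the approach meant above.

```csharp
using System.Text;

public static class ShortStringDedupe
{
    // Keeps the first occurrence of each character by scanning the output so far.
    public static string RemoveDuplicates(string s)
    {
        var result = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            bool seen = false;
            for (int i = 0; i < result.Length; i++)
            {
                if (result[i] == c)
                {
                    seen = true;
                    break;
                }
            }
            if (!seen)
                result.Append(c);
        }
        return result.ToString();
    }
}
```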

Tags:

C#