Transform title into dashed URL-friendly string

How about this:

string FriendlyURLTitle(string pTitle)
{
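    // Turn spaces into dashes first so UrlEncode does not emit "+" for them.
    pTitle = pTitle.Replace(" ", "-");
    // Percent-encode whatever is not URL-safe...
    pTitle = HttpUtility.UrlEncode(pTitle);
    // ...then drop the resulting %XX escapes entirely.
    return Regex.Replace(pTitle, @"%[0-9A-Fa-f]{2}", "");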
}

Most "sluggifiers" (methods for converting to friendly-url type names) tend to do the following:

  1. Strip everything except whitespace, dashes, underscores, and alphanumerics.
  2. (Optional) Remove "common words" (the, a, an, of, et cetera).
  3. Replace spaces and underscores with dashes.
  4. (Optional) Convert to lowercase.

As far as I know, Stack Overflow's sluggifier does #1, #3, and #4, but not #2.
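A minimal sketch of steps #1, #3, and #4, assuming ASCII-only titles. The class and method names are made up for illustration, and collapsing repeated separators and trimming dashes from the ends are small extras beyond the list above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Hypothetical helper; not Stack Overflow's actual implementation.
    public static string Slugify(string title)
    {
        // #1: strip everything except whitespace, dashes, underscores, and ASCII alphanumerics.
        string slug = Regex.Replace(title, @"[^A-Za-z0-9\s_-]", "");

        // #3: replace each run of whitespace, underscores, and dashes with a single dash.
        slug = Regex.Replace(slug, @"[\s_-]+", "-");

        // #4: lowercase (culture-independent), after trimming dashes from both ends.
        return slug.Trim('-').ToLowerInvariant();
    }
}
```

For example, Slugify("Hello, World!  It's_me") gives "hello-world-its-me".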


I would do:

string url = title;
url = Regex.Replace(url, @"^\W+|\W+$", "");
url = Regex.Replace(url, @"['""]", "");
url = Regex.Replace(url, @"_", "-");
url = Regex.Replace(url, @"\W+", "-");

In order, this:

  • strips non-word characters from the beginning and end of the title;
  • removes single and double quotes (mainly to get rid of apostrophes in the middle of words);
  • replaces underscores with hyphens (the underscore is technically a word character, along with digits and letters); and
  • replaces all groups of non-word characters with a single hyphen.
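To see the four steps in action, here is a sketch that wraps them in a method and traces one sample title through each stage. The class name, method name, and sample title are assumptions for illustration. The quote step uses a character class so that either quote character is removed on its own:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSlug
{
    // Hypothetical wrapper around the four replacements described above.
    public static string FromTitle(string title)
    {
        string url = title;                             //   What's "new" in my_app?  (padded with spaces)
        url = Regex.Replace(url, @"^\W+|\W+$", "");     // What's "new" in my_app
        url = Regex.Replace(url, @"['""]", "");         // Whats new in my_app
        url = Regex.Replace(url, @"_", "-");            // Whats new in my-app
        url = Regex.Replace(url, @"\W+", "-");          // Whats-new-in-my-app
        return url;
    }
}
```

Note that the hyphen produced from the underscore is itself a non-word character, so the last step swallows it together with any neighbouring punctuation and emits a single hyphen.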

Rather than looking for things to replace, match everything that isn't allowed: the list of unreserved characters is so short that it makes for a nice, clear regex.

return Regex.Replace(value, @"[^A-Za-z0-9_\.~]+", "-");

(Note that I didn't include the dash in the list of allowed chars; that's so it gets gobbled up by the "1 or more" operator [+] so that multiple dashes (in the original or generated or a combination) are collapsed, as per Dominic Rodger's excellent point.)

You may also want to remove common words ("the", "an", "a", etc.), although doing so can slightly change the meaning of a sentence. You will probably want to remove any trailing dashes and periods as well.
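A rough sketch of both suggestions on top of the regex above. The class name, the dropCommonWords parameter, and the three-word list are assumptions for illustration, and leading dashes and periods are trimmed along with trailing ones:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnreservedSlug
{
    // Hypothetical, deliberately tiny word list; a real one would be longer.
    private static readonly HashSet<string> CommonWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "an", "a" };

    public static string Make(string value, bool dropCommonWords = false)
    {
        // Collapse every run of characters outside the allowed set into one dash.
        string slug = Regex.Replace(value, @"[^A-Za-z0-9_.~]+", "-");

        if (dropCommonWords)
        {
            // Drop whole dash-separated segments that are common words.
            IEnumerable<string> kept = slug.Split('-').Where(w => !CommonWords.Contains(w));
            slug = string.Join("-", kept);
        }

        // Remove dashes and periods left at either end by punctuation.
        return slug.Trim('-', '.');
    }
}
```

With this sketch, Make("Is the Pope Catholic?") gives "Is-the-Pope-Catholic", and passing dropCommonWords: true gives "Is-Pope-Catholic", which shows how dropping words can shift the meaning.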

I also strongly recommend you do what SO and others do: include a unique identifier in the URL alongside the title, and use only that unique ID when processing the URL. That way http://example.com/articles/1234567/is-the-pop-catholic (note the missing 'e') and http://example.com/articles/1234567/is-the-pope-catholic resolve to the same resource.
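A framework-free sketch of that lookup, assuming paths shaped like /articles/{id}/{slug} as in the example URLs. The class and method names are hypothetical, and a real application would normally let its routing layer do this:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArticleRoute
{
    // Matches /articles/{id} with an optional /{slug} after it; the slug is never inspected.
    private static readonly Regex Pattern =
        new Regex(@"^/articles/(?<id>[0-9]+)(?:/[^/]*)?/?$");

    public static bool TryGetArticleId(string path, out long id)
    {
        id = 0;
        Match m = Pattern.Match(path);
        // Only the numeric ID decides which resource is served.
        return m.Success && long.TryParse(m.Groups["id"].Value, out id);
    }
}
```

Both /articles/1234567/is-the-pop-catholic and /articles/1234567/is-the-pope-catholic yield the ID 1234567 here. If the slug in the request differs from the current one, the handler could redirect to the canonical URL.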

Tags: C#, Replace