Transform title into dashed URL-friendly string
How about this:
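string FriendlyURLTitle(string pTitle)
{
    // Turn spaces into dashes before encoding.
    pTitle = pTitle.Replace(" ", "-");
    // Percent-encode everything that is not URL-safe (HttpUtility is in System.Web).
    pTitle = HttpUtility.UrlEncode(pTitle);
    // Drop the resulting %XX escape sequences.
    return Regex.Replace(pTitle, @"%[0-9A-Fa-f]{2}", "");
}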
Most "sluggifiers" (methods for converting to friendly-url type names) tend to do the following:
1. Strip everything except whitespace, dashes, underscores, and alphanumerics.
2. (Optional) Remove "common words" (the, a, an, of, et cetera).
3. Replace spaces and underscores with dashes.
4. (Optional) Convert to lowercase.
As far as I know, StackOverflow's sluggifier does #1, #3, and #4, but not #2.
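The steps above can be sketched in C# roughly as follows. This is only an illustration, not Stack Overflow's actual code: the `Slugifier` class, the `Slugify` method, and the tiny common-word list are made-up names, "alphanumerics" is assumed to mean ASCII letters and digits, and collapsing runs of separators into a single dash is an extra assumption on top of step 3.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Slugifier
{
    // Hypothetical common-word list for the optional second step.
    private static readonly HashSet<string> CommonWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "a", "an", "of" };

    public static string Slugify(string title, bool removeCommonWords = false)
    {
        // Step 1: strip everything except whitespace, dashes, underscores, and ASCII alphanumerics.
        string slug = Regex.Replace(title, @"[^A-Za-z0-9\s_-]", "");

        // Step 2 (optional): drop common words.
        if (removeCommonWords)
        {
            IEnumerable<string> words = Regex.Split(slug, @"[\s_-]+")
                .Where(w => w.Length > 0 && !CommonWords.Contains(w));
            slug = string.Join(" ", words);
        }

        // Step 3: replace whitespace and underscores with dashes
        // (assumption: runs collapse to one dash, and dashes at the ends are trimmed).
        slug = Regex.Replace(slug, @"[\s_-]+", "-").Trim('-');

        // Step 4 (optional in general, always applied in this sketch): lowercase.
        return slug.ToLowerInvariant();
    }
}
```

For example, `Slugifier.Slugify("Is the Pope Catholic?", removeCommonWords: true)` should give `is-pope-catholic`.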
I would be doing:
string url = title;
url = Regex.Replace(url, @"^\W+|\W+$", "");
url = Regex.Replace(url, @"['""]", "");
url = Regex.Replace(url, @"_", "-");
url = Regex.Replace(url, @"\W+", "-");
Basically, this:
- strips non-word characters from the beginning and end of the title;
- removes single and double quotes (mainly to get rid of apostrophes in the middle of words);
- replaces underscores with hyphens (underscores are technically a word character along with digits and letters); and
- replaces all groups of non-word characters with a single hyphen.
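To make the effect of each step concrete, here is the same sequence wrapped in a method, with the intermediate values for one sample title traced in comments. The `TitleSlug` class and `ToSlug` method are hypothetical names used only for this sketch, and the quote pattern is written as a character class so that it matches either kind of quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleSlug
{
    public static string ToSlug(string title)
    {
        // Sample input (with two spaces at each end):   "What's_new" in C#?
        string url = title;

        // Strip non-word characters from both ends.
        url = Regex.Replace(url, @"^\W+|\W+$", "");   // What's_new" in C

        // Remove single and double quotes.
        url = Regex.Replace(url, @"['""]", "");       // Whats_new in C

        // Underscores count as word characters, so handle them explicitly.
        url = Regex.Replace(url, @"_", "-");          // Whats-new in C

        // Collapse every remaining run of non-word characters into one hyphen.
        url = Regex.Replace(url, @"\W+", "-");        // Whats-new-in-C

        return url;
    }
}
```

Note that in .NET `\w` is Unicode-aware by default, so accented letters survive these steps; whether that is desirable depends on how the URLs are meant to look.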
Rather than looking for things to replace, replace everything that is not allowed: the list of unreserved URL characters (letters, digits, "-", ".", "_", and "~") is so short that it makes for a nice, clear regex.
return Regex.Replace(value, @"[^A-Za-z0-9_\.~]+", "-");
(Note that I didn't include the dash in the list of allowed chars; that's so it gets gobbled up by the "1 or more" operator [+], so that multiple dashes (in the original, generated, or a combination) are collapsed, as per Dominic Rodger's excellent point.)
You may also want to remove common words ("the", "an", "a", etc.), although doing so can slightly change the meaning of a sentence. You will probably want to remove any trailing dashes and periods as well.
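A rough sketch of those two clean-up steps around the same regex might look like this. The `SlugCleanup` class, the `ToSlug` method, and the three-word list are assumptions made for illustration; a real list would be longer and probably language-specific.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SlugCleanup
{
    // Hypothetical common-word list; extend as needed.
    private static readonly HashSet<string> CommonWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "an", "a" };

    public static string ToSlug(string value)
    {
        // Drop common words while the title is still split on spaces.
        IEnumerable<string> kept = value
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !CommonWords.Contains(word));
        string slug = string.Join(" ", kept);

        // Runs of anything outside the allowed set become a single dash.
        slug = Regex.Replace(slug, @"[^A-Za-z0-9_\.~]+", "-");

        // Remove trailing dashes and periods.
        return slug.TrimEnd('-', '.');
    }
}
```

With this sketch, `SlugCleanup.ToSlug("Is the Pope Catholic?")` should return `Is-Pope-Catholic`: the word "the" is dropped, and the dash produced by the question mark is trimmed from the end.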
I also strongly recommend you do what SO and others do: include a unique identifier in addition to the title, and use only that unique ID when processing the URL. That way http://example.com/articles/1234567/is-the-pop-catholic (note the missing 'e') and http://example.com/articles/1234567/is-the-pope-catholic resolve to the same resource.
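One way to get that behaviour is to parse only the numeric ID out of the path and ignore whatever slug follows it. The sketch below assumes the `/articles/{id}/{slug}` layout from the example URLs; the `ArticleUrl` class and `TryGetArticleId` method are hypothetical names, and how the ID is then looked up is left out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArticleUrl
{
    // /articles/{id} followed by an optional slug segment; the slug is never captured.
    private static readonly Regex Pattern =
        new Regex(@"^/articles/([0-9]+)(?:/[^/]*)?/?$");

    public static bool TryGetArticleId(string path, out long id)
    {
        id = 0;
        Match match = Pattern.Match(path);
        // Only the ID identifies the article; the slug is there for humans and search engines.
        return match.Success && long.TryParse(match.Groups[1].Value, out id);
    }
}
```

Both `/articles/1234567/is-the-pop-catholic` and `/articles/1234567/is-the-pope-catholic` yield the ID 1234567 here, so a misspelled or outdated slug still reaches the right article.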