Regex to match all words except a given list

Based on Tomalak's answer:

(?<!and|or|not)\b(?!and|or|not)

This regex has two problems:

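  1. Many regex engines only support fixed-length look-behind in (?<! ), and the alternatives here differ in length (.NET is an exception and accepts variable-length look-behind)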

  2. The previous regex only looked at the ending/beginning of the surrounding words, not the whole word.

(?<!\band)(?<!\bor)(?<!\bnot)\b(?!(?:and|or|not)\b)

This regex fixes both of the above problems: first by splitting the look-behind into three separate fixed-length ones, and second by adding word boundaries (\b) inside the look-arounds.
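To make the effect of the split look-behinds concrete, here is a minimal C# sketch written as top-level statements (C# 9 or later). The helper name `QuoteWords`, the sample sentence, and the use of `RegexOptions.IgnoreCase` are assumptions added for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: inserts a single quote at every word boundary that the
// pattern accepts, which ends up wrapping each non-excluded word in quotes.
string QuoteWords(string text)
{
    const string pattern =
        @"(?<!\band)(?<!\bor)(?<!\bnot)\b(?!(?:and|or|not)\b)";
    // Assumption: case-insensitive matching, so "AND" is excluded like "and".
    return Regex.Replace(text, pattern, "'", RegexOptions.IgnoreCase);
}

// "sand", "android", and "nothing" merely contain the excluded words,
// so they are still quoted; the standalone "and", "or", "not" are not.
Console.WriteLine(QuoteWords("sand and android or not nothing"));
// prints: 'sand' and 'android' or not 'nothing'
```

The \b inside each look-around is what keeps "sand" and "android" from being treated like "and".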


John,

The regex in your question is almost correct. The only problem is that you put the lookahead at the end of the regex instead of at the start. Also, you need to add word boundaries to force the regex to match whole words. Otherwise, it will match "nd" in "and", "r" in "or", etc, because "nd" and "r" are not in your negative lookahead.

(?i)\b(?!(?:and|not|or)\b)([a-z0-9]+)\b
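To see why a word boundary inside the lookahead matters, here is a minimal C# sketch (top-level statements, C# 9 or later) that compares a lookahead with and without a trailing \b. The helper `MatchWords`, both pattern variables, and the sample sentence are assumptions made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every word the pattern matches, in order.
string[] MatchWords(string pattern, string text) =>
    Regex.Matches(text, pattern)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string sample = "Foo AND bar or android Not nothing";

// Lookahead closed with \b: only the whole words and/not/or are rejected.
string wholeWord = @"(?i)\b(?!(?:and|not|or)\b)[a-z0-9]+\b";
// Lookahead without \b: any word that merely starts with and/not/or is rejected too.
string prefixOnly = @"(?i)\b(?!and|not|or)[a-z0-9]+\b";

Console.WriteLine(string.Join(", ", MatchWords(wholeWord, sample)));
// prints: Foo, bar, android, nothing
Console.WriteLine(string.Join(", ", MatchWords(prefixOnly, sample)));
// prints: Foo, bar
```

If words such as "android" or "nothing" should survive, the boundary has to sit inside the negative lookahead, not only at the end of the pattern.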


This is a little dirty, but it works:

(?<!\b(?:and| or|not))\b(?!(?:and|or|not)\b)

In plain English, this matches any word boundary that is neither preceded nor followed by "and", "or", or "not". It considers whole words only: for example, the position after the word "sand" is not rejected merely because it is preceded by "and".

The space in front of the "or" in the zero-width look-behind assertion is necessary to make it a fixed-length look-behind. See whether that already solves your problem.

EDIT: Applied to the string "except the words AND, OR and NOT." as a global replace with single quotes, this returns:

'except' 'the' 'words' AND, OR and NOT.
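The global replace described above can be sketched in C# as follows (top-level statements, C# 9 or later). The helper `Quote`, the names `padded` and `unpadded`, and the `RegexOptions.IgnoreCase` flag are assumptions for illustration; the flag is assumed because the output above leaves the uppercase words untouched. The `unpadded` variant is not from the answer and relies on .NET accepting variable-length look-behind.

```csharp
using System;
using System.Text.RegularExpressions;

// Look-behind padded with a space so all three alternatives are 3 characters.
const string padded = @"(?<!\b(?:and| or|not))\b(?!(?:and|or|not)\b)";
// Assumption: unpadded variant, relying on .NET's variable-length look-behind.
const string unpadded = @"(?<!\b(?:and|or|not))\b(?!(?:and|or|not)\b)";

// Hypothetical helper: global replace of every matching boundary with a quote.
string Quote(string pattern, string text) =>
    Regex.Replace(text, pattern, "'", RegexOptions.IgnoreCase);

Console.WriteLine(Quote(padded, "foo and bar or sand"));
// prints: 'foo' and 'bar' or 'sand'
Console.WriteLine(Quote(unpadded, "except the words AND, OR and NOT."));
// prints: 'except' 'the' 'words' AND, OR and NOT.
```

One caveat with the padded form: the \b in front of the space needs a word character directly before it, so an "or" that follows punctuation (as in ", OR") is not covered by the look-behind, and the boundary after it is still matched. The unpadded form does not have that limitation in .NET.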

Call me crazy, but I'm not a fan of fighting regex; I limit my patterns to simple things I can understand, and often cheat for the rest - for example via a MatchEvaluator:

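    // Requires: using System; using System.Text.RegularExpressions;
    string[] whitelist = new string[] { "and", "not", "or" };
    string input = "foo and bar or blop";
    // Visit every lowercase word; quote it unless it is in the whitelist.
    string result = Regex.Replace(input, @"([a-z0-9]+)",
        delegate(Match match) {
            string word = match.Groups[1].Value;
            return Array.IndexOf(whitelist, word) >= 0
                ? word : ("\"" + word + "\"");
        });
    // result is: "foo" and "bar" or "blop"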


Tags: C#, .NET, Regex