How do you implement a good profanity filter?

I know this question is fairly old, but it comes up often...

There is both a reason and a distinct need for profanity filters (see Wikipedia entry here), but they often fall short of being 100% accurate for two distinct reasons: context and accuracy.

It depends (wholly) on what you're trying to achieve. At its most basic, you're probably trying to cover the "seven dirty words" and then some. Some businesses only need to filter the basics (common swear words, URLs or even personal information), but others need to prevent illicit account naming (Xbox Live is an example) or far more.

User-generated content doesn't just contain potential swear words; it can also contain offensive references to:

Sexual acts
Sexual orientation
Religion
Ethnicity
Etc...

And potentially, in multiple languages. Shutterstock has developed basic dirty-words lists in 10 languages to date, but it's still basic and very much oriented towards their 'tagging' needs. There are a number of other lists available on the web.

I agree with the accepted answer that it's not an exact science: language is continually evolving, which makes this an ongoing challenge, but a 90% catch rate is better than 0%. It depends purely on your goals: what you're trying to achieve, the level of support you have and how important it is to remove profanities of different types.

In building a filter, you need to consider the following elements and how they relate to your project:

Words/phrases
Acronyms (FOAD/LMFAO etc)
False positives (words, places and names like 'mishit', 'scunthorpe' and 'titsworth')
URLs (porn sites are an obvious target)
Personal information (email, address, phone etc - if applicable)
Language choice (usually English by default)
Moderation (how, if at all, you can interact with user generated content and what you can do with it)

You can easily build a profanity filter that captures 90%+ of profanities, but you'll never hit 100%. It's just not possible. The closer you want to get to 100%, the harder it becomes... Having built a complex profanity engine in the past that dealt with more than 500K realtime messages per day, I'd offer the following advice:

A basic filter would involve:

Building a list of applicable profanities
Developing a method of dealing with derivations of profanities
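
As a rough sketch of those two steps (not a definitive implementation), the following builds one pattern from a word list and tolerates a few common suffixes. The word list, the suffix rule and the function names are all made up for illustration; the placeholder words are the mild ones used later in this thread.

```php
<?php
// Hypothetical word list; a real filter would load this from a file or database.
$bannedWords = ['boogers', 'snot', 'poop', 'shucks', 'argh'];

// Assumed derivation rule: allow one common English suffix after each word.
$suffixes = '(?:s|es|ed|ing)?';

/**
 * Build a single case-insensitive pattern that matches each banned word as a
 * whole word, optionally followed by one of the suffixes above.
 */
function buildBasicPattern(array $words, string $suffixes): string
{
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/'); // escape regex metacharacters
    }, $words);

    return '/\b(?:' . implode('|', $escaped) . ')' . $suffixes . '\b/i';
}

function containsProfanity(string $text, string $pattern): bool
{
    return preg_match($pattern, $text) === 1;
}

$pattern = buildBasicPattern($bannedWords, $suffixes);

var_dump(containsProfanity('Oh, shucks!', $pattern));      // bool(true)
var_dump(containsProfanity('He pooped again', $pattern));  // bool(true)
var_dump(containsProfanity('A polite sentence', $pattern)); // bool(false)
```

A fixed suffix list only covers the most regular derivations; anything more creative is where the moderate and complex techniques come in.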

A moderately complex filter would involve (in addition to a basic filter):

Using complex pattern matching to deal with extended derivations (using advanced regex)
Dealing with Leetspeak (l33t)
Dealing with false positives
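
One possible way to combine leetspeak handling with a false-positive list is sketched below. The substitution table is an assumption and far from complete, and the function names and lists are hypothetical. It matches fragments inside words (to catch compounds) and relies on an allow list to let known-good words through.

```php
<?php
// Hypothetical lists for illustration; "ass" and "mpassell" come from the
// false-positive example discussed elsewhere in this thread.
$bannedFragments = ['ass', 'poop'];
$allowedWords = ['mpassell', 'class', 'passage'];

/** Map common leetspeak characters back to letters (assumed, incomplete table). */
function normalizeLeet(string $token): string
{
    return str_replace(
        ['0', '1', '3', '4', '5', '7', '@', '$'],
        ['o', 'i', 'e', 'a', 's', 't', 'a', 's'],
        strtolower($token)
    );
}

function isFlagged(string $text, array $banned, array $allowed): bool
{
    // Split on whitespace only, so symbols such as "@" and "$" stay inside tokens.
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        $word = trim(normalizeLeet($token), ',.!?;:');

        if (in_array($word, $allowed, true)) {
            continue; // known false positive, let it through
        }
        foreach ($banned as $fragment) {
            if (strpos($word, $fragment) !== false) {
                return true;
            }
        }
    }
    return false;
}

var_dump(isFlagged('what a dumb4ss', $bannedFragments, $allowedWords));            // bool(true)
var_dump(isFlagged('mpassell joined the class.', $bannedFragments, $allowedWords)); // bool(false)
```

The allow list has to be maintained by hand, which is exactly why false positives push a filter from "basic" into "moderately complex".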

A complex filter would involve a number of the following (in addition to a moderate filter):

Whitelists and blacklists
Naive Bayesian inference filtering of phrases/terms
Soundex functions (where a word sounds like another)
Levenshtein distance
Stemming
Human moderators to help guide a filtering engine to learn by example or where matches aren't accurate enough without guidance (a self/continually-improving system)
Perhaps some form of AI engine
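
Two of the items above, Levenshtein distance and Soundex, are available as PHP built-ins (levenshtein() and soundex()), so a fuzzy check can be sketched quite briefly. The word list, the function name and the distance threshold of 1 are assumptions for illustration, not recommended values.

```php
<?php
// Hypothetical word list using the mild placeholder words from this thread.
$bannedWords = ['boogers', 'shucks'];

/**
 * Flag a word if it is within $maxDistance edits of a banned word,
 * or if it has the same Soundex code (i.e. sounds roughly alike).
 */
function isNearMatch(string $word, array $banned, int $maxDistance = 1): bool
{
    $word = strtolower($word);

    foreach ($banned as $bad) {
        if (levenshtein($word, $bad) <= $maxDistance) {
            return true; // small typo or deliberate one-letter swap
        }
        if (soundex($word) === soundex($bad)) {
            return true; // phonetic look-alike
        }
    }
    return false;
}

var_dump(isNearMatch('boogerz', $bannedWords)); // bool(true): one substitution away
var_dump(isNearMatch('hello', $bannedWords));   // bool(false)
```

Phonetic matching is very loose (plenty of innocent words share a Soundex code with a banned one), so a hit from this kind of check is probably better treated as a signal for a human moderator than as grounds for automatic blocking.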

I don't know of any good libraries for this, but whatever you do, make sure that you err in the direction of letting stuff through. I've dealt with systems that wouldn't allow me to use "mpassell" as a username, because it contains "ass" as a substring. That's a great way to alienate users!
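
To make the "mpassell" problem concrete, here is a minimal comparison of a naive substring check against a word-boundary regex. The variable names are just for illustration.

```php
<?php
$username = 'mpassell';

// Naive substring check: flags the username because "ass" appears inside it.
$naiveHit = stripos($username, 'ass') !== false;

// Word-boundary check: only flags "ass" when it stands alone as a word.
$boundaryHit = preg_match('/\bass\b/i', $username) === 1;

var_dump($naiveHit);    // bool(true)
var_dump($boundaryHit); // bool(false)
```

The boundary version errs in the direction of letting stuff through, at the cost of missing compound words.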

Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?

Also, one can't forget The Untold History of Toontown's SpeedChat, where even using a "safe-word whitelist" resulted in a 14-year-old quickly circumventing it with: "I want to stick my long-necked Giraffe up your fluffy white bunny."

Bottom line: Ultimately, for any system that you implement, there is absolutely no substitute for human review (whether peer or otherwise). Feel free to implement a rudimentary tool to get rid of the drive-bys, but for the determined troll, you absolutely must have a non-algorithm-based approach.

A system that removes anonymity and introduces accountability (something that Stack Overflow does well) is helpful also, particularly in order to help combat John Gabriel's G.I.F.T.

You also asked where you can get profanity lists to get you started -- one open-source project to check out is DansGuardian -- check out the source code for their default profanity lists. There is also an additional third-party Phrase List that you can download for the proxy that may be a helpful gleaning point for you.

Edit in response to the question edit: Thanks for the clarification on what you're trying to do. In that case, if you're just trying to do a simple word filter, there are two ways you can do it. One is to create a single long regexp with all of the banned phrases that you want to censor, and merely do a regex find/replace with it. A regex like:

$filterRegex = '/(boogers|snot|poop|shucks|argh)/i';

and run it on your input string using preg_match() to wholesale test for a hit, or preg_replace() to blank them out.
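
Roughly, that looks like the following. The "/" delimiters and the "i" (case-insensitive) flag are additions for this sketch, and the input string and "***" replacement are made up.

```php
<?php
$filterRegex = '/(boogers|snot|poop|shucks|argh)/i';
$input = 'Argh, I stepped in poop!';

// Wholesale test: is there at least one banned word in the input?
$hasHit = preg_match($filterRegex, $input) === 1;

// Blank out every match.
$cleaned = preg_replace($filterRegex, '***', $input);

var_dump($hasHit); // bool(true)
echo $cleaned;     // ***, I stepped in ***!
```

Note that without word boundaries (\b) this pattern also matches inside longer words, so it will over-censor.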

You can also load those functions up with arrays rather than a single long regex, and for long word lists, it may be more manageable. See the preg_replace() documentation for some good examples of how arrays can be used flexibly.
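
As a small sketch of the array form: preg_replace() accepts an array of patterns together with either a single replacement string or a parallel array of replacements. The word list and replacement strings here are invented for illustration.

```php
<?php
$bannedWords = ['boogers', 'snot', 'poop', 'shucks', 'argh'];

// One pattern per word, matched as a whole word, case-insensitively.
$patterns = array_map(function ($word) {
    return '/\b' . preg_quote($word, '/') . '\b/i';
}, $bannedWords);

$input = 'Shucks, more snot.';

// A single replacement string is applied to every pattern...
$masked = preg_replace($patterns, '[censored]', $input);

// ...or a parallel array gives each word its own replacement.
$replacements = ['bogeys', 'mucus', 'droppings', 'oh dear', 'oh no'];
$softened = preg_replace($patterns, $replacements, $input);

echo $masked, "\n";   // [censored], more [censored].
echo $softened, "\n"; // oh dear, more mucus.
```

Keeping the words in a plain array also makes it easy to load them from a file or a database table instead of hard-coding them.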

For additional PHP programming examples, see this page for a somewhat advanced generic class for word filtering that *'s out the center letters from censored words, and this previous Stack Overflow question that also has a PHP example (the main valuable part in there is the SQL-based filtered word approach -- the leet-speak compensator can be dispensed with if you find it unnecessary).
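
If you only want the "*'s out the center letters" behaviour, a minimal version can be sketched with preg_replace_callback(). This is not the linked class, just an illustration of the idea; the function names are hypothetical and it assumes single-byte (ASCII) words.

```php
<?php
/** Keep the first and last letter, replace everything in between with "*". */
function maskCenter(string $word): string
{
    $length = strlen($word); // assumes single-byte characters
    if ($length <= 2) {
        return str_repeat('*', $length);
    }
    return $word[0] . str_repeat('*', $length - 2) . $word[$length - 1];
}

function censor(string $text, array $bannedWords): string
{
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bannedWords);
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    // The callback receives each match and returns its masked form.
    return preg_replace_callback($pattern, function (array $match) {
        return maskCenter($match[0]);
    }, $text);
}

echo censor('Shucks, what a load of poop.', ['shucks', 'poop']);
// S****s, what a load of p**p.
```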

You also added: "Getting the list of words in the first place is the real question." -- in addition to some of the previous DansGuardian links, you may find this handy .zip of 458 words to be helpful.
