Filter Comment Spam? PHP
When writing your own method, you'll have to employ a combination of heuristics.
For example, it's very common for spam comments to have 2 or more URL links.
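As a rough sketch of that heuristic, the link count can be checked on its own. The names `countLinks` and `hasManyLinks` are hypothetical, and the default threshold of 2 is only the rule of thumb mentioned above, not a tested value:

```php
<?php
// Hypothetical helper: counts occurrences of "http://" or "https://" in a comment.
function countLinks(string $text): int
{
    // preg_match_all returns the number of matches, or false on error.
    $n = preg_match_all('~https?://~i', $text);
    return $n === false ? 0 : $n;
}

// Hypothetical helper: true when the comment has at least $threshold links.
function hasManyLinks(string $text, int $threshold = 2): bool
{
    return countLinks($text) >= $threshold;
}
```

This only looks for explicit `http://` and `https://` prefixes, so bare domains such as `example.com` are not counted.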
I'd begin with a filter like the following, which loops through a dictionary of trigger words and uses their counts to estimate the probability of spam:
    function spamProbability($text) {
        $probability = 0;
        $text = strtolower($text); // lowercase so matching is case-insensitive
        $myDict = array("http", "penis", "pills", "sale", "cheapest");
        foreach ($myDict as $word) {
            // count every occurrence of the trigger word, including inside longer words
            $count = substr_count($text, $word);
            $probability += 0.2 * $count; // the total can exceed 1
        }
        return $probability;
    }
Note that this method will produce many false positives, depending on your word set. You could have your site flag comments with a probability between 0.3 and 0.6 for moderation (they still go live immediately), send those between 0.6 and 0.9 to a moderation queue (where they don't appear until approved), and simply reject anything at 0.9 or above.
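A minimal sketch of that routing might look like this. The function name `moderationAction` and the action labels are hypothetical. It also assumes the bands are half-open (a score of exactly 0.6 goes to the queue) and that rejection starts at 0.9, so no score falls into a gap between bands:

```php
<?php
// Hypothetical helper: maps a spam score to a moderation action.
// Bands (assumed): [0, 0.3) publish, [0.3, 0.6) flag, [0.6, 0.9) queue, 0.9 and up reject.
function moderationAction(float $score): string
{
    if ($score >= 0.9) {
        return 'reject';  // never shown
    }
    if ($score >= 0.6) {
        return 'queue';   // hidden until a moderator approves it
    }
    if ($score >= 0.3) {
        return 'flag';    // goes live, but is marked for review
    }
    return 'publish';     // goes live with no review
}
```

You would call it as `moderationAction(spamProbability($comment))` and branch on the returned string.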
Obviously you'll have to tweak all of these thresholds, but this should start you off with a pretty basic system. You can add several other qualifiers that raise or lower the probability of spam, such as checking the ratio of bad words to total words, changing the weights of individual words, etc.
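Here is one way those two refinements could be sketched: per-word weights plus a ratio of trigger hits to total words. The function name `weightedSpamScore`, the returned array keys, and the weights in the usage note are all made up for illustration and would need tuning against real comments:

```php
<?php
// Hypothetical helper: returns a weighted score and the ratio of trigger hits to words.
// $weights maps a lowercase trigger word to its weight, e.g. array('http' => 0.3).
function weightedSpamScore(string $text, array $weights): array
{
    $text = strtolower($text);
    // Rough word count: split on anything that is not a lowercase letter or digit.
    $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $total = count($words);
    if ($total === 0) {
        return array('weighted' => 0.0, 'ratio' => 0.0);
    }
    $weighted = 0.0;
    $hits = 0;
    foreach ($weights as $word => $weight) {
        // Substring match, so "sale" also matches inside "wholesale".
        $count = substr_count($text, (string) $word);
        $weighted += $weight * $count;
        $hits += $count;
    }
    return array('weighted' => $weighted, 'ratio' => $hits / $total);
}
```

For example, `weightedSpamScore($comment, array('http' => 0.3, 'pills' => 0.2))` lets a link count for more than a single trigger word, and the `ratio` value helps separate a long comment that mentions one trigger word from a short one made up almost entirely of them.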
I'm surprised no one mentioned Akismet. I've never had a message misclassified (be it spam or legit). My WordPress install came with it. All I had to do was hit enable.