How to add rel="nofollow" to links with preg_replace()
Try this one (PHP 5.3+):
- skip selected address
- allow manually set rel parameter
and code:
function nofollow($html, $skip = null) {
return preg_replace_callback(
"#(<a[^>]+?)>#is", function ($mach) use ($skip) {
return (
!($skip && strpos($mach[1], $skip) !== false) &&
strpos($mach[1], 'rel=') === false
) ? $mach[1] . ' rel="nofollow">' : $mach[0];
},
$html
);
}
Examples:
echo nofollow('<a href="link somewhere" rel="something">something</a>');
// will be same because it's already contains rel parameter
echo nofollow('<a href="http://www.cnn.com">something</a>'); // ad
// add rel="nofollow" parameter to anchor
echo nofollow('<a href="http://localhost">something</a>', 'localhost');
// skip this link as internall link
Here is the DOMDocument solution...
$str = '<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com" rel="me">external</a>
<a href="http://google.com">external</a>
<a href="http://example.com" rel="nofollow">external</a>
<a href="http://stackoverflow.com" rel="junk in the rel">external</a>
';
$dom = new DOMDocument();
$dom->preserveWhitespace = FALSE;
$dom->loadHTML($str);
$a = $dom->getElementsByTagName('a');
$host = strtok($_SERVER['HTTP_HOST'], ':');
foreach($a as $anchor) {
$href = $anchor->attributes->getNamedItem('href')->nodeValue;
if (preg_match('/^https?:\/\/' . preg_quote($host, '/') . '/', $href)) {
continue;
}
$noFollowRel = 'nofollow';
$oldRelAtt = $anchor->attributes->getNamedItem('rel');
if ($oldRelAtt == NULL) {
$newRel = $noFollowRel;
} else {
$oldRel = $oldRelAtt->nodeValue;
$oldRel = explode(' ', $oldRel);
if (in_array($noFollowRel, $oldRel)) {
continue;
}
$oldRel[] = $noFollowRel;
$newRel = implode($oldRel, ' ');
}
$newRelAtt = $dom->createAttribute('rel');
$noFollowNode = $dom->createTextNode($newRel);
$newRelAtt->appendChild($noFollowNode);
$anchor->appendChild($newRelAtt);
}
var_dump($dom->saveHTML());
Output
string(509) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator">internal cloaked link</a>
<a href="http://cnn.com" rel="me nofollow">external</a>
<a href="http://google.com" rel="nofollow">external</a>
<a href="http://example.com" rel="nofollow">external</a>
<a href="http://stackoverflow.com" rel="junk in the rel nofollow">external</a>
</body></html>
"
Try to make it more readable first, and only afterwards make your if
rules more complex:
function save_rseo_nofollow($content) {
$content["post_content"] =
preg_replace_callback('~<(a\s[^>]+)>~isU', "cb2", $content["post_content"]);
return $content;
}
function cb2($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/hostgator"; // re-add quirky config here
$blog_url = "http://localhost/";
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
Gives following output:
[post_content] =>
<a href="http://localhost/mytest/">internal</a>
<a href="http://localhost/mytest/go/hostgator" rel=nofollow>internal cloaked link</a>
<a href="http://cnn.com" rel=nofollow>external</a>
The problem in your original code might have been $rseo which wasn't declared anywhere.