Verifying a Googlebot
function detectSearchBot($ip, $agent, &$hostname)
{
$hostname = $ip;
// check HTTP_USER_AGENT what not to touch gethostbyaddr in vain
if (preg_match('/(?:google|yandex)bot/iu', $agent)) {
// success - return host, fail - return ip or false
$hostname = gethostbyaddr($ip);
// https://support.google.com/webmasters/answer/80553
if ($hostname !== false && $hostname != $ip) {
// detect google and yandex search bots
if (preg_match('/\.((?:google(?:bot)?|yandex)\.(?:com|ru))$/iu', $hostname)) {
// success - return ip, fail - return hostname
$ip = gethostbyname($hostname);
if ($ip != $hostname) {
return true;
}
}
}
}
return false;
}
In my project, I use this function to identify Google and Yandex search bots.
The result of the detectSearchBot function is caching.
The algorithm is based on Google’s recommendation - https://support.google.com/webmasters/answer/80553
How to verify Googlebot.
In addition to Cristian's answer:
function is_valid_google_ip($ip) {
$hostname = gethostbyaddr($ip); //"crawl-66-249-66-1.googlebot.com"
return preg_match('/\.googlebot|google\.com$/i', $hostname);
}
function is_valid_google_request($ip=null,$agent=null){
if(is_null($ip)){
$ip=$_SERVER['REMOTE_ADDR'];
}
if(is_null($agent)){
$agent=$_SERVER['HTTP_USER_AGENT'];
}
$is_valid_request=false;
if (strpos($agent, 'Google')!==false && is_valid_google_ip($ip)){
$is_valid_request=true;
}
return $is_valid_request;
}
Note
Sometimes when using $_SERVER['HTTP_X_FORWARDED_FOR']
OR $_SERVER['REMOTE_ADDR']
more than 1 IP address is returned, for example '155.240.132.261, 196.250.25.120'. When this string is passed as an argument for gethostbyaddr()
PHP gives the following error:
Warning: Address is not a valid IPv4 or IPv6 address in...
To work around this I use the following code to extract the first IP address from the string and discard the rest. (If you wish to use the other IPs they will be in the other elements of the $ips array).
if (strstr($remoteIP, ', ')) {
$ips = explode(', ', $remoteIP);
$remoteIP = $ips[0];
}
https://www.php.net/manual/fr/function.gethostbyaddr.php