How to get the base domain name from a URL using PHP?
this answer assumes that top-level domains and second-level domains may be 2 characters long, but a registered name under them must be at least 3 characters long.
EDIT: because of pjv's comment, I learned that Australian domain names are an exception, because they allow 5 TLDs as SLDs (com, net, org, asn, id). example: somedomain.com.au. I'm guessing com.au is a nationally controlled domain name that is "shared". so, technically, "com.au" would still be the "base domain", but that's not useful.
EDIT: there are 47,952 possible three-letter domain names (pattern: [a-zA-Z0-9][a-zA-Z0-9-][a-zA-Z0-9], or 36 * 37 * 36, since names are case-insensitive). combined with just 8 of the most common TLDs (com, org, etc) that gives 383,616 possibilities, without even adding in the entire scope of TLDs. 1-letter and 2-letter domain names still exist, but are not valid going forward.
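The arithmetic above can be checked with a few lines of PHP. This is only a sketch of the counting argument; the variable names are illustrative and not part of the function below.

```php
<?php
// First and last character: a-z or 0-9 (domain names are case-insensitive).
$edgeChars = 26 + 10;
// The middle character may also be a hyphen.
$middleChars = $edgeChars + 1;

$threeLetterNames = $edgeChars * $middleChars * $edgeChars;
$withEightTlds    = $threeLetterNames * 8;

echo $threeLetterNames, "\n"; // 47952
echo $withEightTlds, "\n";    // 383616
```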
in google.com -- "google" is a subdomain of "com"
in google.co.uk -- "google" is a subdomain of "co", which in turn is a subdomain of "uk". here "co" acts as a second-level domain, even though "co" is also a valid top-level domain on its own
in www.google.com -- "www" is a subdomain of "google" which is a subdomain of "com"
"co.uk" is NOT a valid host because there is no valid domain name
going with that assumption this function will return the proper "basedomain" in almost all cases, without requiring a "url map".
if you happen to be one of the rare cases, perhaps you can modify this to fulfill particular needs...
EDIT: you must pass the domain string as a URL with its protocol (http://, ftp://, etc) or parse_url() will not consider it a valid URL (unless you want to modify the code to behave differently)
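function basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
    // reset() takes its argument by reference, so store the slice in a variable first
    $sld = array_slice( $parts, -2, 1 );
    // a 2-character second-to-last label (e.g. "co" in co.uk) means the base domain has 3 parts
    $slice = ( strlen( reset( $sld ) ) == 2 ) && ( count( $parts ) > 2 ) ? 3 : 2;
    return implode( '.', array_slice( $parts, ( 0 - $slice ), $slice ) );
}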
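Because parse_url() finds no host when the protocol is missing, one possible modification is to prepend a placeholder scheme when none is present. The sketch below is an assumption about how that could look, not part of the original function: the name basedomain_any() is hypothetical, and the slicing rule is the same two-character heuristic.

```php
<?php
// Hypothetical variant: accepts "http://www.example.com/path" as well as
// a bare host such as "www.example.com".
function basedomain_any( $str = '' )
{
    // Prepend a placeholder scheme if the string does not start with one.
    if ( ! preg_match( '#^[a-z][a-z0-9+.-]*://#i', $str ) ) {
        $str = 'http://' . $str;
    }
    $host = parse_url( $str, PHP_URL_HOST );
    if ( empty( $host ) ) return null;

    $parts = explode( '.', $host );
    $count = count( $parts );
    // Same heuristic: a 2-character second-to-last label (as in "co.uk")
    // means the base domain spans three labels instead of two.
    $slice = ( $count > 2 && strlen( $parts[$count - 2] ) == 2 ) ? 3 : 2;
    return implode( '.', array_slice( $parts, -$slice ) );
}

echo basedomain_any( 'www.google.co.uk' ), "\n";        // google.co.uk
echo basedomain_any( 'http://www.google.com/q' ), "\n"; // google.com
```

Like the heuristic it copies, this variant treats any 2-character second-to-last label as part of the suffix, so a host such as www.hp.com comes back unchanged rather than as hp.com.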
if you need to be accurate, use fopen or curl to open this URL:
http://data.iana.org/TLD/tlds-alpha-by-domain.txt
then read the lines into an array and use that to compare the domain parts
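A minimal sketch of that idea follows. The function names parse_tld_list() and has_known_tld() are hypothetical, and the inline sample stands in for the downloaded file so the example runs offline. The IANA file lists only top-level domains (one upper-case entry per line after a "#" comment header), so this can confirm the last label of a host but cannot tell that something like co.uk is a shared suffix.

```php
<?php
// Parse the text of tlds-alpha-by-domain.txt into a lookup table.
function parse_tld_list( $text )
{
    $tlds = array();
    foreach ( preg_split( '/\R/', $text ) as $line ) {
        $line = strtolower( trim( $line ) );
        // Skip blank lines and the "#" comment header.
        if ( $line === '' || $line[0] === '#' ) continue;
        $tlds[$line] = true;
    }
    return $tlds;
}

// True if the last label of $host is in the parsed list.
function has_known_tld( $host, array $tlds )
{
    $parts = explode( '.', strtolower( rtrim( $host, '.' ) ) );
    return isset( $tlds[ end( $parts ) ] );
}

// In real use the text would come from the URL above, for example via
// file_get_contents() (needs allow_url_fopen) or curl.
// Sample data only, not the real list:
$text = "# Version 0, sample data only\nCOM\nORG\nUK\nAU\n";
$tlds = parse_tld_list( $text );

var_dump( has_known_tld( 'www.google.co.uk', $tlds ) ); // bool(true)
var_dump( has_known_tld( 'example.invalid', $tlds ) );  // bool(false)
```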
EDIT: to allow for Australian domains:
function au_basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
    // reset() takes its argument by reference, so store the slice in a variable first
    $sld = array_slice( $parts, -2, 1 );
    $slice = ( strlen( reset( $sld ) ) == 2 ) && ( count( $parts ) > 2 ) ? 3 : 2;
    // Australian SLDs (com.au, net.au, ...) are longer than 2 characters, so force 3 parts
    if ( preg_match( '/\.(com|net|asn|org|id)\.au$/i', $url['host'] ) ) $slice = 3;
    return implode( '.', array_slice( $parts, ( 0 - $slice ), $slice ) );
}
IMPORTANT ADDITIONAL NOTES: I don't use this function to validate domains. It is generic code that I only use to extract the base domain of the server it is running on from the global $_SERVER['SERVER_NAME'], for use within various internal scripts. Since I have only ever worked on sites within the US, I had never encountered the Australian variants that pjv asked about. The function is handy for internal use, but it is a long way from a complete domain validation process. If you are trying to use it that way, I recommend against it, because there are too many ways for it to match invalid domains.
You could do this:
$urlData = parse_url($url);
$host = $urlData['host'];
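If the URL may be malformed or missing its scheme, the 'host' key will not be set. A small sketch of a more defensive version, using parse_url()'s optional component argument; the wrapper name url_host() is hypothetical:

```php
<?php
// Returns the host part of $url, or null when there is none
// (for example when the scheme is missing or the URL is malformed).
function url_host( $url )
{
    $host = parse_url( $url, PHP_URL_HOST );
    return is_string( $host ) ? $host : null;
}

var_dump( url_host( 'http://www.google.co.uk/search?q=php' ) ); // string(16) "www.google.co.uk"
var_dump( url_host( 'www.google.co.uk' ) );                     // NULL
```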
** Update **
The best way I can think of is to have a mapping of all the TLDs that you want to handle, since certain TLDs can be tricky (co.uk).
// you can add more to it if you want
$urlMap = array('com', 'co.uk');

$host = "";
$url = "http://www.google.co.uk";

$urlData = parse_url($url);
$hostData = explode('.', $urlData['host']);
$hostData = array_reverse($hostData);

if(array_search($hostData[1] . '.' . $hostData[0], $urlMap) !== FALSE) {
    $host = $hostData[2] . '.' . $hostData[1] . '.' . $hostData[0];
} elseif(array_search($hostData[0], $urlMap) !== FALSE) {
    $host = $hostData[1] . '.' . $hostData[0];
}

echo $host;
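The same mapping idea can be wrapped in a function that tries the longest suffix first, so it handles hosts with any number of labels. This is a sketch under assumptions: basedomain_by_map() is a hypothetical name, and the suffix list is whatever you choose to supply.

```php
<?php
// Hypothetical helper: returns the base domain of $url using a
// caller-supplied list of suffixes, or null if no suffix matches.
function basedomain_by_map( $url, array $suffixes )
{
    $host = parse_url( $url, PHP_URL_HOST );
    if ( ! is_string( $host ) ) return null;

    $labels = explode( '.', strtolower( $host ) );
    $count  = count( $labels );

    // Candidates run from longest to shortest, so "co.uk" is tried before "uk".
    for ( $i = 1; $i < $count; $i++ ) {
        $suffix = implode( '.', array_slice( $labels, $i ) );
        if ( in_array( $suffix, $suffixes, true ) ) {
            // Base domain = the label just before the suffix, plus the suffix.
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return null;
}

$map = array( 'com', 'co.uk', 'com.au' );
echo basedomain_by_map( 'http://www.google.co.uk', $map ), "\n";        // google.co.uk
echo basedomain_by_map( 'http://a.b.somedomain.com.au/x', $map ), "\n"; // somedomain.com.au
echo basedomain_by_map( 'http://google.com', $map ), "\n";              // google.com
```

Returning null for an unmatched host, instead of an empty string, makes it easier to tell "no known suffix" apart from a real result.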
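Try using parse_url(): http://php.net/manual/en/function.parse-url.php. Something like this should work for hosts with a single-label TLD such as .com (for www.google.co.uk it returns co.uk, because it always keeps only the last two labels):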
$urlParts = parse_url($yourUrl);
$hostParts = explode('.', $urlParts['host']);
$hostParts = array_reverse($hostParts);
$host = $hostParts[1] . '.' . $hostParts[0];