Extract domain name from URL in Python
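Simple solution via string splitting (no regex needed)
def domain_name(url):
    # Drop a leading "www." and the scheme, then keep the first dot-separated label.
    # Note: other subdomains (e.g. "forums.example.com") are not stripped.
    return url.split("www.")[-1].split("//")[-1].split(".")[0]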
You can use urlparse from urllib.parse (https://docs.python.org/3/library/urllib.parse.html) to parse the URL and read its netloc attribute.
From the netloc you can then extract the domain name by splitting on ".".
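A minimal sketch of the urlparse approach, for illustration only. The helper name domain_from_url is hypothetical, and the rule "take the second-to-last label" is an assumption that breaks on multi-part suffixes such as "co.uk". It reads hostname rather than netloc because netloc can still carry a port or credentials.

```python
from urllib.parse import urlparse


def domain_from_url(url):
    """Hypothetical helper: naive domain extraction from the parsed host."""
    parsed = urlparse(url)
    # parsed.netloc may include a port or credentials ("user@host:8080");
    # parsed.hostname is the same host, lowercased, without them.
    host = parsed.hostname or ""
    labels = host.split(".")
    # Assumption: the registered domain is the second-to-last label,
    # which is wrong for multi-part suffixes such as "co.uk".
    return labels[-2] if len(labels) >= 2 else host


print(urlparse("http://forums.news.cnn.com/").netloc)  # forums.news.cnn.com
print(domain_from_url("http://forums.news.cnn.com/"))  # cnn
```

Note that urlparse only fills in netloc when the URL has a scheme (or starts with "//"), so a bare "cnn.com" yields an empty host here.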
Use tldextract, a third-party package that is more reliable than splitting the netloc from urlparse by hand. tldextract accurately separates the gTLD or ccTLD (generic or country code top-level domain) from the registered domain and subdomains of a URL.
>>> import tldextract
>>> ext = tldextract.extract('http://forums.news.cnn.com/')
>>> ext
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
>>> ext.domain
'cnn'