Removing HTTP and WWW from a URL in Python
A more elegant solution would be using urlparse:
from urllib.parse import urlparse
def get_hostname(url, uri_type='both'):
    """Get the host name from the url"""
    parsed_uri = urlparse(url)
    if uri_type == 'both':
        return '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    elif uri_type == 'netloc_only':
        return '{uri.netloc}'.format(uri=parsed_uri)
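The first option (uri_type='both') includes the scheme, https or http depending on the link. The second option (uri_type='netloc_only') returns only the netloc, which is the host part you were looking for. Note that the netloc still contains a leading www. if the URL has one (for example www.google.com), so that prefix has to be stripped separately.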
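To go one step further and drop the www. prefix as well, the parsed netloc can be trimmed before it is joined with the path. The following is only a minimal sketch: the function name strip_scheme_and_www is hypothetical, and it assumes the input URL carries a scheme (without one, urlparse puts the whole string into path and nothing is stripped).

```python
from urllib.parse import urlparse


def strip_scheme_and_www(url):
    """Return host + path with the scheme and a leading 'www.' removed.

    Hypothetical helper; assumes the URL includes a scheme such as http://.
    """
    parsed = urlparse(url)
    host = parsed.netloc
    # Drop only a leading "www." so hosts like "awww.example.com" are untouched.
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parsed.path


print(strip_scheme_and_www("http://www.google.com/images"))  # google.com/images
```

Query strings and fragments are dropped in this sketch; append parsed.query or parsed.fragment if you need them.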
You can use the string method replace:
url = 'http://www.google.com/images'
url = url.replace("http://www.","")
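The replace call above only matches the exact text http://www., so https links or links without www. pass through unchanged. As a rough sketch of a more tolerant variant using only string methods, the hypothetical helper below strips each known prefix in turn; the prefix list and its order are assumptions chosen for illustration.

```python
def strip_prefixes(url, prefixes=("https://", "http://", "www.")):
    """Remove each prefix in order if the string currently starts with it.

    Hypothetical helper; the default prefixes are an assumption.
    """
    for prefix in prefixes:
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url


print(strip_prefixes("http://www.google.com/images"))  # google.com/images
print(strip_prefixes("https://google.com"))            # google.com
```

Unlike replace, this touches only the start of the string, so an http:// that appears later in the URL is left alone.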
Alternatively, you can use regular expressions:
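import re
# Keep the compiled pattern and the resulting URL in separate names.
pattern = re.compile(r"https?://(www\.)?")
url = pattern.sub('', 'http://www.google.com/images').strip().strip('/')
# url is now 'google.com/images'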
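The unanchored pattern above removes every occurrence of http:// or https:// in the string, including one embedded in a query parameter. A possible refinement, offered as a sketch rather than the answer's own method, anchors the match to the start and ignores case. The names PREFIX_RE and strip_url_prefix are hypothetical.

```python
import re

# Anchored with ^ so only a prefix at the very start of the string is removed;
# IGNORECASE also accepts forms such as "HTTP://WWW.".
PREFIX_RE = re.compile(r"^https?://(?:www\.)?", re.IGNORECASE)


def strip_url_prefix(url):
    """Strip the scheme, an optional leading 'www.', and trailing slashes."""
    return PREFIX_RE.sub("", url.strip()).rstrip("/")


print(strip_url_prefix("http://www.google.com/images/"))     # google.com/images
print(strip_url_prefix("http://a.com/?next=http://b.com"))   # a.com/?next=http://b.com
```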