How to safely get the file extension from a URL?
This is easiest with requests and mimetypes:
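import mimetypes
import requests
# url is the address whose content type you want to detect
response = requests.get(url)
# Drop parameters such as "; charset=utf-8" before the lookup
content_type = response.headers['content-type'].split(';')[0].strip()
extension = mimetypes.guess_extension(content_type)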
The extension includes a dot prefix. For example, extension is '.png' for content type 'image/png'.
The proper way is to not rely on file extensions at all. Send a GET (or HEAD) request to the URL in question and read the Content-Type HTTP header of the response to determine the content type. File extensions are unreliable.
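A minimal sketch of the HEAD-request approach, using only the standard library (urllib.request is swapped in here for requests so the snippet has no third-party dependency). The function names extension_from_content_type and extension_from_url and the timeout default are hypothetical, chosen for illustration. Not every server answers HEAD requests properly, so treat this as a starting point rather than a definitive implementation.

```python
import mimetypes
import urllib.request


def extension_from_content_type(header_value):
    """Map a Content-Type header value to a file extension, or None."""
    if not header_value:
        return None
    # Drop parameters such as "; charset=utf-8" before the lookup.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(media_type)


def extension_from_url(url, timeout=10):
    """Send a HEAD request and guess the extension from the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extension_from_content_type(response.headers.get("Content-Type"))
```

Calling extension_from_url(url) performs the network request; extension_from_content_type can be used on its own when you already have the header value, and it returns None when the header is missing or the type is unknown.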
See MIME types (IANA media types) for more information and a list of useful MIME types.
Use urlparse to parse the path out of the URL, then os.path.splitext to get the extension.
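import os
from urllib.parse import urlparse  # Python 3; in Python 2 this lived in the urlparse module
url = 'http://www.plssomeotherurl.com/station.pls?id=111'
path = urlparse(url).path  # the query string (?id=111) is not part of the path
ext = os.path.splitext(path)[1]  # '.pls'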
Note that the extension may not be a reliable indicator of the type of the file. The HTTP Content-Type header may be better.
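One way to combine the two ideas is to prefer the Content-Type header when you have it and fall back to the URL path otherwise. The sketch below assumes Python 3; the function names extension_from_path and guess_url_extension are hypothetical, and the caller is assumed to have fetched the header value separately.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse


def extension_from_path(url):
    """Return the extension found in the URL path, or '' if there is none."""
    # URL paths always use forward slashes, so posixpath is used on every OS.
    path = urlparse(url).path
    return posixpath.splitext(path)[1]


def guess_url_extension(url, content_type=None):
    """Prefer the Content-Type header value; fall back to the URL path."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        media_type = content_type.split(";", 1)[0].strip().lower()
        extension = mimetypes.guess_extension(media_type)
        if extension:
            return extension
    return extension_from_path(url)
```

For the URL above, extension_from_path returns '.pls'. Passing content_type='image/png' makes guess_url_extension return '.png' regardless of what the path says.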