How to safely get the file extension from a URL?

This is easiest with requests and mimetypes:

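import mimetypes
import requests

# url holds the address to inspect
response = requests.get(url)
# Drop parameters such as "; charset=utf-8", which guess_extension does not understand
content_type = response.headers.get('content-type', '').split(';')[0].strip()
# guess_extension returns None when the type is unknown
extension = mimetypes.guess_extension(content_type)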

The extension includes a dot prefix. For example, extension is '.png' for content type 'image/png'.


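The most reliable approach is not to rely on file extensions at all. Send a GET (or HEAD) request to the URL in question and read the content type from the "Content-Type" HTTP header of the response. File extensions are unreliable: a URL does not need to have one, and a server can serve any content type under any name.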
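
As a rough sketch of the HEAD approach using only the standard library: the function names extension_from_content_type and fetch_extension are made up for this example, and the 10-second timeout is an arbitrary assumption. A HEAD request transfers only the headers, so the body is never downloaded.

```python
import mimetypes
import urllib.request
from email.message import Message


def extension_from_content_type(header_value):
    """Map a raw Content-Type header value to an extension, or None."""
    if not header_value:
        return None
    # Let the standard header parser drop parameters such as "; charset=utf-8".
    message = Message()
    message["Content-Type"] = header_value
    return mimetypes.guess_extension(message.get_content_type())


def fetch_extension(url, timeout=10):
    """Hypothetical helper: ask the server for headers only and map the type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extension_from_content_type(response.headers.get("Content-Type"))
```

Some servers reject HEAD requests, in which case falling back to a GET is the usual workaround. The result is None whenever the header is missing or the type is not in the mimetypes tables.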

See MIME types (IANA media types) for more information and a list of useful MIME types.


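Use urlparse to parse the path out of the URL, then os.path.splitext to get the extension. In Python 3, urlparse lives in urllib.parse (in Python 2 it was the standalone urlparse module).

import os
from urllib.parse import urlparse

url = 'http://www.plssomeotherurl.com/station.pls?id=111'
path = urlparse(url).path  # '/station.pls', without the query string
ext = os.path.splitext(path)[1]  # '.pls'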

Note that the extension may not be a reliable indicator of the type of the file. The HTTP Content-Type header may be better.
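
If you want this as a reusable function, something like the following might do. It is only a sketch: the name extension_from_url is made up, and lowercasing the result is my own choice, not something the URL requires. It uses posixpath rather than os.path because URL paths always use forward slashes, whatever the local operating system.

```python
import posixpath
from urllib.parse import urlsplit


def extension_from_url(url):
    """Return the lowercased extension of the URL's path, or '' if there is none."""
    # urlsplit separates the path from the query string and the fragment.
    path = urlsplit(url).path
    # splitext only looks at the last path segment, so dots in directory
    # names are ignored.
    return posixpath.splitext(path)[1].lower()
```

For the URL above this gives '.pls'. A path with no dot in its last segment gives an empty string, and a double extension such as 'archive.tar.gz' yields only the last part, '.gz'.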

Tags: Python, File