How to extract a filename from a URL & append a word to it?
You can use urllib.parse.urlparse with os.path.basename:
import os
from urllib.parse import urlparse
url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
a = urlparse(url)
print(a.path) # Output: /kyle/09-09-201315-47-571378756077.jpg
print(os.path.basename(a.path)) # Output: 09-09-201315-47-571378756077.jpg
os.path.basename(url)
Why try harder?
In [1]: os.path.basename("https://example.com/file.html")
Out[1]: 'file.html'
In [2]: os.path.basename("https://example.com/file")
Out[2]: 'file'
In [3]: os.path.basename("https://example.com/")
Out[3]: ''
In [4]: os.path.basename("https://example.com")
Out[4]: 'example.com'
Note 2020-12-20
Nobody has thus far provided a complete solution.
A URL can contain a ?[query-string] and/or a #[fragment identifier] (but only in that order).
In [1]: from os import path
In [2]: def get_filename(url):
   ...:     fragment_removed = url.split("#")[0]  # keep to left of first #
   ...:     query_string_removed = fragment_removed.split("?")[0]
   ...:     scheme_removed = query_string_removed.split("://")[-1].split(":")[-1]
   ...:     if scheme_removed.find("/") == -1:
   ...:         return ""
   ...:     return path.basename(scheme_removed)
   ...:
In [3]: get_filename("a.com/b")
Out[3]: 'b'
In [4]: get_filename("a.com/")
Out[4]: ''
In [5]: get_filename("https://a.com/")
Out[5]: ''
In [6]: get_filename("https://a.com/b")
Out[6]: 'b'
In [7]: get_filename("https://a.com/b?c=d#e")
Out[7]: 'b'
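For comparison, here is a minimal sketch of the same idea built on the standard library's URL parser, which already separates the query string and fragment from the path. The function name filename_from_url is made up for illustration, and the percent-decoding step is an extra assumption that the answer above does not include.

```python
import posixpath
from urllib.parse import urlsplit, unquote


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    # urlsplit keeps "?query" and "#fragment" out of the .path attribute
    path = urlsplit(url).path
    # URL paths always use "/", so posixpath is used regardless of the OS
    name = posixpath.basename(path)
    # Decode percent-escapes such as %20 (an assumption about what is wanted)
    return unquote(name)


print(filename_from_url("https://a.com/b?c=d#e"))
print(filename_from_url("https://a.com:8080/dir/my%20file.txt?x=1"))
```

One difference to keep in mind: without a scheme, urlsplit treats the whole string as a path, so filename_from_url("a.com") returns 'a.com', whereas get_filename("a.com") returns ''.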
filename = url[url.rfind("/")+1:]
filename_small = filename.replace(".", "_small.")
Consider replacing ".jpg" rather than "." in the last line, since a "." can also appear elsewhere in the filename.
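One way to avoid the problem with extra dots, without hard-coding ".jpg", is to split off the extension with os.path.splitext and insert the word before it. This is only a sketch: the helper name add_suffix and its default word "_small" are chosen here for illustration.

```python
import os


def add_suffix(filename, word="_small"):
    """Insert `word` between the stem and the final extension of `filename`."""
    # splitext splits at the last dot only, so earlier dots stay in the stem
    stem, ext = os.path.splitext(filename)
    return stem + word + ext


print(add_suffix("09-09-201315-47-571378756077.jpg"))
print(add_suffix("my.photo.jpg"))
```

A name with no extension simply gets the word appended, for example add_suffix("file") gives 'file_small'.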