Get thumbnail image from Wikimedia Commons
If you're willing to rely on the current URL scheme staying the same (which is not guaranteed), you can build the thumbnail URL yourself.
The URL looks like this:
https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/200px-Tour_Eiffel_Wikimedia_Commons.jpg
- The first part is always the same:
https://upload.wikimedia.org/wikipedia/commons/thumb
- The second part is the first character of the MD5 hash of the file name. In this case, the MD5 hash of
Tour_Eiffel_Wikimedia_Commons.jpg
is a85d416ee427dfaee44b9248229a9cdd, so we get
/a
- The third part is the first two characters of the MD5 hash from above:
/a8
- The fourth part is the file name:
/Tour_Eiffel_Wikimedia_Commons.jpg
- The last part is the desired thumbnail width, and the file name again:
/200px-Tour_Eiffel_Wikimedia_Commons.jpg
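The steps above can be sketched in Python roughly like this. It is only an illustration and assumes the URL scheme described here still holds; the variable names are my own.

```python
import hashlib

file_name = "Tour_Eiffel_Wikimedia_Commons.jpg"
width = 200  # desired thumbnail width in pixels

# MD5 hash of the file name, as a hex string
digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()

parts = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb",  # fixed prefix
    digest[0],                 # first character of the MD5 hash
    digest[:2],                # first two characters of the MD5 hash
    file_name,                 # the file name
    f"{width}px-{file_name}",  # thumbnail width, then the file name again
]
thumb_url = "/".join(parts)
print(thumb_url)
```

For the example file this should reproduce the URL shown at the top of the answer.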
In case anyone is doing this query in SPARQL instead of Python: There exists an MD5 function in SPARQL and the whole string manipulation can be implemented in SPARQL too!
BIND(REPLACE(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/", "") as ?fileName) .
BIND(REPLACE(?fileName, " ", "_") as ?safeFileName)
BIND(MD5(?safeFileName) as ?fileNameMD5) .
BIND(CONCAT("https://upload.wikimedia.org/wikipedia/commons/thumb/", SUBSTR(?fileNameMD5, 1, 1), "/", SUBSTR(?fileNameMD5, 1, 2), "/", ?safeFileName, "/650px-", ?safeFileName) as ?thumb)
This query can be run live in Wikidata's query service; it was discussed here: https://discourse-mediawiki.wmflabs.org/t/accessing-a-commons-thumbnail-via-wikidata/499
Solution in Python based on the solution of @svick:
import hashlib

def get_wc_thumb(image, width=300):
    # image = file name, e.g. from Wikidata; width = thumbnail width in pixels
    image = image.replace(' ', '_')  # Commons file names use underscores instead of spaces
    d = hashlib.md5(image.encode('utf-8')).hexdigest()  # hex MD5 of the file name
    # prefix / first hash char / first two hash chars / file name / <width>px-file name
    return ("https://upload.wikimedia.org/wikipedia/commons/thumb/"
            + d[0] + '/' + d[0:2] + '/' + image + '/' + str(width) + 'px-' + image)
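One caveat: file names containing non-ASCII letters or characters such as question marks may need percent-encoding before the URL can be requested. A possible variant is sketched below. It assumes the hash is computed over the unencoded name (with underscores) and that only the URL path gets encoded; get_wc_thumb_quoted is a hypothetical name of mine, not part of any library.

```python
import hashlib
from urllib.parse import quote

def get_wc_thumb_quoted(image, width=300):
    """Hypothetical variant: builds the thumbnail URL with a percent-encoded file name."""
    name = image.replace(" ", "_")  # spaces become underscores
    # Assumption: the hash is taken over the raw (unencoded) UTF-8 name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    encoded = quote(name)  # percent-encode only for use in the URL path
    return ("https://upload.wikimedia.org/wikipedia/commons/thumb/"
            f"{digest[0]}/{digest[:2]}/{encoded}/{width}px-{encoded}")

print(get_wc_thumb_quoted("Tour Eiffel Wikimedia Commons.jpg", 200))
```

For plain ASCII names like the Eiffel Tower example, the result is the same as without encoding.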