Create (sane/safe) filename from any (unsafe) string
Python:
"".join([c for c in filename if c.isalpha() or c.isdigit() or c==' ']).rstrip()
this accepts Unicode characters but removes line breaks, etc.
example:
filename = u"ad\nbla'{-+\)(ç?"
gives: adblaç
edit str.isalnum() does alphanumeric on one step. – comment from queueoverflow below. danodonovan hinted on keeping a dot included.
keepcharacters = (' ','.','_')
"".join(c for c in filename if c.isalnum() or c in keepcharacters).rstrip()
More or less what has been mentioned here with regexp, but in reverse (replace any NOT listed):
>>> import re
>>> filename = u"ad\nbla'{-+\)(ç1?"
>>> re.sub(r'[^\w\d-]','_',filename)
u'ad_bla__-_____1_'
My requirements were conservative ( the generated filenames needed to be valid on multiple operating systems, including some ancient mobile OSs ). I ended up with:
"".join([c for c in text if re.match(r'\w', c)])
That white lists the alphanumeric characters ( a-z, A-Z, 0-9 ) and the underscore. The regular expression can be compiled and cached for efficiency, if there are a lot of strings to be matched. For my case, it wouldn't have made any significant difference.