collapsing whitespace in a string

Here's a single-step approach (but the uppercasing actually uses a string method -- much simpler!):

import re

rex = re.compile(r'\W+')
result = rex.sub(' ', strarg).upper()

where strarg is the string argument (don't use names that shadow builtins or standard library modules, please).


import re

s = "$$$aa1bb2 cc-dd ee_ff ggg."
re.sub(r'\W+', ' ', s).upper()
# ' AA1BB2 CC DD EE_FF GGG '

Is _ punctuation?

re.sub(r'[_\W]+', ' ', s).upper()
# ' AA1BB2 CC DD EE FF GGG '

Don't want the leading and trailing space?

re.sub(r'[_\W]+', ' ', s).strip().upper()
# 'AA1BB2 CC DD EE FF GGG'
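
Putting those three steps together, a minimal sketch might look like this. The function name normalize and the constant name are made up for illustration; only the pattern and the method chain come from the snippets above.

```python
import re

# Runs of underscores and non-word characters (punctuation, whitespace).
# The constant name is hypothetical; the pattern is the one used above.
_SEPARATORS = re.compile(r'[_\W]+')

def normalize(text):
    """Replace each separator run with one space, trim the ends, uppercase."""
    return _SEPARATORS.sub(' ', text).strip().upper()

print(normalize("$$$aa1bb2 cc-dd ee_ff ggg."))
# AA1BB2 CC DD EE FF GGG
```

Compiling the pattern once at module level only matters if you call this a lot; for a one-off, the plain re.sub call is fine.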

result = rex.sub(' ', string) # this produces a string with tons of whitespace padding
result = rex.sub('', result) # this reduces all those spaces

That happens because of a typo: the second call uses rex again where you meant rex_s. You also need to substitute a single space back in, not an empty string; otherwise every multiple-space gap becomes no gap at all instead of a single-space gap.

result = rex.sub(' ', string) # this produces a string with tons of whitespace padding
result = rex_s.sub(' ', result) # this reduces all those spaces
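
Here is a runnable sketch of the difference. The definitions of rex and rex_s aren't shown here, so the two patterns below are assumptions (a single non-word character, and a run of whitespace), and the sample text is made up.

```python
import re

# Assumed patterns; the actual definitions are not shown in this thread.
rex = re.compile(r'\W')      # any single non-word character
rex_s = re.compile(r'\s+')   # a run of one or more whitespace characters

text = "aa,  bb...cc"

padded = rex.sub(' ', text)     # every non-word character becomes a space
broken = rex.sub('', padded)    # the typo: removes every space entirely
fixed = rex_s.sub(' ', padded)  # collapses each run of spaces to one space

print(repr(padded))  # 'aa   bb   cc'
print(repr(broken))  # 'aabbcc'
print(repr(fixed))   # 'aa bb cc'
```

Under these assumptions the buggy version glues the words together, because a space is itself a non-word character and gets replaced with nothing.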

Tags: Python, Regex