Python regular expression to remove repeated words
Non-regex solution using itertools.groupby:
>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
import re
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove consecutive duplicate words
The \b matches the empty string, but only at the beginning or end of a word.
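To see why both \b anchors matter, here is a minimal sketch that compares the pattern above with a copy that has the anchors stripped. The names dedupe_words, WITH_BOUNDARIES and WITHOUT_BOUNDARIES are made up for this illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: a word, then one or more " <same word>" repeats.
WITH_BOUNDARIES = re.compile(r'\b(\w+)( \1\b)+')
# Same pattern with the \b anchors removed, for comparison only.
WITHOUT_BOUNDARIES = re.compile(r'(\w+)( \1)+')

def dedupe_words(text, pattern=WITH_BOUNDARIES):
    """Collapse runs of a repeated word (separated by single spaces) into one."""
    return pattern.sub(r'\1', text)

print(dedupe_words("this is just is is"))                # this is just is
print(dedupe_words("this just so so so nice"))           # this just so nice

# Without the leading \b, the "is" inside "this" pairs up with the next "is".
print(dedupe_words("this is just", WITHOUT_BOUNDARIES))  # this just

# Without the trailing \b, "so" pairs up with the start of "sober".
print(dedupe_words("so sober", WITHOUT_BOUNDARIES))      # sober
print(dedupe_words("so sober"))                          # so sober
```

Note that the regex only treats words as repeated when they are separated by exactly one space, whereas the groupby version splits on any whitespace and rejoins with single spaces.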