How to remove consecutive identical words from a string in Python

Short regex magic:

import re

mystring = "my friend's new new new new and old old cats are running running in the street"
res = re.sub(r'\b(\w+\s*)\1{1,}', '\\1', mystring)
print(res)

regex pattern details:

\b - word boundary
(\w+\s*) - one or more word characters \w+ followed by zero or more whitespace characters \s*, enclosed in a capturing group (...)
\1{1,} - a backreference to the first captured group, repeated one or more times {1,} (equivalent to \1+)

The output:

my friend's new and old cats are running in the street
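
One caveat with the pattern above: because the group has no closing word boundary, it can also match a repeat inside a single word ("mama" becomes "ma"), and it misses a repeated word at the very end of the string, where the last copy has no trailing whitespace. A stricter variant is sketched below. The pattern and the helper name dedupe_words_regex are assumptions for illustration, not part of the original answer.

```python
import re

def dedupe_words_regex(text):
    """Collapse runs of the same whole word (hypothetical helper)."""
    # \b(\w+)       capture one whole word, starting at a word boundary
    # (?:\s+\1\b)+  one or more repeats of that word, each preceded by
    #               whitespace and ending at a word boundary
    return re.sub(r'\b(\w+)(?:\s+\1\b)+', r'\1', text)

print(dedupe_words_regex("the cat cat"))      # the cat
print(dedupe_words_regex("mama said no no"))  # mama said no
```

Unlike the original pattern, this keeps the whitespace that follows the last repeat untouched, since only the word itself is captured.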

Using itertools.groupby:

>>> import itertools
>>> ' '.join(k for k, _ in itertools.groupby(mystring.split()))
"my friend's new and old cats are running in the street"

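mystring.split() splits mystring on whitespace into a list of words.
itertools.groupby groups consecutive equal words; the key k of each group is the word itself.
The generator expression keeps only the key of each group, so every run of repeats collapses to a single word.
' '.join(...) joins the remaining words with a single space.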

The complexity is linear in the size of the input string.
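
For readers who want the groupby approach as a reusable function, here is a minimal sketch with the intermediate steps spelled out. The function name dedupe_words is an assumption, not something from the original answer.

```python
from itertools import groupby

def dedupe_words(text):
    """Collapse runs of identical adjacent words (hypothetical helper)."""
    # Split on any run of whitespace.
    words = text.split()
    # groupby yields one (key, group) pair per run of equal adjacent items;
    # only the key (the word itself) is kept.
    keys = [word for word, _group in groupby(words)]
    return ' '.join(keys)

print(dedupe_words("old old cats old"))  # old cats old
```

Note that only adjacent repeats are removed (the final "old" survives), and that any run of whitespace in the input becomes a single space in the result.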
