Remove repeating characters from words

You can do it without reduce or regexps:

>>> s = 'haaaaapppppyyy'
>>> ''.join(['' if i>1 and e==s[i-2] else e for i,e in enumerate(s)])
'haappyy'

The number of repetitions is hardcoded above (i>1 and s[i-2]). The general case, where reps is the maximum run length to keep:

>>> reps = 1
>>> ''.join(['' if i>reps-1 and e==s[i-reps] else e for i,e in enumerate(s)])
'hapy'
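
One caveat with the comprehension above: for reps greater than 1 it only compares each character with the one reps positions earlier, so it also drops characters that are not part of a run (with reps = 2, 'aba' becomes 'ab'). A minimal sketch of a variant that checks the whole preceding window instead; the function name limit_runs is only an illustrative choice:

```python
def limit_runs(s, reps=2):
    """Keep at most `reps` consecutive copies of each character."""
    return ''.join(
        # Drop e only if the previous `reps` characters all equal e.
        '' if i >= reps and s[i - reps:i] == e * reps else e
        for i, e in enumerate(s)
    )

print(limit_runs('haaaaapppppyyy'))     # haappyy
print(limit_runs('haaaaapppppyyy', 1))  # hapy
print(limit_runs('aba'))                # aba
```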

It can be done using regular expressions:

>>> import re
>>> re.sub(r'(.)\1+', r'\1\1', "haaaaapppppyyy")
'haappyy'

The pattern (.)\1+ matches any character (.) followed by one or more copies of the same character (the backreference \1 forces it to be the same one), and the replacement \1\1 substitutes exactly two copies of that character.
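
The same idea extends to other run lengths by putting a quantifier on the backreference. A sketch, assuming a hypothetical helper named squeeze_re and a limit n of at least 1:

```python
import re

def squeeze_re(word, n=2):
    # (.)\1{n,} matches a character followed by n or more copies of itself,
    # i.e. a run of at least n + 1; the run is replaced by exactly n copies.
    pattern = r'(.)\1{' + str(n) + r',}'
    return re.sub(pattern, lambda m: m.group(1) * n, word)

print(squeeze_re('haaaaapppppyyy'))     # haappyy
print(squeeze_re('haaaaapppppyyy', 1))  # hapy
```

Note that . does not match a newline unless re.DOTALL is passed, which is usually what you want when squeezing letters inside words.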

You can squash multiple occurrences of letters with itertools.groupby:

>>> from itertools import groupby
>>> ''.join(c for c, _ in groupby("haaaaapppppyyy"))
'hapy'

Similarly, you can get 'haappyy' from groupby by keeping at most two characters of each group:

>>> ''.join(''.join(s)[:2] for _, s in groupby("haaaaapppppyyy"))
'haappyy'
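
A sketch of the groupby approach for an arbitrary limit, using itertools.islice to take at most n characters from each group without first joining the whole run; squeeze_groupby and n are illustrative names:

```python
from itertools import groupby, islice

def squeeze_groupby(word, n=2):
    # groupby yields one (key, group) pair per run of equal characters;
    # islice keeps only the first n characters of each run.
    return ''.join(''.join(islice(g, n)) for _, g in groupby(word))

print(squeeze_groupby('haaaaapppppyyy'))     # haappyy
print(squeeze_groupby('haaaaapppppyyy', 1))  # hapy
```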

Tags: Python, NLP, NLTK
