strip punctuation with regex - python

I think this function will be helpful and concise in removing punctuation:

import re
def remove_punct(text):
    new_words = []
    for word in text:
        w = re.sub(r'[^\w\s]','',word) #remove everything except words and space
        w = re.sub(r'_','',w) #how to remove underscore as well
        new_words.append(w)
    return new_words

You don't need regular expression to do this task. Use str.strip with string.punctuation:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> '!Hello.'.strip(string.punctuation)
'Hello'

>>> ' '.join(word.strip(string.punctuation) for word in "Hello, world. I'm a boy, you're a girl.".split())
"Hello world I'm a boy you're a girl"

Tags:

Python

Regex