Strip punctuation with regex - Python
I think this function is a helpful, concise way to remove punctuation from a list of words:
import re

def remove_punct(text):
    new_words = []
    for word in text:
        # Remove everything except word characters and whitespace
        w = re.sub(r'[^\w\s]', '', word)
        # \w also matches the underscore, so remove it separately
        w = re.sub(r'_', '', w)
        new_words.append(w)
    return new_words
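You don't need a regular expression for this task. Use str.strip with string.punctuation, which removes punctuation from the start and end of a string and leaves interior characters such as apostrophes alone: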
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> '!Hello.'.strip(string.punctuation)
'Hello'
>>> ' '.join(word.strip(string.punctuation) for word in "Hello, world. I'm a boy, you're a girl.".split())
"Hello world I'm a boy you're a girl"