Dividing a string at various punctuation marks using split()
A modified version of larsks' answer, where you don't need to type all punctuation characters yourself:
import re, string
# re.escape keeps ], \, ^ and - from being read as character-class syntax
re.split("[" + re.escape(string.punctuation) + "]+", test)
['hello', 'how are you', 'I am fine', 'thank you', ' And you', '']
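The trailing '' and the leading space in ' And you' can be cleaned up after the split. Here is a minimal sketch of one way to do that. It assumes test holds the sentence implied by the output above (the question's exact string is not shown here), and split_clean is a hypothetical helper name, not something from the original answer:

```python
import re
import string

# Assumed input, reconstructed from the output shown above.
test = "hello,how are you?I am fine,thank you. And you?"

# One or more consecutive ASCII punctuation characters.
# re.escape guards characters such as ], \ and ^ that are special inside [...].
PUNCT_RUN = re.compile("[" + re.escape(string.punctuation) + "]+")

def split_clean(text):
    """Split on runs of punctuation, trim each piece, drop empty pieces."""
    pieces = (piece.strip() for piece in PUNCT_RUN.split(text))
    return [piece for piece in pieces if piece]

print(split_clean(test))
# ['hello', 'how are you', 'I am fine', 'thank you', 'And you']
```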
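Since you don't want to use the re module, you can replace each delimiter with a space and then split on whitespace (note that this also breaks the phrases into individual words):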
test.replace(',',' ').replace('.',' ').replace('?',' ').split()
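If the list of delimiters grows, chaining replace() calls gets long. A possible alternative that still avoids re is str.translate with a table built by str.maketrans. This is a sketch rather than the answer's own method, and the value of test is an assumption based on the outputs quoted elsewhere on this page:

```python
import string

# Assumed input; the original question's string is not shown here.
test = "hello,how are you?I am fine,thank you. And you?"

# Map every ASCII punctuation character to a space in one translation table.
to_spaces = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Translate, then split on whitespace exactly as the replace() chain does.
words = test.translate(to_spaces).split()
print(words)
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
```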
If you want to split a string on multiple delimiters, as in your example, you're going to need the re module despite your bizarre objections, like this:
>>> import re
>>> re.split('[?.,]', test)
['hello', 'how are you', 'I am fine', 'thank you', ' And you', '']
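The empty string at the end appears because the input ends with a delimiter. One way to avoid it, sketched here as an alternative and not as part of the original answer, is to invert the pattern and collect the runs of non-delimiter characters with re.findall. The value of test is assumed from the output above:

```python
import re

# Assumed input, reconstructed from the output shown above.
test = "hello,how are you?I am fine,thank you. And you?"

# Match maximal runs of characters that are NOT one of the delimiters.
chunks = re.findall(r"[^?.,]+", test)
print(chunks)
# ['hello', 'how are you', 'I am fine', 'thank you', ' And you']
```

Unlike re.split, this never yields empty strings, because the pattern requires at least one character per match.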
It's possible to get a similar result using split, but you need to call split once for every character, and you need to iterate over the results of the previous split. This works but it's u-g-l-y:
>>> sum([z.split()
... for z in sum([y.split('?')
... for y in sum([x.split('.')
... for x in test.split(',')],[])], [])], [])
['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
This uses sum() to flatten the list returned by the previous iteration.
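The same repeated split-and-flatten idea may be easier to read as a loop. The sketch below swaps sum(..., []) for itertools.chain.from_iterable, which flattens without building a new list for every addition. split_on_each is a hypothetical helper name and the value of test is an assumption based on the output above:

```python
from itertools import chain

# Assumed input, reconstructed from the output shown above.
test = "hello,how are you?I am fine,thank you. And you?"

def split_on_each(text, delimiters):
    """Split text on each delimiter in turn, flattening after every pass."""
    pieces = [text]
    for delimiter in delimiters:
        # chain.from_iterable flattens the list of lists produced by this pass.
        pieces = list(chain.from_iterable(p.split(delimiter) for p in pieces))
    # A final whitespace split breaks the pieces into words and drops empties.
    return list(chain.from_iterable(p.split() for p in pieces))

print(split_on_each(test, ",.?"))
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
```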
This is the best way I can think of without using the re module:
"".join((char if char.isalpha() else " ") for char in test).split()
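One thing to keep in mind with this approach is that isalpha() treats digits and apostrophes as separators too. The sketch below wraps the expression in a hypothetical function, split_words, with an optional keep argument (my addition, not part of the answer) for characters that should survive. The value of test is assumed from the outputs quoted elsewhere on this page:

```python
# Assumed input; the original question's string is not shown here.
test = "hello,how are you?I am fine,thank you. And you?"

def split_words(text, keep=""):
    """Replace every character that is not a letter (or in keep) with a space, then split."""
    return "".join(
        ch if ch.isalpha() or ch in keep else " " for ch in text
    ).split()

print(split_words(test))
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']

print(split_words("don't stop at 42"))
# ['don', 't', 'stop', 'at']

print(split_words("don't stop at 42", keep="'0123456789"))
# ["don't", 'stop', 'at', '42']
```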