Dividing a string at various punctuation marks using split()

A modified version of larsks' answer, where you don't need to type all punctuation characters yourself:

import re, string

re.split("[" + re.escape(string.punctuation) + "]+", test)
['hello', 'how are you', 'I am fine', 'thank you', ' And you', '']
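The trailing '' is there because the string ends with a delimiter, and ' And you' keeps its leading space. Here is a rough sketch of one way to tidy that up. It assumes test is the string from the question (reconstructed here from the output above), and split_punct is just a name I made up. It also passes the punctuation through re.escape so that characters such as the backslash and ']' are taken literally inside the character class:

```python
import re
import string

# Assumed sample input, reconstructed from the output shown above.
test = "hello,how are you?I am fine,thank you. And you?"

# re.escape keeps ']', '\\', '^' and '-' literal inside the [...] class.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]+")


def split_punct(text):
    """Split on runs of ASCII punctuation, dropping empty or blank pieces."""
    return [part.strip() for part in PUNCT_RE.split(text) if part.strip()]


print(split_punct(test))
# ['hello', 'how are you', 'I am fine', 'thank you', 'And you']
```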

Since you don't want to use the re module, you can use this:

test.replace(',', ' ').replace('.', ' ').replace('?', ' ').split()
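If the list of delimiters grows, chaining replace() gets tedious. One possible alternative that still avoids re is str.translate with a table that maps every punctuation character to a space. This is only a sketch, and test is assumed to be the string from the question (reconstructed from the output shown in the other answers):

```python
import string

# Assumed sample input, reconstructed from the outputs shown in this thread.
test = "hello,how are you?I am fine,thank you. And you?"

# Map each ASCII punctuation character to a single space.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

# After translating, a plain whitespace split yields the individual words.
words = test.translate(table).split()
print(words)
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
```

Like the replace() chain, this gives you words rather than phrases, because split() with no arguments splits on any whitespace.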

If you want to split a string on multiple delimiters, as in your example, the re module is the tool for the job despite your bizarre objections, like this:

>>> import re
>>> re.split('[?.,]', test)
['hello', 'how are you', 'I am fine', 'thank you', ' And you', '']

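It's possible to get a similar result using split, but you need to call split once for every delimiter character, and you need to iterate over the results of the previous split. This works (note that the last step also splits on whitespace, so you get individual words rather than phrases), but it's u-g-l-y: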

>>> sum([z.split() 
... for z in sum([y.split('?') 
... for y in sum([x.split('.') 
... for x in test.split(',')],[])], [])], [])
['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']

This uses sum() to flatten the list returned by the previous iteration.
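The same idea is easier to read as a loop. This is a minimal sketch, assuming test is the string from the question (reconstructed from the output above); split_on_all is a hypothetical helper name, not anything built in:

```python
# Assumed sample input, reconstructed from the outputs shown in this thread.
test = "hello,how are you?I am fine,thank you. And you?"


def split_on_all(text, delimiters):
    """Split text on each delimiter in turn, flattening after every pass."""
    pieces = [text]
    for delim in delimiters:
        # Split every current piece on this delimiter and flatten the result.
        pieces = [part for piece in pieces for part in piece.split(delim)]
    return pieces


phrases = split_on_all(test, ",.?")
print(phrases)
# ['hello', 'how are you', 'I am fine', 'thank you', ' And you', '']

# A final whitespace split reproduces the word list from the one-liner.
words = [word for phrase in phrases for word in phrase.split()]
print(words)
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']
```

The nested comprehension does the flattening that sum(..., []) does above, without building a new list for every addition.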


This is the best way I can think of without using the re module:

"".join((char if char.isalpha() else " ") for char in test).split()
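One thing to watch with isalpha(): digits and apostrophes count as separators too, so "it's 4pm" comes back as 'it', 's', 'pm'. If that matters, a variant that only replaces the delimiters you name might look like the sketch below. split_words and its default delimiters are my own assumptions, and test is the string from the question as reconstructed from the outputs in this thread:

```python
# Assumed sample input, reconstructed from the outputs shown in this thread.
test = "hello,how are you?I am fine,thank you. And you?"


def split_words(text, delimiters=",.?"):
    """Replace only the listed delimiters with spaces, then split on whitespace."""
    return "".join(" " if ch in delimiters else ch for ch in text).split()


print(split_words(test))
# ['hello', 'how', 'are', 'you', 'I', 'am', 'fine', 'thank', 'you', 'And', 'you']

print(split_words("it's 4pm, right?"))
# ["it's", '4pm', 'right']
```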