In Python, how do I split a string and keep the separators?
>>> re.split('(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
another example, split on non alpha-numeric and keep the separators
import re
a = "foo,bar@candy*ice%cream"
re.split('([^a-zA-Z0-9])',a)
output:
['foo', ',', 'bar', '@', 'candy', '*', 'ice', '%', 'cream']
explanation
re.split('([^a-zA-Z0-9])',a)
() <- keep the separators
[] <- match everything in between
^a-zA-Z0-9 <-except alphabets, upper/lower and numbers.
If you are splitting on newline, use splitlines(True)
.
>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']
(Not a general solution, but adding this here in case someone comes here not realizing this method existed.)