Split string based on regex
Your question contains the string literal "\b[A-Z]{2,}\b"
,
but that \b
will mean backspace, because there is no r-modifier.
Try: r"\b[A-Z]{2,}\b"
.
I suggest
l = re.compile("(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)
Check this demo.
You could use a lookahead:
re.split(r'[ ](?=[A-Z]+\b)', input)
This will split at every space that is followed by a string of upper-case letters which end in a word-boundary.
Note that the square brackets are only for readability and could as well be omitted.
If it is enough that the first letter of a word is upper case (so if you would want to split in front of Hello
as well) it gets even easier:
re.split(r'[ ](?=[A-Z])', input)
Now this splits at every space followed by any upper-case letter.