What is a word boundary in regex? Does \b match a hyphen '-'?
A word boundary, in most regex dialects, is a position between a \w (word character) and a \W (non-word character), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]).
So, in the string "-12", it would match before the 1 or after the 2. The dash is not a word character.
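To check the "-12" example, here is a minimal sketch using Python's re module. The helper name boundary_positions is just for illustration; it lists the offsets at which the zero-width \b matches.

```python
import re

def boundary_positions(text):
    # \b is zero-width, so each match is an empty string at some offset;
    # collect those offsets.
    return [m.start() for m in re.finditer(r"\b", text)]

# Offset 1 is between '-' and '1'; offset 3 is the end of the string after '2'.
print(boundary_positions("-12"))  # [1, 3]
```

There is no boundary at offset 0, because the string starts with the dash, which is not a word character.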
A word boundary can occur in one of three positions:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
Word characters are alphanumeric (plus the underscore); a minus sign is not. Taken from Regex Tutorial.
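The three positions can be made visible with a short Python sketch. The helper mark_boundaries is a made-up name for illustration; it inserts a "|" wherever \b matches. Note that \b never consumes the hyphen itself, it only matches the zero-width positions next to it.

```python
import re

def mark_boundaries(text):
    # Insert "|" at every zero-width position where \b matches.
    return re.sub(r"\b", "|", text)

# Start of string, end of string, and both sides of the hyphen.
print(mark_boundaries("well-known"))  # |well|-|known|

# The underscore is a word character, so there is no boundary inside a_b.
print(mark_boundaries(" a_b "))       #  |a_b|

# Whole-word search: "cat" inside "concat" is skipped, but "cat" in
# "cat-like" is found because the hyphen is a non-word character.
print(re.findall(r"\bcat\b", "cat concat cat-like"))  # ['cat', 'cat']
```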
While learning regular expressions, I was really stuck on the \b metacharacter. I could not grasp its meaning, no matter how many times I asked myself what it was. After some experiments on the website, I noticed the pink vertical dashes at the beginning and at the end of every word, and that is when its meaning clicked. It is exactly a word (\w) boundary.
This answer is only meant to build intuition. For the logic behind it, see the other answers.