Why does [0-9]* match where there aren't any digits?

The * means zero or more matches, and sed takes the leftmost match it can find. If you run that command without the g flag (so sed stops after the first replacement), you get habc 123 as output. This is because sed reads from left to right, and since [0-9]* cannot match a, it matches the empty string at the beginning of the line and stops there.

With the global (g) flag, sed keeps trying to match the rest of the string, and because * matches the empty string when it can't match anything else, it places an h at every position where it cannot match any digits.
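To see the difference the g flag makes, the two variants can be run side by side. This is only a sketch: the variable names are made up for illustration, and the outputs in the comments are what GNU sed prints (other sed implementations may treat empty matches slightly differently).

```shell
# Without g: only the first match is replaced, and the first match is
# the empty string at the very start of the line.
first_only=$(echo "abc 123" | sed "s/[0-9]*/h/")
echo "$first_only"    # habc 123

# With g: every match is replaced, including the empty matches in front
# of a, b, c and the space; the run 123 is replaced as one match.
every_match=$(echo "abc 123" | sed "s/[0-9]*/h/g")
echo "$every_match"   # hahbhch h
```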

Note that your second attempt is equivalent to sed "s/[0-9]\+/h/". Here \+ (a GNU extension to basic regular expressions) means one or more matches, so it won't match the empty string when it does not find a number to replace.
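As a quick check of that equivalence, both forms can be run on the same input. A minimal sketch, assuming GNU sed (the \+ form is not guaranteed to work elsewhere); the variable names are only for illustration.

```shell
input="abc 123"

# One digit followed by zero or more digits: works in any POSIX sed.
explicit=$(echo "$input" | sed "s/[0-9][0-9]*/h/")

# One or more digits, written with the GNU \+ extension.
plus=$(echo "$input" | sed "s/[0-9]\+/h/")

echo "$explicit"   # abc h
echo "$plus"       # abc h
```

Neither pattern can match the empty string, so the first match is the whole run 123.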


The answer has to do with the way regular expressions are handled in sed. Regular expressions (REs) can get very complex, and there is a tradeoff between how much they can express and how complex the syntax is. Different tools and programming languages have made different choices about how much power, and therefore complexity, they want to support. sed is very powerful and therefore a bit more complex than you might expect.

To get to the answer: the * matches a sequence of zero or more instances of the previous token. In your case the previous token is [0-9], which means any digit. sed finds a zero-length string of digits at each position in the input where no digit follows, and every one of those empty matches gets replaced. This seems rather counterintuitive until you get used to it.

You noticed one common way of fixing the problem, which is to use /[0-9][0-9]*/, interpreted as a digit followed by zero or more digits. Another solution is to replace * with +, which matches a sequence of one or more of the previous token. In sed's default (basic) syntax a plain + is a literal character, so either write it as \+ (a GNU extension) or switch to extended regular expressions with -E. So the full command is:

echo "abc 123" | sed -E "s/[0-9]+/h/g"
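How a given sed treats + can be checked directly. In basic regular expressions (the default) an unescaped + is an ordinary character, while with -E it means one or more. The sketch below assumes a sed that accepts -E (GNU and BSD sed both do); the variable names are illustrative only.

```shell
# Basic syntax: [0-9]+ means "a digit followed by a literal plus sign",
# which does not occur in the input, so nothing is replaced.
literal_plus=$(echo "abc 123" | sed "s/[0-9]+/h/g")
echo "$literal_plus"   # abc 123

# The same pattern does match when the input contains a digit and a plus.
literal_hit=$(echo "1+ 2" | sed "s/[0-9]+/h/g")
echo "$literal_hit"    # h 2

# Extended syntax: + means one or more of the previous token.
extended=$(echo "abc 123" | sed -E "s/[0-9]+/h/g")
echo "$extended"       # abc h
```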

You can read more about sed in its manual, either online (just google man sed) or, if the manuals are installed on your system, by running the command "man sed".