Grep: The asterisk (*) doesn't always work
An asterisk in regular expressions means "match the preceding element 0 or more times".
In your particular case with grep 'This*String' file.txt, you are trying to say, "hey, grep, match me the word Thi, followed by lowercase s zero or more times, followed by the word String". Example is not a run of lowercase s characters, so nothing in the pattern can account for it, hence grep ignores ThisExampleString.
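In the case of grep '*String' file.txt, there is nothing in front of the * for it to repeat. In a basic regular expression (grep's default), a leading * is therefore treated as an ordinary character, so you are saying "grep, match me a literal asterisk followed by the word String". Of course, that's not how ThisExampleString is supposed to be read. (With the -E flag the meaning of a leading * is left undefined by POSIX and depends on the implementation, so you can try this with and without -E, but none of the possible meanings are anything like what you really want here.)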
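To see what a leading * does in practice, here is a minimal sketch. The sample lines and the sample helper are made up for illustration, and it assumes grep's default BRE mode, where POSIX specifies that a leading * is an ordinary character.

```shell
# Hypothetical sample data: the second line contains a literal asterisk.
sample() { printf '%s\n' 'ThisExampleString' 'Literal*String'; }

# BRE (default): the leading * is an ordinary character, so only the
# line that really contains the text "*String" is selected.
sample | grep '*String'      # prints: Literal*String

# Escaping the asterisk states the same intent explicitly.
sample | grep '\*String'     # prints: Literal*String
```

ThisExampleString is not selected by either command, because it contains no asterisk.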
Knowing that . means "any single character", we could do this: grep 'This.*String' file.txt. Now the grep command will read it correctly: This followed by any character (think of it as a selection of ASCII characters) repeated any number of times, followed by String.
The * metacharacter in BREs, EREs and PCREs [1] matches 0 or more occurrences of the previously grouped pattern (if a grouped pattern precedes the * metacharacter), 0 or more occurrences of the previous character class (if a character class precedes it) or 0 or more occurrences of the previous character (if neither a grouped pattern nor a character class precedes it).
This means that in the This*String pattern, since the * metacharacter is preceded by neither a grouped pattern nor a character class, it matches 0 or more occurrences of the previous character (in this case the s character):
% cat infile
ThisExampleString
ThisString
ThissString
% grep 'This*String' infile
ThisString
ThissString
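The other two cases described above, a grouped pattern and a character class followed by *, can be sketched the same way. The sample lines and the sample helper are invented for this illustration; the anchors ^ and $ are added so that the whole line has to match.

```shell
# Hypothetical sample data for this sketch.
sample() { printf '%s\n' 'ab' 'abab' 'a1b' 'a123b' 'axb'; }

# BRE group: \(ab\)* repeats the whole group "ab" zero or more times.
sample | grep '^\(ab\)*$'      # prints: ab, abab

# The same group in ERE syntax, where parentheses need no backslashes.
sample | grep -E '^(ab)*$'     # prints: ab, abab

# Character class: [0-9]* repeats "any digit" zero or more times.
sample | grep '^a[0-9]*b$'     # prints: ab, a1b, a123b
```

Note that ab is selected by the last command as well, because "zero or more" includes zero digits.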
To match 0 or more occurrences of any character, you want to match 0 or more occurrences of the . metacharacter, which matches any single character:
% cat infile
ThisExampleString
% grep 'This.*String' infile
ThisExampleString
The * metacharacter in BREs and EREs is always "greedy", i.e. it will match the longest possible string:
% cat infile
ThisExampleStringIsAString
% grep -o 'This.*String' infile
ThisExampleStringIsAString
This may not be the desired behavior; if it's not, you can turn on grep's PCRE engine (using the -P option) and append the ? metacharacter, which, when put after the * and + metacharacters, makes them non-greedy, so that they match as little as possible:
% cat infile
ThisExampleStringIsAString
% grep -Po 'This.*?String' infile
ThisExampleString
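As a further sketch of the greedy versus non-greedy difference: the input line below is made up, and the negated bracket expression is a commonly used workaround that I am swapping in for cases where -P is unavailable, not something taken from the answer above. The -P example only runs if this grep was built with PCRE support.

```shell
# Hypothetical input line for this sketch.
line='<a><b>'

# Greedy: .* runs to the last > on the line.
printf '%s\n' "$line" | grep -o '<.*>'       # prints: <a><b>

# Without -P: [^>]* cannot cross a >, so each match stops at the first one.
printf '%s\n' "$line" | grep -o '<[^>]*>'    # prints: <a> then <b>

# With -P, if supported: .+? matches as little as possible.
if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
  printf '%s\n' "$line" | grep -Po '<.+?>'   # prints: <a> then <b>
fi
```

The bracket-expression form only works when a single character marks the end of the match; for a multi-character terminator such as String, the non-greedy PCRE form is the simpler choice.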
1: Basic Regular Expressions, Extended Regular Expressions and Perl Compatible Regular Expressions
One explanation, found at the linked page:
Asterisk "*" does not mean the same thing in regular expressions as in wildcarding; it is a modifier that applies to the preceding single character, or expression such as [0-9]. An asterisk matches zero or more of what precedes it. Thus [A-Z]* matches any number of upper-case letters, including none, while [A-Z][A-Z]* matches one or more upper-case letters.
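The contrast drawn in that quote can be tried out with a short sketch. The sample lines and the sample helper are invented for this illustration; LC_ALL=C is set as a precaution so that [A-Z] covers only ASCII upper-case letters regardless of locale.

```shell
# Shell wildcard (glob): * on its own stands for any run of characters.
case 'ThisExampleString' in
  This*String) echo 'glob matches' ;;
esac

# Regex: * only repeats what precedes it, so the equivalent needs ".*".
printf '%s\n' 'ThisExampleString' | grep 'This.*String'

# Hypothetical sample lines; the last one is empty.
sample() { printf '%s\n' 'ABC' 'A' 'abc' ''; }

# -x requires the whole line to match; -n prefixes each hit with its line number.
sample | LC_ALL=C grep -nx '[A-Z]*'        # lines 1, 2 and 4 (the empty line)
sample | LC_ALL=C grep -nx '[A-Z][A-Z]*'   # lines 1 and 2 only
```

The empty line is selected by [A-Z]* because zero upper-case letters is a valid match, while [A-Z][A-Z]* insists on at least one.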