Regular Expression for finding double characters in Bash
This is really two questions and should have been split up, but since the answers are relatively simple, I will put them both here. These answers are for GNU grep specifically.
a) egrep is the same as grep -E. Both indicate that Extended Regular Expressions should be used instead of grep's default Basic Regular Expressions. Plain grep requires the backslashes because it uses Basic Regular Expressions.
From the man page:
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
See the man page for additional details about historical conventions and portability.
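As a small sketch of that difference (the sample text and variable names are made up for illustration), the same search can be written in both syntaxes. Only the backslashes change:

```shell
# Hypothetical sample input: only the second line contains "ab" twice in a row.
sample='ab
abab
ba'

# Basic syntax (plain grep): grouping and interval braces need backslashes.
bre_out=$(printf '%s\n' "$sample" | grep '\(ab\)\{2\}')

# Extended syntax (grep -E): the same operators are written without backslashes.
ere_out=$(printf '%s\n' "$sample" | grep -E '(ab){2}')

echo "BRE: $bre_out"
echo "ERE: $ere_out"
```

Both commands should select the line abab.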
b) Use egrep '(.)\1{N}' and replace N with the number of characters you wish to match minus one (since the dot matches the first one). So if you want to match a character repeated four times, use egrep '(.)\1{3}'.
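A quick way to check this is a sketch like the following. The sample lines are invented, and it assumes a grep that accepts back-references with -E (GNU grep does):

```shell
# Hypothetical sample lines for illustration.
lines='aaa
baaaab
abcabc
zzzzz'

# N=3: (.) captures one character, \1{3} requires three more copies of it,
# so a line is selected only if it has four identical characters in a row.
four_out=$(printf '%s\n' "$lines" | grep -E '(.)\1{3}')

echo "$four_out"
```

Here aaa is too short and abcabc has no run at all, so only baaaab and zzzzz should be printed.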
This would look for 2 or more occurrences of the same character in a row:
grep -E '(.)\1+' file
If your grep has the -o option, this would print each match on a new line:
grep -Eo '(.)\1+' file
To match a run of exactly 3 identical characters (a line with a longer run still matches, since it contains such a run):
grep -E '(.)\1{2}' file
Or 3 or more:
grep -E '(.)\1{2,}' file
etc..
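To see what -o does with the "2 or more" pattern, here is a minimal sketch. The input text is made up, and it assumes a grep that supports both -o and back-references with -E:

```shell
# Hypothetical input containing several runs of repeated characters.
text='bookkeeper aaa'

# -o prints only the matched parts, one per line, instead of the whole line.
runs=$(printf '%s\n' "$text" | grep -Eo '(.)\1+')

echo "$runs"
```

This should print oo, kk, ee and aaa, each on its own line.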
edit
Actually @stephane_chazelas is right about back references and -E. I had forgotten about that. I tried it in BSD grep and GNU grep and it works there, but it is not supported in some other greps. You would need to use one of the versions below.
Regular grep versions:
grep '\(.\)\1\{1,\}' file
grep -o '\(.\)\1\{1,\}' file
grep '\(.\)\1\{2\}' file
grep '\(.\)\1\{2,\}' file
The -o option is also not standard grep, BTW (probably if your grep understands -o it can also do the back reference).
Note: grep -E '(.)\1{2,}' file and grep '\(.\)\1\{2\}' file are wrong, as alexis indicated, and should be ignored.
First, thank you all for your supporting comments and suggestions. As it turns out I was already quite close to the answer.
The Main Issue was about:
Is there a simple way to look for n occurrences of the same character, e.g. aa, tttttt
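Short answer:
The following [variations of] commands match a repeated at least once, with no upper limit (the patterns are quoted so that the shell does not strip the backslashes or expand the braces):
grep 'a\{1,\}'
grep -E '(a){1,}'
egrep 'a{1,}'
or, with GNU Regular Expressions available:
grep 'a\+'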
The number of repetitions is set inside the curly brackets, through the pattern {min,max}: {n} repeats exactly n times, {n,} repeats at least n times, and {n,m} repeats at least n but at most m times.
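The three forms can be tried side by side with a sketch like this one. The word list is invented, and the ^ and $ anchors are added here so that the count applies to the whole line rather than to any part of it:

```shell
# Hypothetical word list: runs of 1 to 4 "a" characters.
words='a
aa
aaa
aaaa'

# {n}: exactly 2
exactly_two=$(printf '%s\n' "$words" | grep -E '^a{2}$')

# {n,}: at least 3
at_least_three=$(printf '%s\n' "$words" | grep -E '^a{3,}$')

# {n,m}: at least 2 but at most 3
two_to_three=$(printf '%s\n' "$words" | grep -E '^a{2,3}$')

echo "exactly 2:  $exactly_two"
echo "at least 3: $at_least_three"
echo "2 to 3:     $two_to_three"
```

Without the anchors, a{2} would also select aaa and aaaa, because each of them contains aa.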
This, as a consequence, raised the secondary issue:
Is the necessity of setting backslashes bound to the command I use?
Short answer: Yes, the use of backslashes depends on whether one uses grep or egrep.
- grep: a backslash activates metacharacters [uses Basic Regular Expressions]
- egrep: a backslash de-activates metacharacters [uses Extended Regular Expressions]
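The flip can be observed directly with a small sketch. The input is made up, and grep -E stands in for egrep:

```shell
# Hypothetical input: one line with two "a"s, one line with the literal text a{2}.
input='aa
a{2}'

# grep (basic): unescaped braces are ordinary characters, escaped braces are a quantifier.
bre_literal=$(printf '%s\n' "$input" | grep 'a{2}')
bre_interval=$(printf '%s\n' "$input" | grep 'a\{2\}')

# grep -E (extended): exactly the other way round.
ere_interval=$(printf '%s\n' "$input" | grep -E 'a{2}')
ere_literal=$(printf '%s\n' "$input" | grep -E 'a\{2\}')

echo "grep    'a{2}'   -> $bre_literal"
echo "grep    'a\{2\}' -> $bre_interval"
echo "grep -E 'a{2}'   -> $ere_interval"
echo "grep -E 'a\{2\}' -> $ere_literal"
```

The two "interval" variables should hold aa, and the two "literal" variables should hold a{2}.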
As this is only the short answer, and for those who run into comparable issues, I have added my basic summary of what one seemingly has to be aware of when working with grep and egrep.
Basic, Extended, and GNU Regular Expressions
Basic Regular Expressions
Used in the grep, ed and sed commands.
The Basic Regular Expressions feature set:
- Metacharacters such as { ( and ) are activated through a backslash. If there is no backslash they will be taken as (part of the) search term. Others, such as . [ * ^ and $, are special without a backslash.
- The anchors ^ and $ and, in GNU grep, the word boundaries \< and \> are supported as written
- No shorthand characters [\b, \s, etc.] in the standard set
GNU Basic Regular Expressions add to these:
- \? repeats a character zero or one time (ac\? matches a and ac) and is an alternative for \{0,1\}
- \+ repeats a character at least one time (c\+ matches c, cc, cccccccc etc.) and is an alternative for \{1,\}
- \| is supported (e.g. grep 'a\|b' will look for a or b)
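A short sketch of the three GNU additions in basic syntax follows. The data is invented, and it assumes GNU grep, since these escapes are extensions rather than part of the standard basic set:

```shell
# Hypothetical sample data.
data='ac
abc
abbc
xyz'

# \? : zero or one "b" between a and c
opt=$(printf '%s\n' "$data" | grep '^ab\?c$')

# \+ : one or more "b" between a and c
plus=$(printf '%s\n' "$data" | grep '^ab\+c$')

# \| : either "ac" or "xyz"
alt=$(printf '%s\n' "$data" | grep 'ac\|xyz')

echo "opt:  $opt"
echo "plus: $plus"
echo "alt:  $alt"
```

The patterns are quoted on purpose. Unquoted, the shell would remove the backslashes before grep ever sees them.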
grep -E enables the command to use the whole set of the Extended Regular Expressions:
Extended Regular Expressions [ERE]
Used in egrep, awk and emacs; this is the Basic Set plus quite a few features.
- Metacharacters are deactivated through a backslash
- No back references in the standard set (GNU and BSD grep accept them with -E anyway, as noted above)
- else: a lot of the magic Regular Expressions usually can do for one
GNU Extended Regular Expressions
add the following features:
- shorthand classes
- quantifiers
The two links will direct one to regular-expressions.info which, in addition to the awesome support I got here, really helped me a lot.