How to remove a word prefix using grep?
As the others have noted, grep is not well suited to this task. sed is a good option, or, if the text is regularly structured, a simple cut might be easier to type:
echo www.abc.com | cut -d. -f2-
-d. tells cut to use . as the delimiter.
-f2- tells cut to return everything from field 2 through the last field.
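To see how this behaves on several lines at once, here is a minimal sketch assuming a POSIX cut. The host names are made-up sample input, and strip_first_field and strip_first_field_strict are hypothetical helper names, not part of any tool.

```shell
# Hypothetical helper: drop everything up to and including the first dot.
strip_first_field() {
    cut -d. -f2-
}

# Same, but -s suppresses lines that contain no delimiter at all.
strip_first_field_strict() {
    cut -s -d. -f2-
}

# Made-up sample input: two dotted names and one name without a dot.
printf '%s\n' www.abc.com mail.example.org localhost | strip_first_field
printf '%s\n' www.abc.com mail.example.org localhost | strip_first_field_strict
```

The first pipeline prints abc.com, example.org and localhost, because cut passes lines without the delimiter through unchanged. The second prints only abc.com and example.org.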
with grep's --only-matching and \K
You can do this with grep's --only-matching flag:
echo "www.abc.com" | grep --perl-regexp --only-matching 'www\.\K.*'
which can be shortened to
echo "www.abc.com" | grep -Po 'www\.\K.*'
Both commands produce
abc.com
with grep (GNU grep) 3.3.
Instead of echo, I'll use a here string to shorten the command further:
grep -Po 'www\.\K.*' <<< "www.abc.com"
\K resets the starting point of the match, essentially forgetting the matched "www.". See this for more on \K.
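Because \K only moves the reported start of the match, the part before it does not have to be a fixed string. As a rough sketch, assuming GNU grep built with PCRE support, the following strips whatever the first dot-separated label is. strip_first_label is a hypothetical helper name and the host names are made-up sample input.

```shell
# Hypothetical helper: strip the first dot-separated label, whatever it is.
# Assumes GNU grep built with PCRE support (-P).
strip_first_label() {
    grep -Po '^[^.]+\.\K.*'
}

# Made-up sample input: two dotted names and one name without a dot.
printf '%s\n' www.abc.com mail.example.org localhost | strip_first_label
```

This prints abc.com and example.org. The line localhost contains no dot, so it does not match and is not printed.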
with grep's positive lookbehind
You can also do this with a positive lookbehind:
grep -Po '(?<=www\.).*' <<< "www.abc.com"
with awk's field separator -F
awk -F 'www[.]' '$2{print $2}' <<< "www.abc.com"
This prints
abc.com
The $2{print $2} part prints the second field only if it is truthy, that is, neither empty nor numerically zero. This is necessary for multi-line input, to avoid outputting blank lines for input lines that don't contain the field separator.
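As a hedged variation on the guard, comparing the field against the empty string also keeps a remainder that happens to be 0, which a bare $2 pattern would treat as false. This is only a sketch: strip_www is a hypothetical helper name, the input lines are made up, and the separator is written as www[.] so that the dot is matched literally.

```shell
# Hypothetical helper: print what follows "www." on lines that contain it.
# The bracket expression [.] makes the dot literal in the separator regex.
strip_www() {
    awk -F 'www[.]' '$2 != "" { print $2 }'
}

# Made-up input: a matching line, a line without the separator,
# and an edge case whose remainder is the string "0".
printf '%s\n' www.abc.com example.org www.0 | strip_www
```

This prints abc.com and 0, and skips example.org without emitting a blank line.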
You don't edit strings with grep in a Unix shell; grep is usually used to find lines in a text or to filter them out. Use sed instead:
$ echo www.example.com | sed 's/^[^.]\+\.//'
example.com
You'll need to learn regular expressions to use it effectively.
sed can also edit a file in place (modifying the file itself) if you pass the -i flag, but be careful: you can easily lose data if you run the wrong sed command with -i.
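One way to reduce that risk is to give -i a backup suffix, so the original survives next to the edited file. The sketch below assumes a sed that accepts an attached suffix (GNU and BSD sed both do) and works on a throwaway temporary file. The variable name tmpfile and the sample line are made up for illustration, and the pattern uses * instead of \+ to stay within POSIX basic regular expressions.

```shell
# Work on a throwaway file so the example cannot damage real data.
tmpfile=$(mktemp)
printf '%s\n' www.example.com > "$tmpfile"

# -i.bak edits the file in place and keeps the original as "$tmpfile.bak".
sed -i.bak 's/^[^.]*\.//' "$tmpfile"

cat "$tmpfile"       # edited contents
cat "$tmpfile.bak"   # untouched backup
```

After this runs, the edited file contains example.com and the .bak copy still contains www.example.com.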
An example
From your comments, I guess you have a TeX document and you want to remove the first part of all .com domain names. Suppose this is your document, test.tex:
\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}
then you can transform it with this sed command (redirect the output to a file, or edit in place with -i):
$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}
Please note that:
- A common sequence of allowed symbols followed by a dot is matched by [a-z0-9-]\+\.
- I used groups in the regular expression (the parts of it within \( and \)) to indicate the first and the second part of the URL, and I replace the entire match with its second group (\2 in the substitution pattern)
- The domain should be at least a 3rd-level .com domain (every \+ repetition means at least one match)
- The search is case insensitive (i flag at the end)
- It can make more than one match per line (g flag at the end)
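The same substitution may be easier to read with extended regular expressions, where groups and + need no backslashes. This is only a sketch assuming GNU sed (for -E and the case-insensitive I flag). strip_com_prefix is a hypothetical helper name, and because the first label no longer needs its own group, the part to keep becomes group 1 rather than group 2.

```shell
# Hypothetical helper; assumes GNU sed (-E and the I flag).
# Group 1 is "one or more labels followed by com", i.e. the part to keep.
strip_com_prefix() {
    sed -E 's/[a-z0-9-]+\.(([a-z0-9-]+\.)+com)/\1/gI'
}

# The two domain lines from test.tex above.
printf '%s\n' 'www.example.com' 'example.com www.another.domain.com' | strip_com_prefix
```

This prints example.com on the first line and example.com another.domain.com on the second, matching the output shown above.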
You can do this easily using grep:
$ echo www.google.com | grep -o '[^.]*\.com'
google.com
Instead of echo, give grep your file:
$ grep -o '[^.]*\.com$' < file
Here I used the regular expression '[^.]*\.com'. It means: find a run of characters with no . in it ([^.]*) that is followed by .com (\.com in the regex). In the second command, the trailing $ anchors the match at the end of the line. The -o option tells grep to print only the part that matched.
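The $ anchor matters more than it may seem. As a small illustration with a made-up host name, an unanchored pattern can match .com inside a longer label:

```shell
# Unanchored: ".com" may match inside a longer label such as "company".
printf '%s\n' www.company.com | grep -o '[^.]*\.com'

# Anchored with $: only a ".com" at the end of the line counts.
printf '%s\n' www.company.com | grep -o '[^.]*\.com$'
```

The first command prints www.com and pany.com on separate lines, while the second prints company.com. Note also that this approach keeps only the last label before .com, so www.another.domain.com would yield domain.com rather than another.domain.com.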