Grep Match and extract
With grep -o, you will have to match exactly what you want to extract. Since you don't want to extract the proto= string, you should not match it.
An extended regular expression that would match either tcp or udp followed by a slash and some non-empty alphanumeric string is
(tcp|udp)/[[:alnum:]]+
Applying this on your data:
$ grep -E -o '(tcp|udp)/[[:alnum:]]+' file
tcp/http
tcp/https
udp/dns
To make sure that we only do this on lines that start with the string proto=:
grep '^proto=' file | grep -E -o '(tcp|udp)/[[:alnum:]]+'
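To see what the anchoring buys you, here is a small sketch run against a made-up sample. The temporary file, its extra sent= and rcvd= columns, and the comment line are all assumptions for illustration, not the data from the question.

```shell
# Hypothetical sample data; the extra columns and the comment line are assumptions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
proto=tcp/http sent=144 rcvd=52
proto=tcp/https sent=20 rcvd=40
# udp/ntp is mentioned in this comment
proto=udp/dns sent=10 rcvd=12
EOF

# Unanchored: every line is searched, so udp/ntp from the comment is extracted too.
unanchored=$(grep -E -o '(tcp|udp)/[[:alnum:]]+' "$sample")

# Anchored: only lines that start with proto= reach the extracting grep.
anchored=$(grep '^proto=' "$sample" | grep -E -o '(tcp|udp)/[[:alnum:]]+')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored:\n%s\n' "$anchored"

rm -f "$sample"
```

On this sample the unanchored command reports four matches, including udp/ntp, while the anchored pipeline reports only the three proto= values.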
With sed, removing everything up to and including the first = and everything from the first blank character onward:
$ sed 's/^[^=]*=//; s/[[:blank:]].*//' file
tcp/http
tcp/https
udp/dns
To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use
sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' file
Here, we suppress the default output with the -n option, and then we trigger the substitutions and an explicit print of the line only if the line matches ^proto=.
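As a rough illustration of why the /^proto=/ guard matters, the sketch below runs both sed commands on a made-up input. The temporary file and the time= line are assumptions added only to show the difference.

```shell
# Hypothetical input; the time= line is an assumption added to show the difference.
sample=$(mktemp)
cat > "$sample" <<'EOF'
proto=tcp/http sent=144 rcvd=52
time=12:00 host=example
proto=udp/dns sent=10 rcvd=12
EOF

# Unguarded: the substitutions run on every line, so 12:00 is printed as well.
unguarded=$(sed 's/^[^=]*=//; s/[[:blank:]].*//' "$sample")

# Guarded: -n silences the default print; only ^proto= lines are edited and printed.
guarded=$(sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' "$sample")

printf 'unguarded:\n%s\n' "$unguarded"
printf 'guarded:\n%s\n' "$guarded"

rm -f "$sample"
```

With this input the unguarded command also emits 12:00, because the time= line has an = and a blank just like the proto= lines do.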
With awk, using the default field separator, and then splitting the first field on = and printing the second bit of it:
$ awk '{ split($1, a, "="); print a[2] }' file
tcp/http
tcp/https
udp/dns
To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use
awk '/^proto=/ { split($1, a, "="); print a[2] }' file
If you are on GNU grep (for the -P option), you could use:
$ grep -oP 'proto=\K[^ ]*' file
tcp/http
tcp/https
udp/dns
Here we match the proto= string, to make sure that we are extracting the correct column, but then we discard it from the output with \K, which resets the start of the reported match.
The above assumes that the columns are space-separated. If tabs are also a valid separator, you would use \S to match the non-whitespace characters, so the command would be:
grep -oP 'proto=\K\S*' file
If you also want to guard against matching fields where proto= is only a substring, such as thisisnotaproto=tcp/https, you can add a word boundary with \b like so:
grep -oP '\bproto=\K\S*' file
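The three -P patterns above can be compared side by side with a sketch like the following. It assumes a GNU grep built with PCRE support; the temporary file and its contents (a tab-separated line and a thisisnotaproto= line) are made up for illustration.

```shell
# Hypothetical input: one space-separated line, one decoy field, one tab-separated line.
input=$(mktemp)
{
  printf 'proto=tcp/http sent=144\n'
  printf 'thisisnotaproto=tcp/https sent=20\n'
  printf 'proto=udp/dns\tsent=10\n'
} > "$input"

# [^ ]* stops only at a space, so the tab and the next column leak into the last match.
space_only=$(grep -oP 'proto=\K[^ ]*' "$input")

# \S* stops at any whitespace, but the decoy field still matches.
any_space=$(grep -oP 'proto=\K\S*' "$input")

# \b requires proto= to start at a word boundary, which rejects the decoy field.
bounded=$(grep -oP '\bproto=\K\S*' "$input")

printf 'space_only:\n%s\n' "$space_only"
printf 'any_space:\n%s\n' "$any_space"
printf 'bounded:\n%s\n' "$bounded"

rm -f "$input"
```

Only the last form yields just tcp/http and udp/dns on this input. Note that \b still accepts a field such as not-proto=, since the hyphen is not a word character.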
Using awk:
awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input
$1 ~ "proto" will ensure we only take action on lines with proto in the first column
sub(/proto=/, "") will remove proto= from the input
print $1 prints the remaining column
$ awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input
tcp/http
tcp/https
udp/dns
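Because $1 ~ "proto" accepts any first field that merely contains proto, a stricter test may be preferable for messier input. The sketch below contrasts the command above with an anchored variant; the anchored variant, the temporary file, and its thisisnotaproto= line are assumptions for illustration and not part of the answer itself.

```shell
# Hypothetical input; the middle line is a decoy whose first field only contains "proto".
sample=$(mktemp)
cat > "$sample" <<'EOF'
proto=tcp/http sent=144
thisisnotaproto=tcp/https sent=20
proto=udp/dns sent=10
EOF

# As written above: any first field containing "proto" is accepted.
loose=$(awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' "$sample")

# Stricter variant (an assumption): the first field must start with proto=,
# and the prefix is removed from $1 only.
strict=$(awk '$1 ~ /^proto=/ { sub(/^proto=/, "", $1); print $1 }' "$sample")

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"

rm -f "$sample"
```

On this input the loose form prints a mangled thisisnotatcp/https for the decoy line, while the anchored form skips it.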