How to output only captured groups with sed?

The key to getting this to work is to tell sed to exclude what you don't want to be output as well as specifying what you do want.

Click to copy

string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

This says:

don't default to printing each line (-n)
exclude zero or more non-digits
include one or more digits
exclude one or more non-digits
include one or more digits
exclude zero or more non-digits
print the substitution (p)

In general, in sed you capture groups using parentheses and output what you capture using a back reference:

Click to copy

echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'

will output "bar". If you use -r (-E for OS X) for extended regex, you don't need to escape the parentheses:

Click to copy

echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'

There can be up to 9 capture groups and their back references. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated:

Click to copy

echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'

outputs "a bar a".

If you have GNU grep (it may also work in BSD, including OS X):

Click to copy

echo "$string" | grep -Po '\d+'

or variations such as:

Click to copy

echo "$string" | grep -Po '(?<=\D )(\d+)'

The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.

Sed has up to nine remembered patterns but you need to use escaped parentheses to remember portions of the regular expression.

See here for examples and more detail

you can use grep

Click to copy

grep -Eow "[0-9]+" file

How to output only captured groups with sed?

Tags:

Regex

Sed

Related

Recent Posts