extract value between two search patterns on same line
If you have a version of GNU grep with PCRE (-P) support, then assuming you mean the first occurrence of ,ou
grep -oP '(?<=dn: uid=).+?(?=,ou=)' file
If you want to match up to the last ,ou on the line instead (the second one, if there are two), you can remove the non-greedy ? modifier
grep -oP '(?<=dn: uid=).+(?=,ou=)' file
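To see the difference between the two forms, here is a minimal sketch run against a made-up line with two ,ou= components. The sample entry and the variable names are assumptions for illustration, not data from the question, and it needs a grep built with -P support.

```shell
# Hypothetical sample line containing two ",ou=" components.
line='dn: uid=user1,ou=People,ou=Staff,dc=example,dc=com'

# Non-greedy .+? stops at the first ",ou=" after "dn: uid=".
first=$(printf '%s\n' "$line" | grep -oP '(?<=dn: uid=).+?(?=,ou=)')

# Greedy .+ runs on to the last ",ou=" on the line.
last=$(printf '%s\n' "$line" | grep -oP '(?<=dn: uid=).+(?=,ou=)')

printf '%s\n' "$first"   # user1
printf '%s\n' "$last"    # user1,ou=People
```

If your lines only ever contain one ,ou= the two commands give the same result.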
The expressions in parentheses are zero-length assertions (aka lookarounds), meaning that they must match at that position but consume no characters, so they are not returned as part of the result. You could do the same thing natively in perl, e.g.
perl -ne 'print "$1\n" if /(?<=dn: uid=)(.+?)(?=,ou=)/' file
It's possible to do something similar in sed, using regular (non zero-length) grouping e.g. (for GNU sed - other varieties may need additional escaping)
sed -rn 's/(.*dn: uid=)([^,]+)(,ou=.*)/\2/p' file
or simplifying slightly
sed -rn 's/.*dn: uid=([^,]+),ou=.*/\1/p' file
Note that the [^,] is a bit of a hack here, since sed doesn't have a true non-greedy match option.
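As a rough illustration of why the character class matters, the sketch below compares a greedy (.+) group with the [^,]+ group on a made-up line with two ,ou= components (the sample line and variable names are assumptions, and -r is the GNU sed spelling used above).

```shell
# Hypothetical sample line containing two ",ou=" components.
line='dn: uid=user1,ou=People,ou=Staff,dc=example,dc=com'

# Greedy (.+) takes as much as it can, so it ends at the last ",ou=".
greedy=$(printf '%s\n' "$line" | sed -rn 's/.*dn: uid=(.+),ou=.*/\1/p')

# [^,]+ cannot cross a comma, so it ends at the first one.
bounded=$(printf '%s\n' "$line" | sed -rn 's/.*dn: uid=([^,]+),ou=.*/\1/p')

printf '%s\n' "$greedy"    # user1,ou=People
printf '%s\n' "$bounded"   # user1
```

The trade-off is that [^,]+ only works while the value itself contains no comma.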
Afterthought: although it's not exactly what you asked, it looks like what you actually want to do is read comma-separated name=value pairs from a file, and then further split the value of the first field from its name. You could achieve that in many ways, including
awk -F, '{sub(".*=","",$1); print $1}' file
or a pure-bash solution such as
while IFS=, read -r a b c d; do printf '%s\n' "${a#*=}"; done < file
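In the bash version the work is done by the ${a#*=} parameter expansion, which strips the shortest leading match of *= from the first field. A small sketch of how it behaves, using made-up values (the strings and the variable b are assumptions for illustration):

```shell
# Hypothetical first field, as `read` would assign it to $a.
a='dn: uid=user1'

# "#" removes the shortest prefix matching the pattern "*=".
shortest=${a#*=}
printf '%s\n' "$shortest"   # user1

# With more than one "=" the single and double "#" forms differ.
b='key=x=y'
printf '%s\n' "${b#*=}"     # x=y
printf '%s\n' "${b##*=}"    # y
```

The awk variant's sub(".*=", ...) is greedy, so on a field with several = signs it behaves like the ## form rather than the # form.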
This is a good job for awk. You can split the string instead of attempting to use a regex. Here is a solution:
$ awk -F= '{ split($2,arr,","); print arr[1] }' test.txt
user1
[email protected]
usertest
abc1
With sed:
sed 's/[^=]*=\([^,]\+\),.*/\1/' file
This assumes that uid= contains the first = on the line, and that you want to stop at the first ,ou= instance on the line.
Explanation
This looks for any number of non = characters ([^=]*) followed by =, then matches and saves as many non-commas as it can find (\([^,]\+\)), followed by a comma and the rest of the line (,.*). This means it will replace everything up to and including the first = and after the first comma with whatever non-comma characters it finds after the first = on the line.
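Here is a small sketch of that substitution on a made-up line (the sample line and variable names are assumptions). Note that \+ is a GNU extension in basic regular expressions; the POSIX interval \{1,\} should express the same "one or more" for other sed implementations.

```shell
# Hypothetical sample line; not taken from the question's file.
line='dn: uid=user1,ou=People,dc=example,dc=com'

# GNU sed: \+ means "one or more" of the preceding bracket expression.
gnu=$(printf '%s\n' "$line" | sed 's/[^=]*=\([^,]\+\),.*/\1/')

# Same idea written with the POSIX interval \{1,\}.
posix=$(printf '%s\n' "$line" | sed 's/[^=]*=\([^,]\{1,\}\),.*/\1/')

printf '%s\n' "$gnu"     # user1
printf '%s\n' "$posix"   # user1
```

Because the command has no -n option or p flag, any line that does not match the pattern is printed unchanged.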