Extract number of length n from field and return string
If I understand correctly, you want the 5th column to become the space-separated list of all the 6-digit numbers it contains.
Maybe:
perl -F'\t' -lape '
$F[4] = join " ", grep {length == 6} ($F[4] =~ /\d+/g);
$_ = join "\t", @F' < file
Or, reusing your negative look-around operators:
perl -F'\t' -lape '
$F[4] = join " ", ($F[4] =~ /(?<!\d)\d{6}(?!\d)/g);
$_ = join "\t", @F' < file
With awk:
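awk -F'\t' -v OFS='\t' '
{
  repl = sep = ""
  # consume $5 one run of digits at a time
  while (match($5, /[0-9]+/)) {
    # keep only the runs that are exactly 6 digits long
    if (RLENGTH == 6) {
      repl = repl sep substr($5, RSTART, RLENGTH)
      sep = " "
    }
    # drop everything up to the end of that run
    $5 = substr($5, RSTART+RLENGTH)
  }
  $5 = repl
  print
}' < file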
grep itself is not well suited to the task. grep is meant to print the lines that match a pattern. Some implementations, like GNU or ast-open grep, or pcregrep, can extract strings from the matching lines, but that's quite limited.
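As an illustration of that limitation, a sketch using `grep -o` (supported by GNU and BSD grep, though not required by POSIX) on hypothetical sample lines. The numbers come out one per line with no indication of which input line they came from, and grep knows nothing about fields, so a 6-digit number in the 1st field is picked up too. `grep_extract6` is an illustrative name:

```shell
# Print every run of digits, then keep only the runs that are exactly 6 long
grep_extract6() {
  grep -oE '[0-9]+' | grep -xE '[0-9]{6}'
}

printf '999999\tb\tc\td\tid=123456;x=1234567\tf\na\tb\tc\td\t654321\tf\n' |
  grep_extract6
```

This prints `999999`, `123456` and `654321` on three separate lines, which can no longer be put back into the 5th field of the right line.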
The only cut+grep+paste approach I can think of that could work, with some restrictions, would be with the pcregrep grep implementation:
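# each $n optionally captures one more 6-digit number in a new group;
# (?1) re-uses the pattern of group 1
n='(?:.*?((?1)))?'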
paste <(< file cut -f1-4) <(< file cut -f5 |
pcregrep --om-separator=" " -o1 -o2 -o3 -o4 -o5 -o6 -o7 -o8 -o9 \
"((?<!\d)\d{6}(?!\d))$n$n$n$n$n$n$n$n"
) <(< file cut -f6-)
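This pcregrep approach assumes that every line of input has at least 6 fields and that the 5th field of each line contains between 1 and 9 6-digit numbers (the pattern has only 9 capture groups, and pcregrep prints nothing for a line without a match, which would throw the paste out of alignment).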
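A hypothetical helper (not part of the approach above) that reports the lines breaking those assumptions before running the pipeline; it reuses the awk matching loop. `check_assumptions` and the sample lines are illustrative only:

```shell
# Report lines with fewer than 6 fields, or whose 5th field does not
# contain between 1 and 9 runs of exactly 6 digits
check_assumptions() {
  awk -F'\t' '
  {
    count = 0
    rest = $5
    while (match(rest, /[0-9]+/)) {
      if (RLENGTH == 6) count++
      rest = substr(rest, RSTART + RLENGTH)
    }
    if (NF < 6 || count < 1 || count > 9)
      printf "line %d: NF=%d, six-digit numbers in field 5: %d\n", NR, NF, count
  }' "$@"
}

printf 'a\tb\tc\td\t123456\tf\na\tb\tc\td\tnone\tf\na\tb\tc\td\t654321\n' |
  check_assumptions
```

Here line 2 is reported for having no 6-digit number and line 3 for having only 5 fields; with no file arguments the function reads stdin.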