How to parse a file to extract 3-digit numbers kept in a "group number"
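awk '
    # Header line such as "Group 0": emit a section and skip to the next line.
    $1 == "Group" {printf("\\section{%s%d}\n", $1, $2); next}
    # Any other line: print the first field that is exactly three digits.
    {
        for (i=1; i<=NF; i++)
            if ($i ~ /^[0-9][0-9][0-9]$/) {
                printf("\\Testdetails{%d}\n", $i)
                break
            }
    }
'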
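To try the program above, here is a minimal sketch that wraps it in a shell function and feeds it a few made-up lines. The function name `extract_tests` and the sample text are assumptions for illustration only; the real input layout is whatever was posted in the question.

```shell
#!/bin/sh
# extract_tests is a hypothetical wrapper name around the awk program above.
extract_tests() {
    awk '
        $1 == "Group" {printf("\\section{%s%d}\n", $1, $2); next}
        {
            for (i=1; i<=NF; i++)
                if ($i ~ /^[0-9][0-9][0-9]$/) {
                    printf("\\Testdetails{%d}\n", $i)
                    break
                }
        }
    '
}

# Made-up input: the layout is an assumption, not the data from the question.
sample='Group 0
Login check 101 passed
Timeout check 1024 skipped
Group 1
Parser check 305 passed'

printf '%s\n' "$sample" | extract_tests
```

With this sample, the line containing 1024 should produce nothing, because only fields made of exactly three digits match. Note also that `%d` prints the field as a number, so a value such as 007 would come out as 7; switching to `%s` would keep leading zeros if that matters.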
Update based on comment:
awk '
    $1 == "Group" {printf("\\section{%s %d}\n", $1, $2); next}
    {
        # Collect the words before the 3-digit number as the subsection title.
        title = sep = ""
        for (i=1; i<=NF; i++)
            if ($i ~ /^[0-9][0-9][0-9]$/) {
                printf("\\subsection{%s} \\Testdetails{%d}\n", title, $i)
                break
            }
            else {
                title = title sep $i
                sep = FS
            }
    }
'
One way with perl using a regexp, assuming infile has the content you posted in the question.
Content of script.pl:
use warnings;
use strict;
while ( <> ) {
    chomp;
    if ( m/\A\s*(Group)\s*(\d+)/ ) {
        printf qq[\\Section{%s}\n], $1 . $2;
        next;
    }
    if ( m/\s(\d{3})(?:\s|$)/ ) {
        printf qq[\\Testdetails{%s}\n], $1;
    }
}
Run it like:
perl script.pl infile
With the following output:
\Section{Group0}
\Testdetails{101}
\Testdetails{102}
\Testdetails{412}
\Testdetails{206}
\Testdetails{207}
\Testdetails{201}
\Testdetails{202}
\Testdetails{408}
\Testdetails{101}
\Section{Group1}
\Testdetails{305}
\Testdetails{101}
\Testdetails{324}
\Testdetails{206}
\Testdetails{207}
\Testdetails{410}
\Testdetails{409}
\Testdetails{420}
\Testdetails{426}
\Testdetails{101}
\Section{Group2}
\Testdetails{409}
\Testdetails{305}
For completeness, here is a sed version (GNU sed, since it relies on the \+ and \b extensions):
sed -n -e 's#^ *Group \([0-9]\+\).*#\\Section{Group\1}#p' \
-e 's#.*\b\([0-9][0-9][0-9]\)\b.*#\\Testdetails{\1}#p'