Merge lines into single line
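With awk (a multi-character RS is treated as a regular expression by gawk and mawk; POSIX leaves that case unspecified):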
awk 'BEGIN{RS=">\n+";ORS=">\n";FS="\n"} {$1=$1}1' yourfile
Output:
< Jan 20, 2016 11:58:09 AM EST Test1 Sample Test1 >
< Jan 20, 2016 11:58:09 AM EST Sample Test It is not T1 T2 >
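To make the roles of RS, FS and ORS concrete, here is a small worked run. The input is an assumption: it is reconstructed to be consistent with the output shown above, because the asker's actual file is not reproduced here. The variable names input and merged are only for illustration.

```shell
# Hypothetical sample input (reconstructed, not the asker's actual file).
input='< Jan 20, 2016 11:58:09 AM EST
Test1
Sample Test1 >
< Jan 20, 2016 11:58:09 AM EST
Sample Test
It is not
T1 T2 >'

# RS=">\n+"  a record ends at ">" followed by one or more newlines
# FS="\n"    each line inside a record becomes one field
# $1=$1      forces awk to rebuild the record, joining fields with OFS (a space)
# ORS=">\n"  puts back the ">" and the newline that RS consumed
merged=$(printf '%s\n' "$input" |
  awk 'BEGIN{RS=">\n+";ORS=">\n";FS="\n"} {$1=$1}1')

printf '%s\n' "$merged"
```

The trailing 1 is an always-true pattern with no action, so awk prints each rebuilt record.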
If you want a blank line between each record of output, you can add an extra \n to the ORS, i.e.:
awk 'BEGIN{RS=">\n+";ORS=">\n\n";FS="\n"} {$1=$1}1' yourfile
(although this also adds a trailing blank line at the end of the output, since ORS is printed after every record, including the last).
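If that trailing blank line matters, one possible workaround is to print the separator before every record except the first instead of relying on ORS. This is a sketch and not part of the original answer; the function name join_with_gaps and the sample input are hypothetical.

```shell
# Hypothetical sample input (reconstructed for illustration).
input='< Jan 20, 2016 11:58:09 AM EST
Test1
Sample Test1 >
< Jan 20, 2016 11:58:09 AM EST
Sample Test
It is not
T1 T2 >'

# Same RS/FS as above, but the blank line is emitted *before* records 2..n,
# so nothing extra follows the last record.
join_with_gaps() {
  awk 'BEGIN{RS=">\n+";FS="\n"} {$1=$1; printf "%s%s>\n", (NR>1 ? "\n" : ""), $0}' "$@"
}

# For comparison: the ORS variant from the answer, and the variant above.
ors_lines=$(printf '%s\n' "$input" |
  awk 'BEGIN{RS=">\n+";ORS=">\n\n";FS="\n"} {$1=$1}1' | wc -l)
gap_lines=$(printf '%s\n' "$input" | join_with_gaps | wc -l)

printf 'ORS variant: %d lines, separator-first variant: %d lines\n' \
  "$ors_lines" "$gap_lines"
```

With two records, the ORS variant prints four lines (the last one empty), while the separator-first variant prints three.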
Here you are:
For GNU sed:
sed -e ':x' -e 'N' -e '$!bx' -e 's/\n/ /g' -e 's/ </\n</g' yourFile
For BSD sed:
sed -e ':x' -e 'N' -e '$!bx' -e 's/\n/ /g' -e 's/ \</\'$'\n</g' yourFile
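This is how I've done it:
- Create a label with :x
- Append the next input line to the pattern space with N
- Branch back to the label x with $!bx on every line except the last ($! means "not the last line"), so the whole input is collected in the pattern space before any substitution runs. The final newline is kept because sed never holds it in the pattern space and adds it back when printing.
- Then the first substitution, s/\n/ /g, replaces every newline in the pattern space with a space
- And then the second substitution replaces every < preceded by a space with a newline followed by <. GNU sed accepts \n in the replacement; for BSD sed the command supplies a backslash followed by a literal newline through the shell's $'\n' quoting.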
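The steps above can be traced on a small input. This is a minimal sketch that assumes GNU sed (it relies on \n in the replacement) and uses a hypothetical sample file reconstructed from the expected output; it is not the asker's real data.

```shell
# Hypothetical sample input (reconstructed for illustration).
input='< Jan 20, 2016 11:58:09 AM EST
Test1
Sample Test1 >
< Jan 20, 2016 11:58:09 AM EST
Sample Test
It is not
T1 T2 >'

# :x          define label x
# N           append the next input line to the pattern space
# $!bx        unless this is the last line, jump back to x
# s/\n/ /g    join: every embedded newline becomes a space
# s/ </\n</g  split again before each "<" that follows a space
merged=$(printf '%s\n' "$input" |
  sed -e ':x' -e 'N' -e '$!bx' -e 's/\n/ /g' -e 's/ </\n</g')

printf '%s\n' "$merged"
```

Because the loop slurps the whole input into the pattern space, memory use grows with file size; for very large files a streaming approach may be preferable.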
It looks like, in effect, you want to remove all the newline characters except those that follow a >, so:
perl -pe 's/(?<!>)\n//'
would do. (?<!...) is a negative look-behind operator, so the pattern matches a \n only when it is not preceded by a >.
If it's to remove all newline characters that are between matching <...> pairs and, as per your new sample, those may nest, then that becomes more complicated:
perl -0777 -pe 's{<(?:(?0)|[^<>])*>}{$& =~ s/\n//gr}gse'
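This uses recursion in perl regexps: (?0) refers to the whole regexp again, so a <...> group may itself contain <...> groups. -0777 makes perl read the whole input as one string, and the r flag on the inner substitution (perl 5.14 or later) returns the modified copy of $& instead of trying to change it in place.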