Merge lines into single line

With awk (GNU awk or mawk, since a multi-character RS is treated as a regular expression there):

awk 'BEGIN{RS=">\n+";ORS=">\n";FS="\n"} {$1=$1}1' yourfile
< Jan 20, 2016 11:58:09 AM EST  Test1 Sample Test1 >
< Jan 20, 2016 11:58:09 AM EST Sample Test It is not  T1 T2 >

If you want a blank line between output records, add an extra \n to ORS:

awk 'BEGIN{RS=">\n+";ORS=">\n\n";FS="\n"} {$1=$1}1' yourfile

(although this also leaves a blank line at the end of the output, because ORS is printed after every record, including the last one).
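To see the first command on concrete data, here is a small sketch. The input is a hypothetical reconstruction (the original file is not shown here), and the variable name merged is only for illustration. It assumes an awk that treats a multi-character RS as a regular expression, such as GNU awk or mawk.

```shell
# Hypothetical input: each record starts with "<" and ends with ">" on its own line.
merged=$(
  printf '%s\n' '< Jan 20' 'Test1' 'Sample Test1' '>' '< Jan 21' 'T1 T2' '>' |
    awk 'BEGIN{RS=">\n+";ORS=">\n";FS="\n"} {$1=$1}1'
)

# Show the merged records, one per line.
printf '%s\n' "$merged"
```

Assigning $1=$1 forces awk to rebuild the record, which joins the newline-separated fields with the default OFS (a single space). The > consumed by RS is put back by ORS.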


Here you are:

For GNU sed:

sed -e ':x' -e 'N' -e '$!bx' -e 's/\n/ /g' -e 's/ </\n</g' yourFile

For BSD sed:

sed -e ':x' -e 'N' -e '$!bx' -e 's/\n/ /g' -e 's/ \</\'$'\n</g' yourFile 

This is how I've done it:

  • :x creates a label named x.
  • N appends the next input line to the pattern space, separated by a newline.
  • $!bx branches back to the label x on every line except the last, so the whole file accumulates in the pattern space before any substitution runs. The newline that ends the last line is never part of the pattern space, so it is kept.
  • The first substitution then replaces every newline in the pattern space with a space.
  • The second substitution replaces every space followed by a < with a newline followed by a <.
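The two substitutions can be watched separately in the sketch below. The input and the variable names (input, stage1, stage2) are hypothetical. The second stage writes the newline in the replacement as a backslash followed by a literal newline, a spelling that both GNU and BSD sed should accept, instead of either variant shown above.

```shell
# Hypothetical input: two <...> records, each spread over several lines.
input=$(printf '%s\n' '< A' 'Test1' '>' '< B' 'T1 T2' '>')

# Stage 1: slurp the whole input, then turn every embedded newline into a space.
stage1=$(printf '%s\n' "$input" | sed -e ':x' -e 'N' -e '$!bx' -e 's/\n/ /g')

# Stage 2: put a newline back in front of every "<" that follows a space.
# The replacement is backslash + literal newline + "<".
stage2=$(printf '%s\n' "$stage1" | sed -e 's/ </\
</g')

printf '%s\n' "$stage1" "$stage2"
```

After stage 1 everything is on one line (< A Test1 > < B T1 T2 >). Stage 2 splits it again so that each record starts on its own line.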

Looks like in effect, you want to remove all the newline characters except those that follow a >, so:

perl -pe 's/(?<!>)\n//'

would do. (?<!...) is a negative lookbehind assertion, so the pattern matches a \n only when it is not preceded by a >.

If the goal is to remove all newline characters between matching <...> pairs and, as per your new sample, those pairs may nest, it becomes more complicated:

perl -0777 -pe 's{<(?:(?0)|[^<>])*>}{$& =~ s/\n//gr}gse'

This uses recursion in perl regexps: (?0) refers to the whole regexp again. -0777 makes perl read the input as a single record, so a match can span lines.