Grouped sorting of continuous paragraphs (separated by blank line)?
awk -v RS= -v cmd=sort '{print | cmd; close(cmd); print ""}' file
Setting the record separator RS to an empty string makes awk read the file one paragraph at a time. Each paragraph (in $0) is piped to cmd (which is set to sort), which prints the sorted lines. close(cmd) ends that sort before the next paragraph starts, and print "" emits a blank line to separate the output paragraphs.
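To see what paragraph mode does on its own, before sort is involved, here is a minimal sketch. The sample words and the sample_input helper are made up for illustration; only RS= comes from the command above.

```shell
# Hypothetical sample: two paragraphs (2 and 3 lines) separated by one empty line.
sample_input() {
    printf 'pear\napple\n\nzebra\nmango\nkiwi\n'
}

# With RS set to the empty string, each paragraph is one record.
# NF counts whitespace-separated fields, so with one word per line
# it equals the number of lines in the paragraph.
record_summary=$(sample_input | awk -v RS= '{print NR, NF}')
printf '%s\n' "$record_summary"
```

This should print one line per paragraph (record number, then line count), which shows that awk sees two records rather than five lines.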
Since we're giving perl examples, here is an alternative to Stephane's approach:
perl -e 'undef $/; print join "\n", sort (split /\n/), "\n"
foreach(split(/\n\n/, <>))' < file
Undefine the input record separator (undef $/); this lets <> return the whole of STDIN in one read. We then split that around \n\n (paragraphs). For each paragraph, we split it around newlines, sort the lines, join them back together with \n, and tack on a trailing \n.
However, this has the side effect of adding a trailing paragraph separator after the last paragraph (if it didn't have one before). You can get around that with the slightly less pretty:
perl -e 'undef $/; print join "\n", sort (split /\n/) , (\$_ == \$list[-1] ? "" : "\n")
foreach(@list = split(/\n\n/, <>))' < file
This assigns the paragraphs to @list and uses a ternary operator to check whether the current paragraph is the last element of the foreach (the \$_ == \$list[-1] check, which compares references, so it asks whether $_ is the very same element as $list[-1]). If it is (? ...), an empty string is appended; otherwise (: ...) a "\n" is appended, as for all other paragraphs (elements of @list).
Drav's awk solution is good, but it means running one sort command per paragraph. To avoid that, you could do:
< file awk -v n=0 '!NF{n++};{print n,$0}' | sort -k1n -k2 | cut -d' ' -f2-
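The pipeline above is a decorate, sort, undecorate pattern, and it can be traced stage by stage. The following is only a sketch: the sample text and the function names tag_paragraphs and sort_tagged are made up for illustration, while the awk, sort and cut invocations are the ones from the command.

```shell
tag_paragraphs() {
    # Prefix every line with the paragraph counter n;
    # a blank line increments n before it is printed.
    awk -v n=0 '!NF{n++};{print n,$0}'
}

sort_tagged() {
    # Sort numerically by the counter, then by the rest of the line,
    # and finally cut the counter off again.
    sort -k1n -k2 | cut -d' ' -f2-
}

# Hypothetical sample: two paragraphs separated by one empty line.
sample='pear
apple

zebra
mango'

tagged=$(printf '%s\n' "$sample" | tag_paragraphs)
result=$(printf '%s\n' "$tagged" | sort_tagged)
printf '%s\n' "$result"
```

The separator line carries the counter of the paragraph that follows it and has nothing after the counter, so it sorts to the top of that group and stays between the two paragraphs. A single sort process handles the whole file.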
Or you could do the whole thing in perl:
perl -ne 'if (/\S/){push@l,$_}else{print sort@l if@l;@l=();print}
END{print sort @l if @l}' < file
Note that in the commands above, the separators are blank lines rather than empty lines: for the awk one, lines containing only space or tab characters; for the perl one, lines containing only horizontal or vertical whitespace characters. If you want only truly empty lines to act as separators, replace !NF with !length or $0=="", and /\S/ with /./.
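The difference between the two awk conditions can be checked on a line that contains only a space and a tab. This is a small sketch; the sample input and the names tag_blank and tag_empty are made up for illustration.

```shell
# Tag each line with its paragraph counter, using either definition of "separator".
tag_blank() { awk -v n=0 '!NF{n++};{print n,$0}'; }      # blank: only spaces/tabs
tag_empty() { awk -v n=0 '$0==""{n++};{print n,$0}'; }   # empty: zero characters

# Hypothetical sample: the third line holds a space and a tab, nothing else.
sample_input() { printf 'b\na\n \t\nd\n'; }

last_blank=$(sample_input | tag_blank | tail -n 1)
last_empty=$(sample_input | tag_empty | tail -n 1)
printf '%s\n%s\n' "$last_blank" "$last_empty"
```

With !NF the whitespace-only line starts a new paragraph, so the last line is tagged 1; with $0=="" it is treated as ordinary content and the last line stays in paragraph 0.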
I wrote a tool in Haskell that lets you use sort, shuf, tac or any other command on paragraphs of text.
https://gist.github.com/siers/01306a361c22f2de0122
EDIT: the tool is also included in this repo: https://github.com/siers/haskell-import-sort
It splits the text into blocks, joins the subblocks with a \0 character, pipes the result through the command, and finally does the same thing in reverse.
28-08-2015: I found another, personal use for this tool: selecting N paragraphs after a line.
paramap grep -aA2 '^reddit usernames' < ~/my-username-file
reddit usernames
foo
bar
baz
a couple
more of these