Out of memory while using sed with multiline expressions on giant file
Your first three commands are the culprit:
:a
N
$!ba
This reads the entire file into memory at once. The following script should only keep one segment in memory at a time:
% cat test.sed
#!/usr/bin/sed -nf
# Append this line to the hold space.
# To avoid an extra newline at the start, replace instead of append.
1h
1!H
# If we find a paren at the end...
/)$/{
# Bring the hold space into the pattern space
g
# Remove the newlines
s/\n//g
# Print what we have
p
# Delete the hold space
s/.*//
h
}
% cat test.in
a
b
c()
d()
e
fghi
j()
% ./test.sed test.in
abc()
d()
efghij()
This awk solution will print each line as it comes, so it will only have a single line in memory at a time:
% awk '/)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' test.in
abc()
d()
efghij()