Out of memory while using sed with multiline expressions on giant file

Your first three commands are the culprit:

:a
N
$!ba

This loop appends every line to the pattern space before any other command runs, so sed ends up holding the entire file in memory at once. The following script should keep only one segment in memory at a time:

% cat test.sed
#!/usr/bin/sed -nf

# Append this line to the hold space.
# On the first line, overwrite the hold space instead, to avoid a leading newline.
1h
1!H

# If we find a paren at the end...
/)$/{
    # Bring the hold space into the pattern space
    g
    # Remove the newlines
    s/\n//g
    # Print what we have
    p
    # Empty the pattern space, then copy it over the hold space to reset it
    s/.*//
    h
}
% cat test.in
a
b
c()
d()
e
fghi
j()
% ./test.sed test.in
abc()
d()
efghij()
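
One caveat with the script above: a segment is printed only when a line ending in ) arrives, so any lines after the last ) would stay in the hold space and never be printed. As a minimal sketch of one way around that (not part of the original script; join_segments is a hypothetical helper name), the loop below works in the pattern space only and also flushes when it reaches the last line:

```shell
# join_segments: hypothetical helper, a sketch rather than the answer's script.
# Runs without -n, so the pattern space is auto-printed at the end of each cycle.
join_segments() {
    # :a         loop label
    # /)$/b      segment complete: branch to end of script (auto-print)
    # $b         last input line: flush whatever has accumulated
    # N          otherwise append the next line to the pattern space
    # s/\n//     drop the newline that N inserted
    # ba         and test the joined text again
    sed -e ':a' -e '/)$/b' -e '$b' -e 'N' -e 's/\n//' -e 'ba' "$@"
}

# Same data as test.in, plus a trailing line "k" with no closing paren.
printf 'a\nb\nc()\nd()\ne\nfghi\nj()\nk\n' | join_segments
# prints:
# abc()
# d()
# efghij()
# k
```

Like the hold-space version, this should only ever hold the current segment, so memory use depends on the longest segment rather than on the size of the file.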

This awk solution prints each line as it is read, so it holds only a single line in memory at a time:

% awk '/)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' test.in
abc()
d()
efghij()
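
To check the diagnosis at the top on a small input, the sketch below runs the same three-command loop and then dumps the pattern space with l, which prints embedded newlines as \n and marks the end of the pattern space with $. The function name show_pattern_space is made up for illustration, and the loop is spelled with separate -e options because some sed implementations do not accept ; after a label.

```shell
# show_pattern_space: hypothetical helper for illustration only.
# Runs the :a / N / $!ba loop, then "l" to show what sed is holding.
show_pattern_space() {
    sed -n -e ':a' -e 'N' -e '$!ba' -e 'l' "$@"
}

printf 'a\nb\nc\n' | show_pattern_space
# prints: a\nb\nc$
```

All three input lines show up as a single pattern space, which is what happens to the whole giant file in the original command.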
