How to cut a file starting from the line in which a certain pattern occurs?
You should be able to do it by just truncating the file in place, without having to write a new copy of the file like sed -i/perl -i/ed/gawk -i inplace would do. With perl:
find . -name '*.txt' -type f -exec perl -ne '
BEGIN{@ARGV=map{"+<$_"}@ARGV} # open files in read+write mode in the
# while(<>) loop implied by -n
if (/END DATA/) {
seek ARGV,-length,1; # back to beginning of matching line
print ARGV "NEW END\n";
truncate ARGV, tell ARGV;
close ARGV; # skip to next file
}' {} +
That minimises the I/O, as perl stops reading as soon as it finds a match, and NEW END\n is the only thing it writes. It also writes in place, so the file's metadata (ownership, permissions, ACLs, sparseness...) are preserved and hard links are not broken.
With -exec {} + we also minimise the number of perl invocations.
It sounds like the sequence of commands you're looking for is
/END DATA/,$d
.a
NEW END
.
wq
or as a one-liner
printf '%s\n' '/END DATA/,$d' '.a' 'NEW END' '.' 'wq' | ed -s file
(You can replace wq with ,p for testing.)
For example, given
$ cat file
Data 1
Data 2
something_unimportant_here END DATA
Rubbish 1
Rubbish 2
then
$ printf '%s\n' '/END DATA/,$d' '.a' 'NEW END' '.' 'wq' | ed -s file
gives
$ cat file
Data 1
Data 2
NEW END
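To apply the same ed script to several files, a loop along these lines might do. This is a sketch under assumptions: the loop, the grep -q guard (so that ed is never asked to search for a pattern a file does not contain) and the sample files with.txt and without.txt are additions for illustration, and the append is written as .a directly after the delete.

```shell
#!/bin/sh
# ed is not installed everywhere; stop quietly if it is missing.
command -v ed >/dev/null 2>&1 || { echo 'ed is not installed' >&2; exit 0; }

# Scratch data, invented for this example.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '%s\n' 'Data 1' 'Data 2' 'something_unimportant_here END DATA' 'Rubbish 1' > with.txt
printf '%s\n' 'Data 1' 'Data 2' > without.txt

for f in ./*.txt; do
  grep -q 'END DATA' "$f" || continue    # skip files without the marker
  printf '%s\n' '/END DATA/,$d' '.a' 'NEW END' '.' 'wq' | ed -s "$f"
done

with=$(cat with.txt)
without=$(cat without.txt)
```

Unlike the perl approach, this runs one ed process per matching file, and ed reads each file fully before writing it back.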
With GNU grep and GNU sed:
grep -lZ 'END DATA' *.txt | xargs -0 sed -i -e '/END DATA/,${//i foo' -e 'd}'
where *.txt assumes that all your files are in the current directory and end with the .txt extension. If you need to search for files recursively, GNU grep also supports the -r/-R options.
/END DATA/,$ is the range of lines to operate on.
//i foo: here // matches the previously used regex, i.e. /END DATA/, and the i command inserts the new ending marker where needed.
As the text of the i command has to be terminated by a newline, a separate -e option is used for the d command, which deletes all lines matched by the range.
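A recursive variant could look like the sketch below. The --include filter and the -r flag of xargs (do not run sed at all when grep finds no file) are additions not present in the command above, and the directory layout and file names are invented for the test. It assumes GNU grep, GNU sed and an xargs that supports -0 and -r.

```shell
#!/bin/sh
# Relies on GNU sed features; stop quietly if sed is something else.
sed --version 2>/dev/null | grep -q '(GNU sed)' || exit 0

# Scratch tree, invented for this example.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir sub
printf '%s\n' 'Data 1' 'x END DATA' 'Rubbish 1' > top.txt
printf '%s\n' 'Data 2' 'y END DATA' 'Rubbish 2' > sub/nested.txt
printf '%s\n' 'Data 3' > sub/clean.txt      # no marker: must stay untouched

# -r: recurse; --include: only *.txt; -l: list matching files; -Z: NUL-delimit
grep -rlZ --include='*.txt' 'END DATA' . |
  xargs -0 -r sed -i -e '/END DATA/,${//i foo' -e 'd}'

top=$(cat top.txt)
nested=$(cat sub/nested.txt)
clean=$(cat sub/clean.txt)
```

Since -i makes sed treat each file separately, $ refers to the last line of each file, so the range ends at the end of every matching file.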
As an alternative, you can also use the following, but only one file can be passed to sed at a time, since Q makes sed quit without processing any remaining files:
grep -lZ 'END DATA' *.txt | xargs -0 -n1 sed -i -e '/END DATA/{i foo' -e 'Q}'
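One thing to keep in mind with the sed -i commands is that GNU sed writes a new file and renames it over the original, so other hard links keep the old content. A quick way to see this, sketched with made-up names (a.txt, other-name) in a scratch directory:

```shell
#!/bin/sh
# Relies on GNU sed features; stop quietly if sed is something else.
sed --version 2>/dev/null | grep -q '(GNU sed)' || exit 0

# Scratch data, invented for this example.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '%s\n' 'Data 1' 'x END DATA' 'Rubbish 1' > a.txt
ln a.txt other-name                 # hard link to the same data

grep -lZ 'END DATA' *.txt | xargs -0 -n1 sed -i -e '/END DATA/{i foo' -e 'Q}'

new=$(cat a.txt)                    # the freshly written copy
old=$(cat other-name)               # still points at the original data
printf 'a.txt:\n%s\nother-name:\n%s\n' "$new" "$old"
```

Here a.txt ends with the new marker while other-name is unchanged, which is the difference from the in-place perl approach.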