Is there a way to modify a file in-place?
At the system call level this should be possible. A program can open your target file for writing without truncating it and start writing what it reads from stdin. When it reaches EOF, it can truncate the output file at its current write position.
Since you are filtering lines from the input, the output file's write position can never get ahead of the read position (it is always less than or equal to it). This means the new output should never overwrite input that has not been read yet.
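A minimal single-process sketch of this idea, assuming the filter is "drop every line containing a fixed string" (a stand-in for grep -v; the function name filter_lines_in_place is hypothetical). The reader and the writer are separate descriptors on the same file; because only kept lines are written, the write offset wpos can never pass the number of bytes already read.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: remove every line of `path` that contains `needle`,
 * rewriting the file in place. Returns 0 on success, -1 on error. */
int filter_lines_in_place(const char *path, const char *needle)
{
    FILE *in = fopen(path, "r");          /* reader: buffered, reads ahead */
    if (in == NULL)
        return -1;
    int outfd = open(path, O_WRONLY);     /* writer: note, no O_TRUNC */
    if (outfd == -1) {
        fclose(in);
        return -1;
    }

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    off_t wpos = 0;                       /* bytes kept so far */
    int rc = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        if (strstr(line, needle) != NULL)
            continue;                     /* drop this line */
        /* wpos + len never exceeds the bytes already read, so this write
         * cannot clobber input that has not been consumed yet.
         * A short write is treated as an error to keep the sketch small. */
        if (pwrite(outfd, line, (size_t)len, wpos) != len) {
            rc = -1;
            break;
        }
        wpos += len;
    }
    if (rc == 0 && ferror(in))
        rc = -1;
    /* Cut off whatever is left of the old contents. */
    if (rc == 0 && ftruncate(outfd, wpos) == -1)
        rc = -1;

    free(line);
    fclose(in);
    if (close(outfd) == -1)
        rc = -1;
    return rc;
}
```

No temporary file and no extra disk space are needed; the price is that an interruption midway leaves the file half rewritten.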
However, finding a program that does this is the problem. dd(1) has the option conv=notrunc, which does not truncate the output file on open, but it also does not truncate it at the end, leaving the tail of the original contents after the grep output (with a command like grep pattern bigfile | dd of=bigfile conv=notrunc).
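The missing final truncation is the whole problem. As a rough C illustration (the helper name overwrite_with is hypothetical), opening an existing file without O_TRUNC, which is what conv=notrunc asks for, and writing shorter data leaves the old tail behind; an ftruncate() at the end of the new data removes it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `text` at the start of an existing file
 * WITHOUT O_TRUNC, like dd's conv=notrunc. If `truncate_tail` is nonzero,
 * also cut the file at the end of the new data.
 * Returns 0 on success, -1 on error. */
int overwrite_with(const char *path, const char *text, int truncate_tail)
{
    size_t len = strlen(text);
    int fd = open(path, O_WRONLY);        /* existing contents are kept */
    if (fd == -1)
        return -1;
    size_t done = 0;
    while (done < len) {                  /* handle short writes */
        ssize_t n = write(fd, text + done, len - done);
        if (n == -1) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    /* Without this step the old bytes after `len` remain in the file. */
    if (truncate_tail && ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For a file containing abcdefgh, overwrite_with(path, "xyz", 0) leaves xyzdefgh, while overwrite_with(path, "xyz", 1) leaves just xyz.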
Since it is very simple from a system call perspective, I wrote a small program and tested it on a small (1 MiB), full loopback filesystem. It did what you wanted, but you really want to test it with some other files first. Overwriting a file in place is always going to be risky.
overwrite.c
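```c
/* This code is placed in the public domain by camh */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char **argv)
{
    int outfd;
    char buf[1024];
    ssize_t nread;          /* read() returns ssize_t, not int */
    off_t file_length;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <output_file>\n", argv[0]);
        exit(1);
    }

    /* Open WITHOUT O_TRUNC so the existing data stays until overwritten. */
    if ((outfd = open(argv[1], O_WRONLY)) == -1) {
        perror("Could not open output file");
        exit(2);
    }

    /* Copy stdin to the start of the file. */
    while ((nread = read(0, buf, sizeof(buf))) > 0) {
        char *p = buf;
        while (nread > 0) {     /* write() may write fewer bytes than asked */
            ssize_t nwritten = write(outfd, p, (size_t)nread);
            if (nwritten == -1) {
                if (errno == EINTR)
                    continue;
                perror("Could not write to output file");
                exit(4);
            }
            p += nwritten;
            nread -= nwritten;
        }
    }
    if (nread == -1) {
        perror("Could not read from stdin");
        exit(3);
    }

    /* The current offset marks the end of the new data; drop the rest. */
    if ((file_length = lseek(outfd, 0, SEEK_CUR)) == (off_t)-1) {
        perror("Could not get file position");
        exit(5);
    }
    if (ftruncate(outfd, file_length) == -1) {
        perror("Could not truncate file");
        exit(6);
    }
    if (close(outfd) == -1) {
        perror("Could not close output file");
        exit(7);
    }
    exit(0);
}
```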
You would use it as:
```sh
grep pattern bigfile | overwrite bigfile
```
I'm mostly posting this for others to comment on before you try it. Perhaps someone knows of a better-tested program that does something similar.
With any Bourne-like shell:
```sh
{
  cat < bigfile | grep -v to-exclude
  perl -e 'truncate STDOUT, tell STDOUT'
} 1<> bigfile
```
For some reason, people tend to forget about that 40-year-old¹ and standard read+write redirection operator, <>.
We open bigfile in read+write mode and (what matters most here) without truncation on stdout, while bigfile is also open (separately) on cat's stdin. After grep has terminated, and if it has removed some lines, stdout now points somewhere within bigfile, and we need to get rid of what's beyond that point. Hence the perl command, which truncates the file (truncate STDOUT) at the current position (as returned by tell STDOUT).
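At the system-call level both ingredients are small. The following C sketch is an illustration, not how any particular shell implements it, and both helper names are hypothetical: n<> file becomes an open() with O_RDWR|O_CREAT and no O_TRUNC, moved onto descriptor n with dup2(), and the perl one-liner becomes ftruncate() at the offset lseek(fd, 0, SEEK_CUR) reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the shell's `n<> file`: open `path` for
 * reading and writing, creating it if needed but never truncating it,
 * and install it as descriptor `n`. Returns `n`, or -1 on error. */
int open_rdwr_on(int n, const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);   /* note: no O_TRUNC */
    if (fd == -1)
        return -1;
    if (fd != n) {
        int rc = dup2(fd, n);
        close(fd);
        if (rc == -1)
            return -1;
    }
    return n;
}

/* Hypothetical helper mirroring `perl -e 'truncate STDOUT, tell STDOUT'`:
 * cut the file open on `fd` at that descriptor's current offset. */
int truncate_at_offset(int fd)
{
    off_t pos = lseek(fd, 0, SEEK_CUR);   /* the position tell reports */
    if (pos == (off_t)-1)
        return -1;
    return ftruncate(fd, pos);
}
```

In a C program, calling open_rdwr_on(1, "bigfile"), writing the filtered data to descriptor 1, and finishing with truncate_at_offset(1) corresponds to the write side of the shell snippet; the reading side still needs its own, separate open of bigfile.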
(The cat is there for GNU grep, which otherwise complains if stdin and stdout point to the same file.)
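GNU grep's actual test is more involved than this, but as a simplified sketch (helper names hypothetical), a program can tell whether two descriptors name the same file by comparing st_dev and st_ino from fstat(). With cat < bigfile | in front, grep's stdin is a pipe rather than bigfile, so a check like this no longer matches.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if descriptors a and b refer to the same
 * file (same device and inode number), 0 if not, -1 on error. */
int same_file(int a, int b)
{
    struct stat sa, sb;
    if (fstat(a, &sa) == -1 || fstat(b, &sb) == -1)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical: the kind of check a filter could run before processing. */
int stdin_is_stdout(void)
{
    return same_file(STDIN_FILENO, STDOUT_FILENO);
}
```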
¹ Well, while <> has been in the Bourne shell from the start in the late seventies, it was initially undocumented and not properly implemented. It was not in the original implementation of ash from 1989 and, while it is a POSIX sh redirection operator (since the early 90s, as POSIX sh is based on ksh88, which always had it), it was not added to FreeBSD sh, for instance, until 2000, so, portably, "15 years old" is probably more accurate. Also note that the default file descriptor, when not specified, is 0 for <> in all shells, except that in ksh93 it changed from 0 to 1 in ksh93t+ in 2010 (breaking backward compatibility and POSIX compliance).
You can use sed to edit files in place (but this does create an intermediate temporary file):

To remove all lines containing foo:

```sh
sed -i '/foo/d' myfile
```

To keep all lines containing foo:

```sh
sed -i '/foo/!d' myfile
```
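So sed -i does not rewrite the bytes of the original file: it writes a new file and renames it over the old one. A minimal C sketch of that strategy, assuming a plain substring match instead of sed's regular expressions (the helper name filter_via_tempfile is hypothetical). Real sed -i also copies the original's permissions, which this sketch does not: mkstemp() creates the file with mode 0600. Because rename() swaps in a new file, the result has a new inode, so other hard links to the original keep the old contents.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper approximating `sed -i '/needle/d'` (keep == 0) or
 * `sed -i '/needle/!d'` (keep != 0) with a substring match.
 * Returns 0 on success, -1 on error. */
int filter_via_tempfile(const char *path, const char *needle, int keep)
{
    static const char suffix[] = ".XXXXXX";
    size_t plen = strlen(path);
    char *tmp = malloc(plen + sizeof(suffix));
    FILE *in = NULL, *out = NULL;
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    int tfd, rc = -1;

    if (tmp == NULL)
        return -1;
    /* Temp file next to the original, so rename() stays on one filesystem. */
    memcpy(tmp, path, plen);
    memcpy(tmp + plen, suffix, sizeof(suffix));   /* copies the NUL too */

    if ((in = fopen(path, "r")) == NULL)
        goto done;
    if ((tfd = mkstemp(tmp)) == -1)
        goto done;
    if ((out = fdopen(tfd, "w")) == NULL) {
        close(tfd);
        goto fail_tmp;
    }

    while ((len = getline(&line, &cap, in)) != -1) {
        int match = strstr(line, needle) != NULL;
        if (match == (keep != 0))
            fwrite(line, 1, (size_t)len, out);
    }
    if (ferror(in) || ferror(out)) {
        fclose(out);
        goto fail_tmp;
    }
    if (fclose(out) != 0)           /* delayed write errors show up here */
        goto fail_tmp;
    /* Atomically replace the original name with the new file. */
    if (rename(tmp, path) != 0)
        goto fail_tmp;
    rc = 0;
    goto done;

fail_tmp:
    unlink(tmp);
done:
    if (in != NULL)
        fclose(in);
    free(line);
    free(tmp);
    return rc;
}
```

The trade-off against the in-place approaches above: the original stays intact until the rename, at the cost of temporarily needing room for a second copy.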