Emptying a file without disrupting the pipe writing to it
Another form of this problem occurs with long-running applications whose logs are periodically rotated. Even if you move the original log (e.g., mv log.txt log.1) and immediately replace it with a file of the same name before any actual logging occurs, if the process is holding the file open, it will either keep writing to log.1 (because that is still the inode it has open) or to nothing.
A common way to deal with this (the system logger itself works this way) is to implement a signal handler in the process which closes and reopens its logs. Then, whenever you want to move or clear (by deleting) the log, send that signal to the process immediately afterward.
Here's a simple demonstration for bash -- forgive my cruddy shell skills (but if you are going to edit this for best practices, etc., please make sure you understand the functionality first and test your revision before you edit):
#!/bin/bash

# On SIGINT, close and reopen the log so writes go to a fresh log.txt.
function sighandler () {
    touch log.txt
    exec &> log.txt
}
trap sighandler INT

echo $BASHPID       # report our PID to the terminal
exec &> log.txt     # redirect stdout and stderr to the log

count=0
while [ $count -lt 60 ]; do
    echo "$BASHPID Count is now $count"
    sleep 2
    ((count++))
done
Start this by forking into the background:
> ./test.sh &
12356
Notice it reports its PID to the terminal and then begins logging to log.txt.
You now have 2 minutes to play around. Wait a few seconds and try:
> mv log.txt log.1 && kill -s 2 12356
A plain kill -2 12356 may work for you here too. Signal 2 is SIGINT (it's also what Ctrl-C sends, so you could run this in the foreground and move or remove the logfile from another terminal), which the trap should catch. To check:
> cat log.1
12356 Count is now 0
12356 Count is now 1
12356 Count is now 2
12356 Count is now 3
12356 Count is now 4
12356 Count is now 5
12356 Count is now 6
12356 Count is now 7
12356 Count is now 8
12356 Count is now 9
12356 Count is now 10
12356 Count is now 11
12356 Count is now 12
12356 Count is now 13
12356 Count is now 14
Now let's see if it is still writing to log.txt even though we moved it:
> cat log.txt
12356 Count is now 15
12356 Count is now 16
12356 Count is now 17
12356 Count is now 18
12356 Count is now 19
12356 Count is now 20
12356 Count is now 21
Notice it kept going right where it left off. If you don't want to keep the record, simply clear the log by deleting it:
> rm -f log.txt && kill -s 2 12356
Check:
> cat log.txt
12356 Count is now 29
12356 Count is now 30
12356 Count is now 31
12356 Count is now 32
12356 Count is now 33
12356 Count is now 34
12356 Count is now 35
12356 Count is now 36
Still going.
You can't do this in a shell script for an executed subprocess, unfortunately: if it runs in the foreground, bash's own signal handlers (traps) are suspended, and if you fork it into the background, you can't reassign its output. In other words, this is something you have to implement in your application.
However...
If you can't modify the application (e.g., because you did not write it), I have a CLI utility you can use as an intermediary. You could also implement a simple version of this in a script which serves as a pipe to the log:
#!/bin/bash

# On SIGINT, reopen the log so subsequent lines go to a fresh log.txt.
function sighandler () {
    touch log.txt
    exec 1> log.txt
}
trap sighandler INT

echo "$0 $BASHPID"  # report our PID to the terminal
exec 1> log.txt     # redirect stdout to the log

# Copy stdin to the log, line by line.
while read -r; do
    echo "$REPLY"
done
Let's call this pipetrap.sh. Now we need a separate program to test with, mimicking the application you want to log:
#!/bin/bash
count=0
while [ $count -lt 60 ]; do
echo "$BASHPID Count is now $count"
sleep 2
((count++))
done
That will be test.sh:
> (./test.sh | ./pipetrap.sh) &
./pipetrap.sh 15859
These are two separate processes with separate PIDs. To clear test.sh's output, which is being funnelled through pipetrap.sh:
> rm -f log.txt && kill -s 2 15859
Check:
> cat log.txt
15858 Count is now 6
15858 Count is now 7
15858 Count is now 8
15858, test.sh, is still running and its output is being logged. In this case, no modifications to the application are needed.
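Rotation works through the intermediary in just the same way; here's a sketch reusing the PID from the session above (the handler in pipetrap.sh recreates log.txt after the move):

> mv log.txt log.1 && kill -s 2 15859

log.1 keeps the output captured so far, and fresh lines land in the newly created log.txt.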
TL;DR
Open your log file in append mode:
cmd >> log
Then, you can safely truncate it with:
: > log
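If your system has the GNU coreutils truncate utility, this is an equivalent, more explicit way to empty it:

truncate -s 0 log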
Details
With a Bourne-like shell, there are 3 main ways a file can be open for writing: in write-only (>), read+write (<>), or append (and write-only, >>) mode.
In the first two, the kernel remembers the current position into the file. That position belongs to the open file description, which is shared by all the file descriptors that have duplicated or inherited it (for instance by forking) from the one the file was originally opened on.
When you do:
cmd > log
log is opened in write-only mode by the shell as the stdout of cmd. cmd (the initial process spawned by the shell and all of its possible children), when writing to their stdout, write at the current cursor position held by the open file description they share on that file.
For instance, if cmd initially writes zzz, the position will be at byte offset 3 (just past the zzz), and the next time cmd or its children write to the file, that's where the data will be written, regardless of whether the file has grown or shrunk in the interval.
If the file has shrunk, for instance because it has been truncated with:
: > log
and cmd then writes xx, those xx will be written at offset 3, and the first 3 bytes of the file will be replaced by NUL characters.
$ exec 3> log # open file on fd 3.
$ printf zzz >&3
$ od -c log
0000000 z z z
0000003
$ printf aaaa >> log # other open file description -> different cursor
$ od -c log
0000000 z z z a a a a
0000007
$ printf bb >&3 # still write at the original position
$ od -c log
0000000 z z z b b a a
0000007
$ : > log
$ wc log
0 0 0 log
$ printf x >&3
$ od -c log
0000000 \0 \0 \0 \0 \0 x
0000006
That means you cannot safely truncate a file that has been opened in write-only mode (and the same goes for read+write): if you do, processes that still hold file descriptors open on the file will leave NUL characters at the beginning of it (except on OS/X, those usually don't take space on disk; the file becomes a sparse file).
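If you want to see the sparseness for yourself, here is a minimal sketch using GNU dd (the filename sparse.bin is arbitrary, and the exact disk usage reported will vary by filesystem):

$ printf x | dd of=sparse.bin bs=1 seek=1048576 2> /dev/null
$ ls -l sparse.bin   # apparent size: 1048577 bytes
$ du -k sparse.bin   # disk usage: typically just a few KiB

ls reports the logical length including the hole, while du reports the blocks actually allocated.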
Instead (and you'll notice most applications do that when they write to log files), you should open the file in append mode:
cmd >> log
or
: > log && cmd >> log
if you want to start on an empty file.
In append mode, all writes are made at the end of the file, regardless of where the last write was:
$ exec 4>> log
$ printf aa >&4
$ printf x >> log
$ printf bb >&4
$ od -c log
0000000 a a x b b
0000005
$ : > log
$ printf cc >&4
$ od -c log
0000000 c c
0000002
That's also safer: if two processes have opened the file that way by mistake (for instance, if you've started two instances of the same daemon), their output will not overwrite each other's.
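Here is a sketch of that difference, in the same spirit as the transcripts above (the fd numbers 5 through 8 are arbitrary):

$ : > log
$ exec 5> log 6> log     # two independent write-only descriptions, both at offset 0
$ printf 'first\n' >&5
$ printf 'SECOND\n' >&6  # writes at offset 0 again, clobbering the first line
$ cat log
SECOND
$ : > log
$ exec 7>> log 8>> log   # two independent append-mode descriptions
$ printf 'first\n' >&7
$ printf 'second\n' >&8  # always lands at the current end of file
$ cat log
first
second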
On recent versions of Linux, you can check the current position and whether a file descriptor has been opened in append mode by looking at /proc/<pid>/fdinfo/<fd>:
$ cat /proc/self/fdinfo/4
pos: 2
flags: 0102001
Or with:
$ lsof +f G -p "$$" -ad 4
COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME
zsh 4870 root 4w REG 0x8401;0x0 252,18 2 59431479 /home/chazelas/log
~# lsof +f g -p "$$" -ad 4
COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME
zsh 4870 root 4w REG W,AP,LG 252,18 2 59431479 /home/chazelas/log
Those flags correspond to the O_* flags passed to the open() system call.
$ gcc -E - <<< $'#include <fcntl.h>\nO_APPEND O_WRONLY' | tail -n1
02000 01
(O_APPEND is 0x400 or octal 02000.)
So the shell's >> opens the file with O_WRONLY|O_APPEND (the 0100000 above is O_LARGEFILE, which is not relevant to this question), while > is O_WRONLY only (and <> is O_RDWR only).
If you do a:
sudo lsof -nP +f g | grep ,AP
to search for files open with O_APPEND, you'll find most of the log files currently open for writing on your system.