Debugging lock-up - systemd loses my logs
So I asked on the #systemd IRC channel and it turns out that journald (the logging daemon of systemd) does not periodically flush the logs to disk at all. This means that your logs are always at risk at any time.
Sending SIGUSR2 to journald causes the logs to be written to disk, but if you do this repeatedly, many files will be created (the option is actually described as "log rotating").
In the end, I decided to go with another suggestion: using a dedicated syslog daemon for collecting kernel logs. As rsyslog was suggested (and I already had experience with it), I explored that option further. I have written some more details in the Arch Wiki about using rsyslog.
The idea is to run rsyslog, collecting only data from the kernel facility. As rsyslog reads from /proc/kmsg (which allows only a single reader) and journald reads from /dev/kmsg (multiple readers allowed), the two daemons do not compete for messages, so neither of them loses logs (very important to me!). Configure rsyslog to write kernel messages to a file and make sure that this file is rotated so that it does not eat your disk space.
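A minimal sketch of what such a configuration might look like. The file locations, the log path /var/log/kernel.log, the rotation policy and the rsyslog.service unit name are all assumptions for illustration, not the exact setup described on the wiki; if your rsyslog.conf already loads imklog, the module line is not needed again.

```
# --- /etc/rsyslog.conf (assumed location) ---
# Load the kernel log input module; on Linux it reads /proc/kmsg.
module(load="imklog")

# Write only the kern facility to a dedicated file (path is an assumption).
kern.*    /var/log/kernel.log

# --- /etc/logrotate.d/kernel-log (assumed file name) ---
/var/log/kernel.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    postrotate
        # Ask rsyslog to reopen its output file after rotation.
        /usr/bin/systemctl kill -s HUP rsyslog.service
    endscript
}
```

With no other rules in rsyslog.conf, everything outside the kern facility is simply ignored by rsyslog and stays in the journal only.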
This solution is not perfect:
- Other logs (for example, from NetworkManager) are lost. This could be solved by forwarding more logs from syslog to journald (this means duplication!)
- Duplication of logs. The kernel messages are written to two files. This is a non-issue: the volume of logs is generally small, and you would rather have more copies of the logs than none. You can also use fast tools like grep on the single log file, or the slower but fancier journalctl.
There is a TODO item for flushing logs more frequently, but that is still not reliable enough:
journal: send out marker messages every now and then, and immediately sync with fdatasync() afterwards, in order to have hourly guaranteed syncs.
Now, hopefully systemd/journald will get an option to write the logs to disk, but meanwhile we can combine tools to achieve the goal.
There are two updates:
- "Now, hopefully systemd/journald will get an option to write the logs to disk, but meanwhile we can combine tools to achieve the goal."
There is now such an option, journalctl --sync:
Asks the journal daemon to write all yet unwritten journal data to the backing file system and synchronize all journals. This call does not return until the synchronization operation is complete. This command guarantees that any log messages written before its invocation are safely stored on disk at the time it returns.
--sync is available since v228:
journalctl gained a new "--sync" switch that asks the journal daemon to write all so far unwritten log messages to disk and sync the files, before returning.
- "It turns out that journald (the logging daemon of systemd) does not periodically flush the logs to disk at all. This means that your logs are always at risk at any time."
This is no longer true. man journald.conf(5) says:
SyncIntervalSec=
The timeout before synchronizing journal files to disk. After syncing, journal files are placed in the OFFLINE state. Note that syncing is unconditionally done immediately after a log message of priority CRIT, ALERT or EMERG has been logged. This setting hence applies only to messages of the levels ERR, WARNING, NOTICE, INFO, DEBUG. The default timeout is 5 minutes.
SyncIntervalSec= is available since v199:
journald will now explicitly flush the journal files to disk at the latest 5min after each write. The file will then also be marked offline until the next write. This should increase reliability in case of a crash. The synchronization delay can be configured via SyncIntervalSec= in journald.conf.
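If five minutes is too long a window when chasing a lock-up, the interval can be shortened. A sketch, assuming the stock configuration file location; the value of 30 seconds is an arbitrary illustration, not a recommendation from the man page:

```
# /etc/systemd/journald.conf
[Journal]
# Sync journal files to disk at most 30 seconds after a write
# (the default is 5 minutes).
SyncIntervalSec=30s
```

journald has to be restarted (for example with systemctl restart systemd-journald) before the new value takes effect.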
See also:
journald: dispatch SIGTERM/SIGINT with a low priority
Let's make sure to process all queued log data before exiting, so that we don't unnecessary lose messages when shutting down.
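The --sync switch from the first update can be put to use when debugging a lock-up: sync the journal right before the step that is suspected to hang the machine, so that everything logged up to that point is already on disk. Below is a minimal sketch of such a wrapper. The function name sync_then_run and the JOURNALCTL override variable are hypothetical helpers made up for this example; only journalctl --sync itself comes from systemd, and it may need root privileges.

```shell
#!/bin/sh
# Sync the journal to disk, then run the given command.
# JOURNALCTL is a hypothetical override for dry runs; it defaults to journalctl.
sync_then_run() {
    # --sync (systemd v228+) returns only after pending journal data is on disk.
    if ! "${JOURNALCTL:-journalctl}" --sync; then
        echo "warning: journal sync failed, continuing anyway" >&2
    fi
    "$@"
}

# Example (hypothetical): sync_then_run systemctl suspend
```

Messages logged after the sync are still subject to SyncIntervalSec=, so this narrows the window of lost logs rather than closing it.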