How do I use swap space for emergencies only?
One fix is to make sure the memory cgroup controller is enabled (I think it is by default in even half-recent kernels, otherwise you'll need to add cgroup_enable=memory
to the kernel command line). Then you can run your I/O intensive task in a cgroup with a memory limit, which also limits the amount of cache it can consume.
If you're using systemd, you can set MemoryAccounting=yes and either MemoryHigh/MemoryMax or MemoryLimit (depends on whether you're using cgroup v2 or v1) in the unit, or in a slice containing it. If it's a slice, you can use systemd-run to run the program in the slice.
Full example from one of my systems for running Firefox with a memory limit. Note this uses cgroups v2 and is set up as my user, not root (one of the advantages of v2 over v1 is that delegating this to non-root is safe, so systemd does it).
$ systemctl --user cat mozilla.slice
# /home/anthony/.config/systemd/user/mozilla.slice
[Unit]
Description=Slice for Mozilla apps
Before=slices.target
[Slice]
MemoryAccounting=yes
MemoryHigh=5G
MemoryMax=6G
$ systemd-run --user --slice mozilla.slice --scope -- /usr/bin/firefox &
$ systemd-run --user --slice mozilla.slice --scope -- /usr/bin/thunderbird &
I found that to get the user instance working, I had to use a slice. The system instance works just by putting the options in the service file (or by using systemctl set-property on the service).
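As a sketch of the set-property route (the unit name myservice.service is a placeholder, not from my systems):

```shell
# Cap an existing system service's memory at runtime, without editing its
# unit file. On cgroup v2 use MemoryHigh/MemoryMax; on v1 it's MemoryLimit.
sudo systemctl set-property myservice.service MemoryAccounting=yes MemoryMax=1G

# Verify the limit took effect:
systemctl show myservice.service -p MemoryMax
```

Note that set-property persists across reboots by default (it writes a drop-in); add --runtime if you only want the limit until the next boot.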
Here is an example service (using cgroup v1); note the last two lines. This is part of the system (PID 1) instance.
[Unit]
Description=mount S3QL filesystem
Requires=network-online.target
After=network-online.target
[Install]
WantedBy=multi-user.target
[Service]
Type=forking
User=s3ql-user
Group=s3ql-user
LimitNOFILE=20000
ExecStartPre=+/bin/sh -c 'printf "S3QL_CACHE_SIZE=%%i\n" $(stat -c "%%a*%%S*.90/1024" -f /srv/s3ql-cache/ | bc) > /run/local-s3ql-env'
ExecStartPre=/usr/bin/fsck.s3ql --cachedir /srv/s3ql-cache/fs1 --authfile /etc/s3ql-authinfo --log none «REDACTED»
EnvironmentFile=-/run/local-s3ql-env
ExecStart=/usr/bin/mount.s3ql --keep-cache --cachedir /srv/s3ql-cache/fs1 --authfile /etc/s3ql-authinfo --cachesize ${S3QL_CACHE_SIZE} --threads 4
ExecStop=/usr/bin/umount.s3ql /mnt/S3QL/
TimeoutStopSec=2m
MemoryAccounting=yes
MemoryLimit=1G
Documentation is in systemd.resource-control(5).
It seems that after a day of inactivity the kernel believes the entire GUI is no longer needed and wipes it from RAM (swaps it to disk).
The kernel is doing The Right Thing™ in believing that. Why would it keep unused1 memory in RAM, essentially wasting it, instead of using it as cache or something?
I don't think the Linux kernel gratuitously or anticipatorily swaps out pages, so if it does, it must be in order to store something else in RAM, thus improving the performance of your long-running task, or at least with that goal.
If you know in advance when you'll need to reuse your laptop, you might use the at command (or crontab) to schedule a swap cleanup (swapoff -a; swapon -a).
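For example (the 07:00 time is just an illustration; this needs root, and at(1) must be installed):

```shell
# One-shot: force everything in swap back into RAM tomorrow morning,
# before you start using the laptop again. swapoff -a reads all swapped
# pages back into memory; swapon -a then re-enables swap.
echo 'swapoff -a && swapon -a' | at 07:00 tomorrow

# Recurring alternative, as a root crontab entry (crontab -e):
# 0 7 * * 1-5  swapoff -a && swapon -a
```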
As cleaning the swap might be overkill, and might even trigger the OOM killer if, for some reason, not everything fits in RAM, you might instead just "unswap"2 everything related to the running applications you want to revive.
One way to do it would be to attach a debugger like gdb to each of the affected processes and trigger a core dump generation:
# gdb -p <pid>
...
generate-core-file /dev/null
...
quit
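To apply this to every process of an application at once, non-interactively, something like the following sketch should work (the process name firefox is just an example; assumes gdb and pgrep are installed):

```shell
# Generating a core dump makes gdb read each process's entire address
# space, which forces its swapped-out pages back into RAM; the dump
# itself is thrown away by writing it to /dev/null.
for pid in $(pgrep firefox); do
    gdb --batch -p "$pid" -ex 'generate-core-file /dev/null'
done
```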
As you wrote, your long-running application is not reusing the data it reads after the initial pass, so you are in a specific case where long-term caching is not useful. In that case, bypassing the cache by using direct I/O, as suggested by Will Crawford, should be a good workaround.
Alternatively, you might just regularly flush the file cache by echoing 1 or 3 to the /proc/sys/vm/drop_caches pseudo-file before the OS thinks it's a good idea to swap out your GUI applications and environment.
See How do you empty the buffers and cache on a Linux system? for details.
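For instance, as root (note this drops cached data for the whole system, not just for the offending task):

```shell
# Write out dirty pages first, then drop the page cache. Echoing 1 drops
# only the page cache; echoing 3 also drops dentries and inodes.
sync
echo 1 > /proc/sys/vm/drop_caches
```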
1Unused in the sense: no more actively used since a significant period of time, the memory still being relevant to its owners.
2Put back in RAM pages stored on the swap area.
Having such a huge swap nowadays is often a bad idea. By the time the OS has swapped just a few GB of memory out to disk, your system has already crawled to death (like what you saw).
It's better to use zram with a small backup swap partition. Many OSes like ChromeOS, Android and various Linux distros (Lubuntu, Fedora) have enabled zram by default for years, especially for systems with less RAM. It's much faster than swap on HDD, and you can clearly feel the difference in system responsiveness. Less so on an SSD, but according to the benchmark results here it still seems faster, even with the default lzo algorithm. You can switch to lz4 for even better performance at a slightly lower compression ratio; its decoding speed is nearly 5 times faster than lzo's, according to the official benchmarks.
In fact, Windows 10 and macOS also use similar pagefile compression techniques by default.
There's also zswap, although I've never used it. It's probably worth trying both and comparing which one works better for your use cases.
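A minimal manual zram setup sketch, assuming the zram kernel module and util-linux's zramctl are available (most distros ship a package such as zram-generator or zram-tools that automates this):

```shell
# Create a compressed swap device in RAM using lz4, and give it a higher
# priority than any disk-backed swap so the kernel uses it first.
modprobe zram
dev=$(zramctl --find --size 4G --algorithm lz4)
mkswap "$dev"
swapon --priority 100 "$dev"
```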
After that, another suggestion is to reduce the priority of those I/O-bound processes, and possibly to leave a terminal running at a higher priority, so that you can run commands on it right away even when the system is under heavy load.
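For example (the script name heavy-io-task.sh is just a placeholder for your I/O-bound job):

```shell
# Start the I/O-heavy task at the lowest CPU priority and in the "idle"
# I/O scheduling class, so interactive use isn't starved.
nice -n 19 ionice -c 3 ./heavy-io-task.sh

# Or demote a process that is already running, by PID:
# renice 19 -p <pid>
# ionice -c 3 -p <pid>
```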
Further reading
- Arch Linux - Improving performance - Zram or zswap
- Enable ZSwap to increase performance
- Enable zRAM for improved memory handling and less swapping
- Running out of RAM in Ubuntu? Enable ZRAM
- Difference between ZRAM and ZSWAP
- zram vs zswap vs zcache Ultimate guide: when to use which one
- Linux, SSD and swap
- https://wiki.debian.org/ZRam
- https://www.kernel.org/doc/Documentation/blockdev/zram.txt
- https://wiki.gentoo.org/wiki/Zram