Is it possible to make a dynamically risizing filesystem-as-file?

Actually, something like it is already possible

A filesystem needs to have a defined maximum size. But as the filesystem-as-file can be sparse, that size can be an arbitrary number which doesn't have much to do with how much space the filesystem-as-file takes up on the underlying filesystem.

If you can accept setting an arbitrary maximum size limit (which can be much greater than the actual size of the underlying filesystem) for the filesystem-as-file, you can create a sparse file and a filesystem on it right now:

/tmp# df -h .
Filesystem                Size  Used Avail Use% Mounted on
<current filesystem>       20G   16G  3.0G  84% /

/tmp# dd if=/dev/null bs=1 seek=1024000000000 of=testdummy
0+0 records in
0+0 records out
0 bytes copied, 0.000159622 s, 0.0 kB/s
/tmp# ll testdummy
-rw-r--r-- 1 root root 1024000000000 Feb 19 08:24 testdummy
/tmp# ll -h testdummy
-rw-r--r-- 1 root root 954G Feb 19 08:24 testdummy

Here, I've created a file that appears to be a whole lot bigger than the filesystem it's stored onto...

/tmp# du -k testdummy
0       testdummy

...but so far it does not actually take any disk space at all (except for the inode and maybe some other metadata).

It would be perfectly possible to losetup it, create a filesystem on it and start using it. Each write operation that actually writes data to the file would cause the file's space requirement to grow. In other words, while the file size as reported by ls -l would stay that arbitrary huge number all the time, the actual space taken by the file on the disk as reported by du would grow.

And if you mount the filesystem-as-file with a discard mount option, shrinking can work automatically too:

/tmp# losetup /dev/loop0 testdummy
/tmp# mkfs.ext4 /dev/loop0
/tmp# mount -o discard /dev/loop0 /mnt
/tmp# du -k testdummy 
1063940 testdummy
/tmp# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0      938G   77M  890G   1% /mnt

/tmp# cp /boot/initrd.img /mnt
/tmp# du -k testdummy 
1093732 testdummy

/tmp# rm /mnt/initrd.img
/tmp# du -k testdummy
1063944 testdummy

Automatic shrinking requires:

1.) that the filesystem type of the filesystem-as-file supports the discard mount option (so that the filesystem driver can tell the underlying system which blocks can be deallocated)

2.) and that the filesystem type of the underlying filesystem supports "hole punching", i.e. the fallocate(2) system call with the FALLOC_FL_PUNCH_HOLE option (so that the underlying filesystem can be told to mark some of the previously-allocated blocks of the filesystem-as-file as sparse blocks again)

3.) and that you're using kernel version 3.2 or above, so that the loop device support has the necessary infrastructure for this.

https://outflux.net/blog/archives/2012/02/15/discard-hole-punching-and-trim/

If you're fine with less immediate shrinking, you could just periodically run fstrim on the filesystem-as-file instead of using the discard mount option. If the underlying filesystem is very busy, avoiding immediate shrinking might help minimizing the fragmentation of the underlying filesystem.

The problem with this approach is that if the underlying filesystem becomes full, it won't be handled very gracefully. If there is no longer space in the underlying filesystem, the filesystem-as-file will start receiving errors when it's trying to replace sparse "holes" with actual data, even as the filesystem-as-file would appear to have some unused capacity left.

Tags:

Filesystems