Transparent compression filesystem in conjunction with ext4
Solution 1:
I use ZFS on Linux as a volume manager and a means to provide additional protections and functionality to traditional filesystems. This includes bringing block-level snapshots, replication, deduplication, compression and advanced caching to the XFS or ext4 filesystems.
See: https://pthree.org/2012/12/21/zfs-administration-part-xiv-zvols/ for another explanation.
In my most common use case, I leverage the ZFS zvol feature to create a sparse volume on an existing zpool. That zvol's properties can be set just like a normal ZFS filesystem's. At this juncture, you can set properties like compression type, volume size, caching method, etc.
Creating this zvol presents a block device to Linux that can be formatted with the filesystem of your choice. Use fdisk or parted to create your partition and mkfs the finished volume.
Mount this and you essentially have a filesystem backed by a zvol and with all of its properties.
Here's my workflow...
Create a zpool composed of four disks (two mirrored pairs):
You'll want the ashift=12 directive (4096-byte sectors) for the type of disks you're using. The zpool name is "vol0" in this case.
zpool create -o ashift=12 -f vol0 mirror scsi-AccOW140403AS1322043 scsi-AccOW140403AS1322042 mirror scsi-AccOW140403AS1322013 scsi-AccOW140403AS1322044
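The ashift value is the base-2 logarithm of the sector size ZFS assumes, so ashift=12 means 2^12 = 4096-byte sectors and ashift=9 means 512-byte sectors. The sketch below is an illustration, not part of the original workflow: it derives ashift from a sector size and prints (without running) a mirrored zpool create command from a list of disks. The function names `ashift_for` and `mirror_pool_cmd` are hypothetical. A sector size could be read from /sys/block/<disk>/queue/physical_block_size, though some drives report 512 bytes even when they are 4K internally, which is one reason to set ashift=12 explicitly.

```shell
# Hypothetical helper: ashift is log2 of the sector size in bytes,
# e.g. 512 -> 9, 4096 -> 12. Non-powers of two round down.
ashift_for() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    printf '%s\n' "$shift_bits"
}

# Hypothetical helper: print (do not execute) a "zpool create" command that
# groups the listed disks into two-way mirrors, in the order given.
# Usage: mirror_pool_cmd POOL DISK1 DISK2 [DISK3 DISK4 ...]
mirror_pool_cmd() {
    pool=$1
    shift
    # Two-way mirrors need an even, non-zero number of disks.
    if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "mirror_pool_cmd: need an even number of disks" >&2
        return 1
    fi
    cmd="zpool create -o ashift=12 -f $pool"
    while [ "$#" -gt 0 ]; do
        cmd="$cmd mirror $1 $2"
        shift 2
    done
    printf '%s\n' "$cmd"
}
```

For example, `mirror_pool_cmd vol0 scsi-AccOW140403AS1322043 scsi-AccOW140403AS1322042 scsi-AccOW140403AS1322013 scsi-AccOW140403AS1322044` prints the same command as above, which can be reviewed before running it.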
Set initial zpool settings:
I set autoexpand=on at the zpool level in case I ever replace the disks with larger drives or otherwise expand the pool in a ZFS mirrors setup. I typically don't use ZFS raidz1/2/3 because of poor performance and the inability to grow a raidz vdev by adding individual disks.
zpool set autoexpand=on vol0
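With mirrors, each vdev's usable size is that of its smallest member, and the pool is roughly the sum of its vdevs. That is why autoexpand only adds space once every disk in a mirror has been replaced with a larger one. A rough sketch of that arithmetic follows; the function name is hypothetical, sizes can be in any consistent unit, and ZFS metadata overhead is ignored.

```shell
# Hypothetical helper: usable size of a pool built from two-way mirrors.
# Arguments are disk sizes listed pair by pair; each mirror contributes the
# size of its smaller disk and the pool is the sum of the mirrors.
# Label/metadata overhead is ignored; a trailing unpaired size is ignored.
mirror_pool_size() {
    total=0
    while [ "$#" -ge 2 ]; do
        if [ "$1" -lt "$2" ]; then smaller=$1; else smaller=$2; fi
        total=$(( total + smaller ))
        shift 2
    done
    printf '%s\n' "$total"
}
```

Taking the 222G per-mirror size from the `zpool list -v` output further down, `mirror_pool_size 222 222 222 222` gives 444. Swapping only one disk of the first mirror for a 500G drive (`mirror_pool_size 500 222 222 222`) still gives 444; only after both members are larger (`mirror_pool_size 500 500 222 222`, giving 722) can autoexpand grow that vdev.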
Set initial zfs filesystem properties:
Please use the lz4 compression algorithm for new ZFS installations. It's okay to leave it on all the time.
zfs set compression=lz4 vol0
zfs set atime=off vol0
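To confirm the settings took effect (child datasets and zvols inherit them, as the "inherited from vol0" source on the compression line of the `zfs get all` output further down shows), the properties can be read back in scripted mode: `zfs get -H -o property,value compression,atime vol0` prints one tab-separated property/value pair per line. The checker below is a hypothetical sketch that reads that output on stdin; it does not run zfs itself.

```shell
# Hypothetical helper: read "property<TAB>value" lines (as printed by
# "zfs get -H -o property,value ...") on stdin and fail if any
# property=value pair given as an argument does not match.
props_match() {
    input=$(cat)
    for want in "$@"; do
        prop=${want%%=*}
        val=${want#*=}
        got=$(printf '%s\n' "$input" | awk -F '\t' -v p="$prop" '$1 == p { print $2 }')
        if [ "$got" != "$val" ]; then
            printf '%s is "%s", expected "%s"\n' "$prop" "$got" "$val" >&2
            return 1
        fi
    done
}
```

Usage would look like `zfs get -H -o property,value compression,atime vol0 | props_match compression=lz4 atime=off && echo OK`.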
Create ZFS zvol:
For ZFS on Linux, it's very important that you use a large block size; -o volblocksize=128k is absolutely essential here. The -s option creates a sparse zvol that doesn't consume pool space until it's needed. You can overcommit here if you know your data well. In this case, I have about 444GB of usable disk space in the pool, but I'm presenting an 800GB volume to XFS.
zfs create -o volblocksize=128K -s -V 800G vol0/pprovol
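The overcommit in this example is 800G of volume against roughly 444G of pool, about 180%. Without -s, zfs create -V sets a refreservation equal to the volume size, so a volume this large would not fit; with -s (note refreservation none in the property listing further down) nothing stops the filesystem on top from filling the pool. A hypothetical helper for keeping an eye on that ratio, using integer percentages:

```shell
# Hypothetical helper: volume size as a percentage of pool space,
# integer arithmetic (rounds down). Both arguments must use the same
# unit, e.g. exact bytes from "zfs get -Hp -o value ...".
overcommit_pct() {
    volsize=$1
    pool_space=$2
    printf '%s\n' $(( volsize * 100 / pool_space ))
}
```

`overcommit_pct 800 444` prints 180. With live byte values, something like `overcommit_pct "$(zfs get -Hp -o value volsize vol0/pprovol)" "$(zpool get -Hp -o value size vol0)"` should work on releases whose zpool get accepts -H, -p and -o.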
Partition zvol device:
(should be /dev/zd0 for the first zvol; /dev/zd16, /dev/zd32, etc. for subsequent zvols)
fdisk /dev/zd0 # (create new aligned partition with the "c" and "u" parameters)
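The /dev/zdN numbering comes from each zvol reserving 16 minor device numbers (one for the whole volume, the rest for its partitions), so the Nth zvol counting from 0 appears as /dev/zd(16 x N) and its first partition as /dev/zd(16 x N)p1. Because that order depends on when zvols were created or imported, the udev-managed names under /dev/zvol/<pool>/<volume> (typically with a -part1 suffix for the first partition) are a safer handle in scripts. A hypothetical sketch of the numbering:

```shell
# Hypothetical helpers mirroring the /dev/zdN naming on ZFS on Linux:
# each zvol takes 16 minor numbers, so zvol index N maps to /dev/zd$((16*N)).
# The index reflects creation/import order and is NOT stable across systems.
zvol_dev() {
    printf '/dev/zd%s\n' $(( $1 * 16 ))
}

# Partition P of zvol index N; device names ending in a digit get a "p".
zvol_part() {
    printf '%sp%s\n' "$(zvol_dev "$1")" "$2"
}
```

`zvol_part 0 1` prints /dev/zd0p1, the partition formatted in the next step. As a non-interactive alternative to fdisk (a suggestion, not the command used above), `parted -s /dev/zvol/vol0/pprovol mklabel gpt mkpart primary 1MiB 100%` creates a single partition starting on a 1MiB boundary.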
Create and mount the filesystem:
Run mkfs.xfs (or mkfs.ext4) on the newly created partition, /dev/zd0p1.
mkfs.xfs -f -l size=256m,version=2 -s size=4096 /dev/zd0p1
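If ext4 is used instead of XFS, one option worth considering (an untested assumption, not part of the workflow above) is telling mkfs.ext4 about the 128K volblocksize through its stride and stripe_width extended options, which are counted in filesystem blocks: 131072 / 4096 = 32. The hypothetical helpers below only compute that value and print the command.

```shell
# Hypothetical helper: the zvol's volblocksize expressed in filesystem
# blocks, e.g. 131072 / 4096 = 32, for mkfs.ext4's stride/stripe_width.
ext4_stride() {
    volblocksize=$1
    fs_block=$2
    printf '%s\n' $(( volblocksize / fs_block ))
}

# Print (do not run) an mkfs.ext4 command with 4K blocks and the stride
# derived from the given volblocksize in bytes.
# Usage: ext4_mkfs_cmd DEVICE VOLBLOCKSIZE_BYTES
ext4_mkfs_cmd() {
    dev=$1
    stride=$(ext4_stride "$2" 4096)
    printf 'mkfs.ext4 -b 4096 -E stride=%s,stripe_width=%s %s\n' "$stride" "$stride" "$dev"
}
```

`ext4_mkfs_cmd /dev/zd0p1 131072` prints `mkfs.ext4 -b 4096 -E stride=32,stripe_width=32 /dev/zd0p1`.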
Grab the UUID with blkid and modify /etc/fstab.
UUID=455cae52-89e0-4fb3-a896-8f597a1ea402 /ppro xfs noatime,logbufs=8,logbsize=256k 1 2
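The UUID lookup and fstab entry can be scripted: `blkid -s UUID -o value /dev/zd0p1` prints just the UUID. The hypothetical `fstab_line` function below formats an entry (defaulting to the same "1 2" dump/pass fields as the line above) so it can be reviewed before being appended to /etc/fstab.

```shell
# Hypothetical helper: format an /etc/fstab entry that mounts by UUID.
# Usage: fstab_line UUID MOUNTPOINT FSTYPE OPTIONS [DUMP] [PASS]
# DUMP and PASS default to 1 and 2.
fstab_line() {
    printf 'UUID=%s %s %s %s %s %s\n' "$1" "$2" "$3" "$4" "${5:-1}" "${6:-2}"
}
```

For example, `fstab_line "$(blkid -s UUID -o value /dev/zd0p1)" /ppro xfs noatime,logbufs=8,logbsize=256k` reproduces the entry above for this volume; redirect it with `>> /etc/fstab` once it looks right.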
Mount the new filesystem.
mount /ppro/
Results...
[root@Testa ~]# df -h
Filesystem   Size  Used Avail Use% Mounted on
/dev/sde2     20G  8.9G  9.9G  48% /
tmpfs         32G     0   32G   0% /dev/shm
/dev/sde1    485M   63M  397M  14% /boot
/dev/sde7    2.0G   68M  1.9G   4% /tmp
/dev/sde3     12G  2.6G  8.7G  24% /usr
/dev/sde6    6.0G  907M  4.8G  16% /var
/dev/zd0p1   800G  398G  403G  50% /ppro   <-- Compressed ZFS-backed XFS filesystem.
vol0         110G  256K  110G   1% /vol0
ZFS filesystem listing.
[root@Testa ~]# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
vol0          328G   109G   272K  /vol0
vol0/pprovol  326G   109G   186G  -         <-- The actual zvol providing the backing for XFS.
vol1          183G   817G   136K  /vol1
vol1/images   183G   817G   183G  /images
ZFS zpool list.
[root@Testa ~]# zpool list -v
NAME                            SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vol0                            444G   328G   116G    73%  1.00x  ONLINE  -
  mirror                        222G   164G  58.1G      -
    scsi-AccOW140403AS1322043      -      -      -      -
    scsi-AccOW140403AS1322042      -      -      -      -
  mirror                        222G   164G  58.1G      -
    scsi-AccOW140403AS1322013      -      -      -      -
    scsi-AccOW140403AS1322044      -      -      -      -
ZFS zvol properties (take note of referenced, compressratio and volsize).
[root@Testa ~]# zfs get all vol0/pprovol
NAME          PROPERTY              VALUE                  SOURCE
vol0/pprovol  type                  volume                 -
vol0/pprovol  creation              Sun May 11 15:27 2014  -
vol0/pprovol  used                  326G                   -
vol0/pprovol  available             109G                   -
vol0/pprovol  referenced            186G                   -
vol0/pprovol  compressratio         2.99x                  -
vol0/pprovol  reservation           none                   default
vol0/pprovol  volsize               800G                   local
vol0/pprovol  volblocksize          128K                   -
vol0/pprovol  checksum              on                     default
vol0/pprovol  compression           lz4                    inherited from vol0
vol0/pprovol  readonly              off                    default
vol0/pprovol  copies                1                      default
vol0/pprovol  refreservation        none                   default
vol0/pprovol  primarycache          all                    default
vol0/pprovol  secondarycache        all                    default
vol0/pprovol  usedbysnapshots       140G                   -
vol0/pprovol  usedbydataset         186G                   -
vol0/pprovol  usedbychildren        0                      -
vol0/pprovol  usedbyrefreservation  0                      -
vol0/pprovol  logbias               latency                default
vol0/pprovol  dedup                 off                    default
vol0/pprovol  mlslabel              none                   default
vol0/pprovol  sync                  standard               default
vol0/pprovol  refcompressratio      3.32x                  -
vol0/pprovol  written               210M                   -
vol0/pprovol  snapdev               hidden                 default
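The space figures in this listing fit together. ZFS defines used as usedbydataset + usedbysnapshots + usedbychildren + usedbyrefreservation, here 186G + 140G + 0 + 0 = 326G. Multiplying referenced by refcompressratio gives a rough logical size for the data the zvol currently holds, 186G x 3.32 = 617.52G, or about 618G. That is well above the 398G XFS reports as used; one plausible explanation is blocks XFS has freed but never discarded, which the zvol still stores. The hypothetical helpers below do this arithmetic; exact byte values come from `zfs get -Hp -o value PROPERTY vol0/pprovol`, and newer OpenZFS releases also report a logicalreferenced property directly.

```shell
# Hypothetical helper: check ZFS's accounting identity
#   used = usedbydataset + usedbysnapshots + usedbychildren + usedbyrefreservation
# Arguments: USED DATASET SNAPSHOTS CHILDREN REFRESERVATION.
# Pass exact byte values (zfs get -p); rounded -h values may not add up.
used_adds_up() {
    [ $(( $2 + $3 + $4 + $5 )) -eq "$1" ]
}

# Hypothetical helper: rough logical size = physical size x compression ratio,
# rounded to the nearest unit. The ratio is given as ZFS prints it ("3.32x").
logical_estimate() {
    awk -v phys="$1" -v ratio="${2%x}" 'BEGIN { printf "%.0f\n", phys * ratio }'
}
```

`used_adds_up 326 186 140 0 0` succeeds for the rounded figures shown here, and `logical_estimate 186 3.32x` prints 618. Feeding live values, for instance `used_adds_up $(zfs get -Hp -o value used,usedbydataset,usedbysnapshots,usedbychildren,usedbyrefreservation vol0/pprovol)`, assumes zfs get returns the values in the order requested.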
Solution 2:
You also need to enable discard on the ext4 filesystem. Without discard, ZFS does not reclaim the space when files are removed, which can lead to large discrepancies between the space the ext4 filesystem reports and the space the ZFS volume reports.
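Discard can be enabled either with the discard mount option, which issues discards as files are deleted, or by running fstrim on the mounted filesystem periodically (for example `fstrim -v /ppro`). Both assume the ZFS on Linux release in use passes discard requests through to the zvol. The hypothetical helper below adds discard to an existing fstab options field without duplicating it.

```shell
# Hypothetical helper: append a mount option to a comma-separated option
# list (the 4th fstab field) unless it is already present.
# Usage: add_mount_opt OPTIONS NEW_OPTION
add_mount_opt() {
    opts=$1
    new=$2
    case ",$opts," in
        ,,) printf '%s\n' "$new" ;;              # empty option list
        *",$new,"*) printf '%s\n' "$opts" ;;     # already present
        *) printf '%s\n' "$opts,$new" ;;
    esac
}
```

For the XFS entry earlier, `add_mount_opt noatime,logbufs=8,logbsize=256k discard` prints `noatime,logbufs=8,logbsize=256k,discard`; the same option applies to an ext4 entry.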