ZFS vs XFS
Solution 1:
ZFS will give you advantages beyond software RAID. The command structure is thoughtfully laid out and intuitive. It also has compression, snapshots, cloning, filesystem send/receive, and cache devices (those fancy new SSD drives) to speed up serving frequently read data and metadata.
Compression:
# zfs set compression=on filesystem/home
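You can check how well it's doing afterwards with the read-only compressratio property (same example dataset as above):
# zfs get compressratio filesystem/home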
It supports simple-to-create copy-on-write snapshots that can be live-mounted:
# zfs snapshot filesystem/home/user@tuesday
# cd filesystem/home/user/.zfs/snapshot/tuesday
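And if you ever need to revert the whole filesystem to that point in time (assuming tuesday is the most recent snapshot), rollback is a single command:
# zfs rollback filesystem/home/user@tuesday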
Filesystem cloning:
# zfs clone filesystem/home/user@tuesday filesystem/home/user2
Filesystem send/receive:
# zfs send filesystem/home/user@tuesday | ssh otherserver "zfs receive -v filesystem/home/user"
Incremental send/receive (sends only the changes between two snapshots, here an earlier monday snapshot and tuesday):
# zfs send -i filesystem/home/user@monday filesystem/home/user@tuesday | ssh otherserver "zfs receive -v filesystem/home/user"
Caching devices:
# zpool add filesystem cache ssddev
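You can then verify the cache device is attached and watch it being used with the usual pool status/statistics commands (same pool and device names as the example above):
# zpool status filesystem
# zpool iostat -v filesystem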
This is all just the tip of the iceberg; I would highly recommend getting your hands on an install of OpenSolaris and trying this out.
http://www.opensolaris.org/os/TryOpenSolaris/
Edit: This answer is very old. OpenSolaris has been discontinued; the best way to use ZFS today is probably on Linux or FreeBSD.
Full disclosure: I used to be a Sun storage architect, but I haven't worked for them in over a year; I'm just excited about this product.
Solution 2:
I've found XFS better suited to extremely large filesystems with possibly many large files. I've had a functioning 3.6TB XFS filesystem for over 2 years now with no problems. It definitely works better than ext3, etc. at that size (especially when dealing with many large files and lots of I/O).
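Setting one up is trivial, for what it's worth; a rough sketch, assuming a hypothetical /dev/sdb1 sitting on top of the hardware RAID:
# mkfs.xfs /dev/sdb1
# mount -o inode64 /dev/sdb1 /srv/data
(inode64 lets XFS spread inodes across the whole device, which helps on multi-terabyte filesystems.)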
What you get with ZFS is device pooling, striping and other advanced features built into the filesystem itself. I can't speak to specifics (I'll let others comment), but from what I can tell, you'd want to use Solaris to get the most benefit here. It's also unclear to me how much ZFS helps if you're already using hardware RAID (as I am).
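Roughly, and I'll defer to the ZFS folks on details, the pooling looks something like this (hypothetical pool name and Solaris-style device names): a pool striped across two mirrors, with filesystems carved out of it without any pre-sizing:
# zpool create tank mirror c1t0d0 c1t1d0 mirror c2t0d0 c2t1d0
# zfs create tank/home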
Solution 3:
Using LVM snapshots and XFS on live filesystems is a recipe for disaster, especially when using very large filesystems.
I've been running exclusively on LVM2 and XFS for the last 6 years on my servers (even at home, since zfs-fuse is just plain too slow)...
However, I can no longer count the different failure modes I encountered when using snapshots. I've stopped using them altogether - it's just too dangerous.
The only exception I'll make now is my own personal mailserver/webserver backup, where I do overnight backups using an ephemeral snapshot that is always equal in size to the source fs and gets deleted right afterwards (see the sketch after the list below).
Most important aspects to keep in mind:
- if you have a big(ish) filesystem that has a snapshot, write performance is horribly degraded
- if you have a big(ish) filesystem that has a snapshot, boot time will be delayed by literally tens of minutes while the disk churns and churns during import of the volume group. No messages are displayed. This effect is especially horrid if root is on LVM2 (because waiting for the root device times out and the system doesn't boot)
- if you have a snapshot it is very easy to run out of space. Once you run out of space, the snapshot is corrupt and cannot be repaired.
- Snapshots cannot be rolled back/merged at the moment (see http://kerneltrap.org/Linux/LVM_Snapshot_Merging). This means the only way to restore data from a snapshot is to actually copy (rsync?) it over. DANGER DANGER: you do not want to do this unless the snapshot capacity is at least the size of the source fs; if it isn't, you'll soon hit the brick wall and end up with both the source fs and the snapshot corrupted. (I've been there!)
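For reference, the ephemeral-snapshot backup I mentioned above looks roughly like this (hypothetical VG/LV names; the snapshot is sized to match the origin, 500G here purely as an illustration, copied off with rsync, then thrown away; nouuid is needed because the XFS snapshot carries the same UUID as the origin):
# lvcreate --snapshot --size 500G --name nightly /dev/vg0/mail
# mount -o ro,nouuid /dev/vg0/nightly /mnt/snap
# rsync -a /mnt/snap/ /backup/mail/
# umount /mnt/snap
# lvremove -f /dev/vg0/nightly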
Solution 4:
A couple of additional things to think about.
If a drive dies in a hardware RAID array, all the blocks on the device have to be rebuilt, regardless of the filesystem that's on top of it, even the ones that didn't hold any data. ZFS, on the other hand, is both the volume manager and the filesystem, and it manages data redundancy and striping itself, so it can intelligently rebuild only the blocks that contained data. This results in faster rebuild times except when the volume is 100% full.
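Operationally that looks something like this (hypothetical pool/device names): after swapping in a new disk at the same location, you tell ZFS to replace it, and the resilver of only the used blocks is visible in the pool status:
# zpool replace tank c1t3d0
# zpool status tank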
ZFS has background scrubbing which makes sure that your data stays consistent on disk and repairs any issues it finds before it results in data loss.
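Kicking a scrub off (and checking on it) is a one-liner, again with a hypothetical pool name:
# zpool scrub tank
# zpool status -v tank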
ZFS file systems are always in a consistent state so there is no need for fsck.
ZFS also offers more flexibility and features with its snapshots and clones compared to the snapshots offered by LVM.
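For example, ZFS snapshots are instant, need no pre-allocated space, and can be taken recursively across a whole tree of filesystems (hypothetical names again):
# zfs snapshot -r tank/projects@nightly
# zfs list -t snapshot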
Having run large storage pools for large-format video production on a Linux, LVM, XFS stack, my experience has been that it's easy to fall into micro-managing your storage. This can result in large amounts of unused allocated space and time/issues with managing your logical volumes. This may not be a big deal if you have a full-time storage administrator whose job is to micro-manage the storage, but I've found that ZFS's pooled storage approach removes these management issues.
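Instead of carving out fixed-size logical volumes up front, you just create filesystems in the pool and, only if you need to, constrain them later with quotas and reservations; a quick sketch with hypothetical names:
# zfs create tank/projects/alpha
# zfs set quota=2T tank/projects/alpha
# zfs set reservation=500G tank/projects/alpha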
Solution 5:
ZFS is absolutely amazing. I am using it on my home file server with 5 x 1 TB HDs, and am also using it in production with almost 32 TB of hard drive space. It is fast, easy to use, and contains some of the best protection against data corruption.
We are using OpenSolaris on this server in particular because we wanted to have access to newer features and because it provided the new package management system and way of upgrading.