Are files saved on disk sequentially?
Can a file be saved non-sequentially on disk? I mean, part of the file is located at physical address X and the other part at physical address Y, which isn't anywhere near X + offset.
Yes; this is known as file fragmentation and is not uncommon, especially with larger files. Most file systems allocate space as it's needed, more or less sequentially, but they can't guess future behaviour — so if you write 200MiB to a file, then add a further 100MiB, there's a non-zero chance that both sets of data will be stored in different areas of the disk (basically, any other write needing more space on disk, occurring after the first write and before the second, could come in between the two). If a filesystem is close to full, the situation will usually be worse: there may not be a contiguous area of free space large enough to hold a new file, so it will have to be fragmented.
Can I somehow control a file's sequentiality? I want to allocate a big file of 10 GB, and I want it to be sequential on disk, not split across different offsets.
You can tell the filesystem about your file's target size when it's created; this will help the filesystem store it optimally. Many modern filesystems use a technique known as delayed allocation, where the on-disk layout of a new file is calculated as late as possible, to maximise the information available when the calculation is performed. You can help this process by using the posix_fallocate(3) function to tell the filesystem how much disk space should be allocated in total. Modern filesystems will try to perform this allocation sequentially.
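For example, here is a minimal sketch for the 10 GiB case, assuming an illustrative file name of bigfile (on 32-bit systems, compile with -D_FILE_OFFSET_BITS=64 so off_t can hold the full size):

/* Preallocate a 10 GiB file with posix_fallocate(3). Giving the
 * filesystem the full size up front lets it try to place the file in
 * one contiguous run (on filesystems that support fallocate, such as
 * ext4, XFS and btrfs). */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    off_t size = 10LL * 1024 * 1024 * 1024;   /* 10 GiB */

    int fd = open("bigfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* posix_fallocate returns 0 on success or an error number,
     * not -1 with errno. */
    int err = posix_fallocate(fd, 0, size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}

On filesystems without fallocate support, glibc falls back to writing the blocks out one by one, which is much slower but produces the same end result.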
Does this behave differently on different filesystem types?
Different filesystems behave differently, yes. Log-based filesystems such as NILFS2 don't allocate storage in the same way as extent-based filesystems such as Ext4, and that's just one example of variation.
The command filefrag will tell you how your file is physically stored on your device:
# filefrag -v /var/log/messages.1
Filesystem type is: ef53
File size of /var/log/messages.1 is 41733 (11 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0  2130567               1
   1       1 15907576  2130568      1
   2       2 15910400 15907577      1
   3       3 15902720 15910401      7
   4      10  2838546 15902727      1 eof
/var/log/messages.1: 5 extents found
If you write your file in one pass, my guess is that your file won't be fragmented.
The man page of fallocate(1) is pretty clear:

fallocate is used to preallocate blocks to a file. For filesystems which support the fallocate system call, this is done quickly by allocating blocks and marking them as uninitialized, requiring no IO to the data blocks. This is much faster than creating a file by filling it with zeros.

As of the Linux Kernel v2.6.31, the fallocate system call is supported by the btrfs, ext4, ocfs2, and xfs filesystems.
Is it sequential? The filesystem will first try to allocate the blocks sequentially; if it can't, it will not warn you.
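Since you won't be warned, you can check the resulting layout yourself. Here is a rough sketch using the FIEMAP ioctl, the same interface filefrag uses, assuming Linux and a filesystem that supports FIEMAP; a single extent means the file is stored contiguously:

/* Count the extents of a file via the FS_IOC_FIEMAP ioctl. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fiemap fm;
    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
    fm.fm_extent_count = 0;            /* 0 = just report how many extents exist */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
        perror("FS_IOC_FIEMAP");
        close(fd);
        return 1;
    }

    printf("%s: %u extent(s)\n", argv[1], fm.fm_mapped_extents);
    close(fd);
    return 0;
}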
You mention sparse files, and none of the other answers have covered them.
Most files are not sparse. The most common way to create a file is to write it all in one go, from the start to the end. No holes there.
However, you are allowed to say "move to position 1,000,000,000,000 and write a byte there." This will create a file that looks like it is a terabyte big, but actually only uses (probably) 4 KiB on disk. This is a sparse file.
You can do this many times for the same file, leaving small amounts of data scattered across the vast emptiness.
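A minimal sketch of the single-byte case described above, assuming an illustrative file name of sparse (again, on 32-bit systems compile with -D_FILE_OFFSET_BITS=64 so the offset fits in off_t):

/* Create a sparse file: seek ~1 TB past the start, then write one byte. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("sparse", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Seeking past the end allocates nothing; the gap is a hole. */
    if (lseek(fd, 1000000000000LL, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        close(fd);
        return 1;
    }

    /* This single write makes the file appear roughly a terabyte long,
     * while only about one block is actually allocated on disk. */
    if (write(fd, "x", 1) != 1) {
        perror("write");
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}

Afterwards, ls -l reports the apparent terabyte size, while du -h shows the few kilobytes actually allocated.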
While this can be useful, there are two downsides.
The first is that the file will be fragmented, which is what you were worried about.
The second is that not all programs handle such files well. For example, some backup software will try to back up the emptiness, producing a backup that is much larger than necessary and possibly too big for the backup medium.