How does the stat command calculate the blocks of a file?
The stat command-line tool uses the stat / fstat etc. functions, which return data in the stat structure. The st_blocks member of the stat structure returns:
The total number of physical blocks of size 512 bytes actually allocated on disk. This field is not defined for block special or character special files.
So for your "Email" example, with a size of 965 and a block count of 8, it is indicating that 8*512=4096 bytes are physically allocated on disk. The reason the count isn't 2 (which would be enough to cover 965 bytes) is that the file system does not allocate space in units of 512; it evidently allocates it in units of 4096. (And the unit of allocation may vary depending on file size and filesystem sophistication; e.g. ZFS supports different units of allocation.)
Similarly, for the wxPython example, it indicates that 7056*512 bytes, or 3612672 bytes, are physically allocated on disk. You get the idea.
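To see these numbers from a program rather than from the command line, here's a minimal C sketch of the underlying call (the file name "Email", taken from the question, is just an assumed example path):

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat sb;

    if (stat("Email", &sb) == -1) {   /* fstat(fd, &sb) works the same way */
        perror("stat");
        return 1;
    }

    /* st_blocks counts 512-byte units, regardless of the
       filesystem's actual allocation unit. */
    printf("size:      %lld bytes\n", (long long)sb.st_size);
    printf("blocks:    %lld (512-byte units)\n", (long long)sb.st_blocks);
    printf("allocated: %lld bytes\n", (long long)sb.st_blocks * 512);
    return 0;
}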
The IO block size is "a hint as to the 'best' unit size for I/O operations" - it's usually the unit of allocation on the physical disk. Don't confuse the IO block with the blocks that stat uses to indicate physical size; the blocks for physical size are always 512 bytes.
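As a concrete illustration of what that hint is for, here's a sketch that sizes its read buffer from st_blksize (the file name is again just an assumed example):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("Email", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        return 1;
    }

    /* st_blksize is the filesystem's preferred I/O size;
       reading in chunks of that size avoids partial-block I/O. */
    char *buf = malloc((size_t)sb.st_blksize);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, (size_t)sb.st_blksize)) > 0)
        total += n;

    printf("read %lld bytes in %ld-byte chunks\n", total, (long)sb.st_blksize);
    free(buf);
    close(fd);
    return 0;
}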
Update based on comment:
Like I said, st_blocks is how the OS indicates how much space is used by the file on disk. The actual units of allocation on disk are the choice of the file system. For example, ZFS can have allocation blocks of variable size, even in the same file, because of the way it allocates blocks: files start out with a small block size, and the block size keeps increasing until it reaches a particular point. If the file is later truncated, it will probably keep the old block size. So, based on its history, a file can have multiple possible block sizes, and given a file size it is not always obvious why it has a particular physical size.
Concrete example: on my Solaris box, with a ZFS file system, I can create a very short file:
$ echo foo > test
$ stat test
Size: 4 Blocks: 2 IO Block: 512 regular file
(irrelevant details omitted)
OK, small file, 2 blocks; physical disk usage is 1024 bytes for this file.
$ dd if=/dev/zero of=test2 bs=8192 count=4
$ stat test2
Size: 32768 Blocks: 65 IO Block: 32768 regular file
OK, now we see physical disk usage of 32.5K and an IO block size of 32K. I then copied it to test3 and truncated the test3 file in an editor:
$ cp test2 test3
$ joe -hex test3
$ stat test3
Size: 4 Blocks: 65 IO Block: 32768 regular file
Well now, here's a file with 4 bytes in it - just like test - but it's using 32.5K physically on the disk, because of the way the ZFS file system allocates space. Block sizes increase as the file gets larger, but they don't decrease when the file gets smaller. (And yes, this can lead to substantial wasted space depending on the kinds of files and file operations you do on ZFS, which is why it allows you to set the maximum block size on a per-filesystem basis, and change it dynamically.)
Hopefully, you can now appreciate that there isn't necessarily a simple relationship between file size and physical disk usage. Even in the example above, it's not clear why 32.5K bytes are needed to store a file that's exactly 32K in size - it appears that ZFS generally needs an extra 512 bytes of storage of its own. Perhaps that storage holds checksums, reference counts, or transaction state - file system bookkeeping. By including these extras in the reported physical size, ZFS avoids misleading the user about the physical cost of the file. But it also means the calculation isn't trivial to reverse-engineer without intimate knowledge of the underlying file system implementation.
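If you want to spot these mismatches programmatically, one simple sketch is to compare st_size with st_blocks * 512: allocated greater than size means slack space or metadata overhead (like test3 above), while allocated less than size suggests a sparse or compressed file. This small diagnostic tool is illustrative, not part of any standard utility:

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }

    struct stat sb;
    if (stat(argv[1], &sb) == -1) {
        perror("stat");
        return 1;
    }

    long long size  = (long long)sb.st_size;         /* logical length */
    long long alloc = (long long)sb.st_blocks * 512; /* physical allocation */

    printf("logical size:   %lld bytes\n", size);
    printf("allocated size: %lld bytes\n", alloc);
    if (alloc > size)
        printf("over-allocated by %lld bytes (slack space or metadata)\n", alloc - size);
    else if (alloc < size)
        printf("under-allocated by %lld bytes (sparse or compressed file)\n", size - alloc);
    return 0;
}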