Why is a text file taking up at least 4kB even when there's just one byte of text in it?
The block size of the file system must be 4 kB. When data is written to a file, the operating system must allocate blocks of storage from the file system to hold that data.
Typically, when a file system is created the storage contained in that file system is segmented into blocks of a fixed size. This Wikipedia article briefly explains this process.
The file system holding this file must use a 4 KiB block size. The file occupies one 4 KiB block, and only one byte within that block contains actual data.
All file systems have a cluster or block size: the smallest amount of disk space that can be allocated to hold a file. Even if the actual file size is smaller than the cluster/block size, the file will still consume one whole cluster, or 4 KiB on your file system. The cluster size depends on the file system and on the options it was created with.
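The rounding described above can be sketched as plain integer arithmetic. This is only an illustration: `allocated_bytes` is a hypothetical helper, not a standard tool, and it ignores metadata, inline data, and sparse files.

```shell
#!/bin/sh
# Hypothetical helper: round a file size up to a whole number of blocks.
# Usage: allocated_bytes SIZE_IN_BYTES BLOCK_SIZE_IN_BYTES
allocated_bytes() {
    size=$1
    block=$2
    # Integer ceiling division: how many blocks are needed to hold `size` bytes.
    blocks=$(( (size + block - 1) / block ))
    echo $(( blocks * block ))
}

allocated_bytes 0 4096      # empty file: no data blocks
allocated_bytes 1 4096      # one byte still needs a whole block
allocated_bytes 4096 4096   # exactly one block
allocated_bytes 4097 4096   # one byte over spills into a second block
```

With a 4096-byte block size, these calls should print 0, 4096, 4096, and 8192 respectively.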
If it contains zero bytes, as Gilles pointed out, it uses zero blocks/clusters but one inode on typical *nix file systems, which better answers the caveat, "unless it's blank."
A little experiment to help illustrate this:
First, let's see what the actual block size of my root ext4 (LVM) partition is:
[root@fedora17 blocksize]# dumpe2fs /dev/mapper/vg_fedora17-lv_root | grep -i "block size"
dumpe2fs 1.42.3 (14-May-2012)
Block size: 4096
It is 4096 (4 KiB), as expected. Now, let's create three files: The first is zero bytes, the second is just one byte, and the third is 4 KiB (the block size):
[root@fedora17 blocksize]# touch 0_bytes.bin
[root@fedora17 blocksize]# dd if=/dev/zero of=1_byte.bin bs=1 count=1
[root@fedora17 blocksize]# dd if=/dev/zero of=4096_bytes.bin bs=1 count=4096
Now, we ls the directory. We use the -s option to see the allocated size (the left-most column), in number of 1024-byte "blocks." (ls doesn't know the real block size is 4096; we could specify --block-size, but that scales everything by that value, and we want to see the actual file size in bytes, too.)
[root@fedora17 blocksize]# ls -ls
total 8
0 -rw-r--r--. 1 root root 0 Jan 21 23:56 0_bytes.bin
4 -rw-r--r--. 1 root root 1 Jan 21 23:38 1_byte.bin
4 -rw-r--r--. 1 root root 4096 Jan 21 23:38 4096_bytes.bin
Two things can be noted here:
- The zero-byte file takes up zero blocks in the filesystem, confirming what Gilles stated.
- Even though the other two files have different file sizes, they both take up 4 × 1024 bytes = one 4 KiB ext4 block.
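The same comparison can be made for a single file with stat instead of ls. The following is a rough sketch that assumes GNU coreutils stat, where %s is the apparent size in bytes, %b is the number of allocated blocks, and %B is the size in bytes of each block counted by %b. `report_usage` is a made-up helper name, and the temporary directory stands in for the test directory used above.

```shell
#!/bin/sh
# Sketch: report apparent size versus allocated size for one file.
# Assumes GNU coreutils stat (-c with the %s, %b and %B format sequences).
report_usage() {
    file=$1
    apparent=$(stat -c '%s' "$file")   # file length in bytes
    blocks=$(stat -c '%b' "$file")     # number of allocated blocks
    unit=$(stat -c '%B' "$file")       # bytes per block counted by %b
    echo "$file: apparent=$apparent bytes, allocated=$(( blocks * unit )) bytes"
}

workdir=$(mktemp -d)
printf 'x' > "$workdir/1_byte.bin"     # write exactly one byte
report_usage "$workdir/1_byte.bin"
```

On a file system with 4 KiB blocks, the allocated figure would typically be 4096 for the one-byte file, but the exact value depends on the file system and may differ where small files are stored inline.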
Sparse Files
Sparse files are files with large runs of zeros for which no blocks are allocated. Because the data is known to be all zero, there's no point in storing it on the disk. In this way, a file's apparent size can actually be larger than its on-disk size.
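A quick way to see this is to extend an empty file to 1 MiB without writing any data, then compare the two sizes. This is a sketch that assumes GNU coreutils (truncate, and stat with -c) and a file system that supports sparse files. The file name sparse.bin is arbitrary.

```shell
#!/bin/sh
# Sketch: create a file with a 1 MiB apparent size but no data written to it.
# Assumes GNU coreutils (truncate, stat -c) and sparse-file support.
workdir=$(mktemp -d)
sparse="$workdir/sparse.bin"
truncate -s 1M "$sparse"               # set the length to 1048576 bytes

apparent=$(stat -c '%s' "$sparse")
# %b counts allocated blocks; %B is the size in bytes of each such block.
allocated=$(( $(stat -c '%b' "$sparse") * $(stat -c '%B' "$sparse") ))
echo "apparent=$apparent bytes, allocated=$allocated bytes"
```

Where sparse files are supported, the allocated figure should be far smaller than the apparent size, often zero.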
Inline Data
Note that some filesystems allow the contents of very small files to be stored in the inode itself. See "Is it possible to store data directly inside an inode on a Unix / Linux filesystem?"