Understanding sparse files, dd, seek, inode block structure
Some quick answers: first, you didn't create a sparse file. Try these extra commands:
dd if=/tmp/BIL of=/tmp/sparse seek=1000
ls -ls /tmp/sparse
You will see that the size is 512003 bytes, but the file only takes 8 blocks. The null bytes have to occupy a whole block and start on a block boundary for the filesystem to be able to store them sparsely.
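A minimal sketch of that difference, assuming GNU coreutils on Linux. The scratch directory and the file names dense and sparse are made up for illustration. Both files read back identically, but only the one created with seek= leaves a hole that the filesystem may skip allocating.

```shell
# Scratch directory (hypothetical names; nothing outside it is touched).
dir=$(mktemp -d)
printf 'BIL' > "$dir/BIL"

# Dense: 1000 blocks of real zero bytes are written, then BIL is appended.
dd if=/dev/zero of="$dir/dense" bs=512 count=1000 2>/dev/null
cat "$dir/BIL" >> "$dir/dense"

# Sparse: dd seeks past the first 1000 output blocks without writing them.
dd if="$dir/BIL" of="$dir/sparse" bs=512 seek=1000 2>/dev/null

# Same apparent size and same bytes when read back...
dense_size=$(stat -c %s "$dir/dense")
sparse_size=$(stat -c %s "$dir/sparse")
cmp "$dir/dense" "$dir/sparse" && echo "contents identical"

# ...but the allocated block counts (first column) normally differ.
ls -ls "$dir/dense" "$dir/sparse"
```

Whether the second file really ends up smaller on disk depends on the filesystem, so the block counts are printed rather than assumed.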
Why does the second occurrence of "BIL" appear out of order?
Because you are on a little-endian system and you are writing the output as shorts (2-byte units). Dump bytes instead, as cat does.
How do cat and other tools know to print in the correct order?
They work on bytes.
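The effect is easy to reproduce with od. This is only a sketch: the four-byte sample string is an assumption, chosen so that it fills two whole shorts.

```shell
# Dump as single bytes: the order matches the file, 42 49 4c 42 for B I L B.
bytes=$(printf 'BILB' | od -An -tx1 | tr -d ' \n')
echo "$bytes"

# Dump as 2-byte units: each pair is shown as one number in host byte order,
# so on a little-endian machine the two bytes of every pair appear swapped.
printf 'BILB' | od -An -tx2
```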
How do programs like ls discern between the "alleged" size and the allocated size?
ls and similar tools use the stat(2) system call, which returns, among other fields, these two values:
off_t st_size; /* total size, in bytes */
blkcnt_t st_blocks; /* number of 512B blocks allocated */
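From the shell, GNU stat exposes the same two fields through format specifiers. The sketch below assumes GNU coreutils (truncate and stat -c) and uses a throwaway temporary file.

```shell
f=$(mktemp)
# Extend the empty file to 1 MiB without writing any data, leaving one big hole.
truncate -s 1M "$f"

size=$(stat -c %s "$f")      # st_size: apparent size in bytes
blocks=$(stat -c %b "$f")    # st_blocks: number of allocated blocks
unit=$(stat -c %B "$f")      # bytes per block as counted by %b (usually 512)

echo "apparent:  $size bytes"
echo "allocated: $((blocks * unit)) bytes"
rm -f "$f"
```

On a filesystem that supports holes, the allocated figure should come out far below the apparent one.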
What tools can I use to interrogate inode information?
stat is good.
Is there a tool where I can walk the direct and indirect blocks?
On ext2/3/4 you can use hdparm --fibmap with the filename:
$ sudo hdparm --fibmap ~/sparse
 filesystem blocksize 4096, begins at LBA 25167872; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
      512000  226080744  226080751          8
You can also use debugfs:
$ sudo debugfs /dev/sda3
debugfs: stat <1040667>
Inode: 1040667   Type: regular    Mode:  0644   Flags: 0x0
Generation: 1161905167    Version: 0x00000000
User:   127   Group:   500   Size: 335360
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 664
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
atime: 0x4dd61e29 -- Fri May 20 09:54:17 2011
mtime: 0x4dd61e6c -- Fri May 20 09:55:24 2011
Size of extra inode fields: 4
BLOCKS:
(0-11):4182714-4182725, (IND):4182726, (12-81):4182727-4182796
TOTAL: 83
Why does dd truncate my file and can dd or another tool write into the middle of a file?
Yes, dd can write into the middle. By default it truncates the output file, so add conv=notrunc.
Are there mechanisms to prevent sparse files from being shrunk/grown? And if not, why are sparse files useful?
No. Because they take less space.
The sparse aspect of a file should be totally transparent to a program, which sometimes means the sparseness may be lost when the program updates a file.
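As a rough illustration of that transparency, the sketch below writes into the middle of a sparse file with dd conv=notrunc and reads the result back with ordinary tools. It assumes GNU coreutils; the 1 MiB size and the offset are arbitrary choices.

```shell
f=$(mktemp)
truncate -s 1M "$f"                      # 1 MiB file consisting of a hole

# Overwrite 3 bytes at offset 512000 (1000 * 512); conv=notrunc keeps the
# rest of the file, which dd would otherwise cut off after the written data.
printf 'BIL' | dd of="$f" bs=512 seek=1000 conv=notrunc 2>/dev/null

size=$(stat -c %s "$f")                  # apparent size after the write
# Reading needs no special handling: holes simply come back as zero bytes.
middle=$(dd if="$f" bs=1 skip=512000 count=3 2>/dev/null)
nonzero=$(head -c 512000 "$f" | tr -d '\0' | wc -c)
echo "size=$size middle=$middle nonzero_bytes_before=$nonzero"
rm -f "$f"
```

Neither dd nor head needs to know where the holes are; the kernel supplies the zeros on read.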
Some copying utilities have options to preserve sparseness, e.g. tar --sparse, rsync --sparse.
Note, you can explicitly convert the suitably aligned zero blocks in a file to sparseness by using cp --sparse=always, and the reverse, converting sparse space into real zeros, with cp --sparse=never.
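A small round trip shows both directions. This is a sketch assuming GNU cp; the file names zeros, holes and filled are hypothetical.

```shell
dir=$(mktemp -d)
# 1 MiB of real zero bytes, written out explicitly.
dd if=/dev/zero of="$dir/zeros" bs=4096 count=256 2>/dev/null

# Copy while creating holes wherever cp finds runs of zeros...
cp --sparse=always "$dir/zeros" "$dir/holes"
# ...and copy back while writing every zero for real.
cp --sparse=never "$dir/holes" "$dir/filled"

# Contents and apparent sizes are unchanged; only the allocation may differ.
cmp "$dir/zeros" "$dir/holes" && cmp "$dir/holes" "$dir/filled" && same=yes
stat -c '%n: %s bytes, %b blocks' "$dir/zeros" "$dir/holes" "$dir/filled"
```

The block counts in the last command are the only place the three files should differ, and only on filesystems that support holes.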
A better tool for dumping file layout on Linux is the filefrag utility included in the e2fsprogs package. This will dump all of the extents in a file in an efficient and compact manner:
$ dd of=/var/tmp/sparse if=/dev/zero count=1
$ dd of=/var/tmp/sparse if=/dev/zero seek=1000 count=1
$ filefrag -v /var/tmp/sparse
Filesystem type is: ef53
File size of /var/tmp/sparse is 512512 (126 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:    3441408..   3441408:      1:
   1:      125..     125:    3441533..   3441533:      1:    3441409: last,eof
/var/tmp/sparse: 2 extents found
filefrag relies on the FIEMAP ioctl, which is available for most common Linux filesystems (ext4, XFS, Btrfs, etc.), but not yet for ZFS (though that is under development).