What is that "total" in the very first line after ls -l?

You can find the definition of that line in the ls documentation for your platform. For coreutils ls (the one found on a lot of Linux systems), the information can be found via info coreutils ls:

For each directory that is listed, preface the files with a line `total BLOCKS', where BLOCKS is the total disk allocation for all files in that directory.


The Formula: What is that number?

total (int) = Sum of (physical_blocks_in_use * physical_block_size / ls_block_size) for each file.

Where:

  • ls_block_size is a configurable unit (normally 512 or 1024 bytes) that can be changed with the --block-size=<int> flag on ls, with the POSIXLY_CORRECT=1 GNU environment variable (to get 512-byte units), or with the -k flag to force 1 KiB units.
  • physical_block_size is the OS-dependent value of an internal block interface, which may or may not be connected to the underlying hardware. This value is normally 512 bytes or 1 KiB, but is completely dependent on the OS. It can be revealed through the %B value on stat or fstat. Note that this value is (almost always) unrelated to the number of physical blocks on a modern storage device.
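The formula above can be sketched in Python. This is a minimal illustration rather than what ls actually runs: the function name ls_total and its parameters are hypothetical, st_blocks is assumed to count 512-byte units (as it does on Linux), and rounding the scaled sum up is an assumption.

```python
import os

# Assumption: st_blocks is expressed in 512-byte units (true on Linux).
ST_BLOCK_BYTES = 512


def ls_total(directory, ls_block_size=1024, include_hidden=False):
    """Approximate the 'total' line of `ls -l` for one directory.

    Hypothetical helper: sums the allocated blocks of each entry and
    converts the sum to units of ls_block_size, rounding up.
    """
    allocated_bytes = 0
    for entry in os.scandir(directory):
        # Plain `ls -l` does not list dotfiles, so skip them by default.
        if not include_hidden and entry.name.startswith("."):
            continue
        # Do not follow symlinks: count the link itself, not its target.
        st = entry.stat(follow_symlinks=False)
        allocated_bytes += st.st_blocks * ST_BLOCK_BYTES
    # Integer ceiling division.
    return -(-allocated_bytes // ls_block_size)


if __name__ == "__main__":
    print("total", ls_total("."))
```

Passing ls_block_size=512 mimics the 512-byte units described for POSIXLY_CORRECT=1. Note that os.scandir never yields the . and .. entries, so this sketch will not match ls -la.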

Why so confusing?

This number is fairly detached from any physical or meaningful metric. Many junior programmers haven't had experience with file holes or hard/sym links. In addition, the documentation available on this specific topic is virtually non-existent.

The term "block size" is ambiguous because several different measures share the name and are easily confused, and because disk access sits behind relatively deep levels of abstraction.

Examples of conflicting information: du (or ls -s) vs stat

Running du * in a project folder yields the following: (Note: ls -s returns the same results.)

dactyl:~/p% du *
2       check.cc
2       check.h
1       DONE
3       Makefile
3       memory.cc
5       memory.h
26      p2
4       p2.cc
2       stack.cc
14      stack.h

Total: 2+2+1+3+3+5+26+4+2+14 = 62 Blocks

Yet when one runs stat we see a different set of values. Running stat in the same directory yields:

dactyl:~/p% stat * --printf="%b\t(%B)\t%n: %s bytes\n"
3       (512)   check.cc: 221 bytes
3       (512)   check.h: 221 bytes
1       (512)   DONE: 0 bytes
5       (512)   Makefile: 980 bytes
6       (512)   memory.cc: 2069 bytes
10      (512)   memory.h: 4219 bytes
51      (512)   p2: 24884 bytes
8       (512)   p2.cc: 2586 bytes
3       (512)   stack.cc: 334 bytes
28      (512)   stack.h: 13028 bytes

Total: 3+3+1+5+6+10+51+8+3+28 = 118 Blocks

Note: You can use the command stat * --printf="%b\t(%B)\t%n: %s bytes\n" to output (in order) the number of blocks, (in parens) the size of those blocks, the name of the file, and the size in bytes, as shown above.

There are two important takeaways:

  • stat reports both the physical_blocks_in_use and physical_block_size as used in the formula above. Note that these are values based on OS interfaces.
  • du is providing what is generally accepted as a fairly accurate estimate of physical disk utilization.

For reference, here is the ls -l of the directory above:

dactyl:~/p% ls -l
total 59
-rw-r--r--. 1 dhs217 grad   221 Oct 16  2013 check.cc
-rw-r--r--. 1 dhs217 grad   221 Oct 16  2013 check.h
-rw-r--r--. 1 dhs217 grad     0 Oct 16  2013 DONE
-rw-r--r--. 1 dhs217 grad   980 Oct 16  2013 Makefile
-rw-r--r--. 1 dhs217 grad  2069 Oct 16  2013 memory.cc
-rw-r--r--. 1 dhs217 grad  4219 Oct 16  2013 memory.h
-rwxr-xr-x. 1 dhs217 grad 24884 Oct 18  2013 p2
-rw-r--r--. 1 dhs217 grad  2586 Oct 16  2013 p2.cc
-rw-r--r--. 1 dhs217 grad   334 Oct 16  2013 stack.cc
-rw-r--r--. 1 dhs217 grad 13028 Oct 16  2013 stack.h

The number on the total line is the total number of file system blocks, including indirect blocks, used by the listed files. If you run ls -s on the same files and sum the reported numbers you'll get that same number.
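To tie the example numbers together, the short sketch below applies the formula to the block counts from the stat output. Scaling the summed 512-byte blocks to 1024-byte units gives 59, the ls -l total, while rounding each file up to whole 1024-byte units before summing gives 62, the du figure. The rounding explanation is an inference from the arithmetic, not something stated in the documentation quoted here. The 1024-byte output unit and the function names are assumptions.

```python
# Block counts (%b column) copied from the stat output above.
stat_blocks = [3, 3, 1, 5, 6, 10, 51, 8, 3, 28]

PHYSICAL_BLOCK_SIZE = 512  # the %B column from stat
LS_BLOCK_SIZE = 1024       # assumed output unit for ls and du in this example


def ceil_div(numerator, denominator):
    """Integer division that rounds up."""
    return -(-numerator // denominator)


def sum_then_scale(blocks):
    """Add up all physical blocks, then convert the sum to output units."""
    return ceil_div(sum(blocks) * PHYSICAL_BLOCK_SIZE, LS_BLOCK_SIZE)


def scale_each_then_sum(blocks):
    """Round every file up to whole output units, then add them up."""
    return sum(ceil_div(b * PHYSICAL_BLOCK_SIZE, LS_BLOCK_SIZE) for b in blocks)


print(sum_then_scale(stat_blocks))       # 59, matches "total 59" from ls -l
print(scale_each_then_sum(stat_blocks))  # 62, matches the du sum
```

The per-file values produced by the second function (2, 2, 1, 3, 3, 5, 26, 4, 2, 14) are the same as the du column shown earlier.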