How to cache or otherwise speed up `du` summaries?

What you are seeing when you rerun a `du` command is the effect of disk buffering. Once a block has been read, its disk buffer is kept in the buffer cache until that memory is needed for something else. For `du`, the system needs to read each directory and the inode of every file in it. The `du` results themselves are not cached, but on a rerun they can be derived with far less disk I/O.
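A rough way to see this for yourself is to time two passes over the same tree. This is only a sketch: `time_du_twice` is a made-up helper name, it assumes your `date` supports `+%s`, and whole-second resolution only shows a difference on a large tree whose metadata was not already cached.

```shell
# Hypothetical helper: run du twice on the same tree and print the
# wall-clock time of each pass. The second pass usually finds the
# directory blocks and inodes already in the buffer cache.
time_du_twice() {
    target=${1:-.}
    for pass in 1 2; do
        start=$(date +%s)
        du -sk "$target" >/dev/null 2>&1
        end=$(date +%s)
        echo "pass $pass: $((end - start))s"
    done
}

# To force a cold first pass on Linux, root can empty the cache beforehand:
#   sync; echo 3 > /proc/sys/vm/drop_caches
```

Call it as `time_du_twice /some/large/directory`; on a small or recently visited tree both passes will likely report 0s.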

While it would be possible to force the system to cache this information, overall performance would suffer as the required buffer space would not be available for actively accessed files.

The directory itself has no idea how large a file is, so each file's inode needs to be accessed. To keep a cached total up to date, it would have to be updated every time a file changed size. As a file can be listed in 0 or more directories, this would require each file's inode to know which directories it is listed in. This would greatly complicate the inode structure and reduce I/O performance. Also, as `du` can report results in different block sizes, the cache would need to hold a total for each possible block size and increment or decrement every one of them, further slowing performance.
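The point about a file being listed in several directories can be illustrated with a hard link. This is a small sketch on a throwaway tree; the names `demo`, `a/data` and `b/alias` are made up, and it assumes the temporary directory lives on a filesystem that supports hard links.

```shell
# Throwaway tree with one file reachable from two directories.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
dd if=/dev/zero of="$demo/a/data" bs=1024 count=64 2>/dev/null
ln "$demo/a/data" "$demo/b/alias"

# Both names resolve to the same inode.
inode_a=$(ls -i "$demo/a/data" | awk '{print $1}')
inode_b=$(ls -i "$demo/b/alias" | awk '{print $1}')
# The inode records how many names point at it, not which directories hold them.
links=$(ls -ld "$demo/a/data" | awk '{print $2}')
echo "inode a/data=$inode_a b/alias=$inode_b links=$links"

# du has to read the inode to get the size, and counts the file once per run.
du -sk "$demo"
rm -rf "$demo"
```

A cached per-directory total would have to be adjusted in both `a` and `b` whenever the file grew, which is exactly the bookkeeping the inode has no information for.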

Common usage of `du` can be sped up immensely by using ncdu.

ncdu (NCurses Disk Usage) performs the `du`, caches the results and shows them in a nice curses-based terminal interface, somewhat comparable to `du -hc -d 1 | sort -h`. The initial indexing takes just as long as `du`, but hunting for the actual "culprit" that fills up precious space is faster, since the cached `du` information is already available for all subdirectories.
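For comparison, the plain `du` pipeline mentioned above can be wrapped like this. It is a sketch that assumes a `du` with `-d` and a `sort` with `-h` (GNU coreutils has both); `first_level_usage` is a made-up name.

```shell
# Hypothetical helper: summarise one level below a directory, smallest first.
first_level_usage() {
    # -h: human-readable sizes, -c: append a grand total, -d 1: one level deep
    du -hc -d 1 "${1:-.}" 2>/dev/null | sort -h
}
```

Running `first_level_usage /var` rescans the disk every time you descend a level by hand, whereas `ncdu /var` scans once and lets you walk the cached tree interactively.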

If needed, subdirectories can be refreshed by pressing [r] and files/folders can be deleted by pressing [d]; both actions update the stats for all parent directories. Deletion asks for confirmation.

If necessary, a further speedup can be achieved by pre-caching with `ncdu -1xo- / | gzip >export.gz` in a cron job and later loading the result with `zcat export.gz | ncdu -f-`, but this obviously gives more outdated information.
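The cron side of this could be sketched as a small wrapper around the same two commands. The function name `refresh_snapshot`, the default paths and the temporary-file naming are my own assumptions; only the `ncdu` and `gzip` invocations come from the commands above.

```shell
# Hypothetical helper for a cron job: scan $1 with ncdu and store a
# gzip-compressed export at $2.
refresh_snapshot() {
    root=${1:-/}
    out=${2:-export.gz}
    tmp="$out.tmp.$$"
    # -x: stay on one filesystem, -o-: write the export to stdout,
    # -1: keep scan feedback to a single line.
    # Write to a temporary name first so a reader never sees a half-written
    # export. The pipeline's exit status is gzip's, so this guards against
    # write failures only, not against ncdu itself failing.
    if ncdu -1xo- "$root" | gzip > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Browse the stored snapshot later with:
#   zcat export.gz | ncdu -f-
```

How often to run it is a trade-off between scan load and how stale a snapshot you can tolerate.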

I prefer to use agedu.

Agedu is a piece of software which attempts to find old and irregularly used files, on the presumption that these files are the most likely not to be wanted (e.g. downloads which have only been viewed once).

It does basically the same sort of disk scan as du, but it also records the last-access times of everything it scans. Then it builds an index that lets it efficiently generate reports giving a summary of the results for each subdirectory, and then it produces those reports on demand.
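The scan-then-report workflow looks roughly like this. It is a sketch based on my reading of the agedu manual (`-s` to scan, `-t` for a text report, `-f` to choose the index file), so check `man agedu` on your system; `agedu_scan_and_report` and the default index path are made up.

```shell
# Hypothetical helper: build an agedu index for a directory, then print a
# text report from that index.
agedu_scan_and_report() {
    dir=${1:-.}
    index=${2:-agedu.dat}
    # Slow step: walks the tree like du and records sizes and access times.
    agedu -f "$index" -s "$dir" || return 1
    # Fast step: answered from the index, no new disk scan.
    agedu -f "$index" -t "$dir"
}
```

Once the index exists, the report step can be repeated as often as you like without touching the scanned tree again.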
