Listing directories based on size from largest to smallest on a single line
If you are confident that the directory names do not contain whitespace, then it is simple to get all the directory names on one line:
du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf $2" "}'
Getting the information into python
If you want to capture that output in a python program and make it into a list, use the following with python2.7 or better:
import subprocess
dir_list = subprocess.check_output("du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf $2\" \"}'", shell=True).split()
In python2.6:
import subprocess
dir_list = subprocess.Popen("du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf $2\" \"}'", shell=True, stdout=subprocess.PIPE).communicate()[0].split()
We can also take advantage of python's features to reduce the amount of work done by the shell and, in particular, to eliminate the need for awk:
subprocess.Popen("du -sk [a-z]*/ | sort -nr", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0].split()[1::2]
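To see why the [1::2] slice can stand in for awk, here is a small sketch. The sample text is made up for illustration and only shaped like the output of du -sk ... | sort -nr; as above, it assumes that no directory name contains whitespace.

```python
# Made-up sample shaped like `du -sk [a-z]*/ | sort -nr` output:
# one "size<TAB>name" pair per line, largest first.
sample = "300\tbig/\n20\tsmall/\n4\ttiny/\n"

# split() with no arguments breaks on any whitespace, so sizes and
# names alternate: ['300', 'big/', '20', 'small/', '4', 'tiny/']
fields = sample.split()

# Take every second field, starting at index 1: the names only.
names = fields[1::2]
print(names)
```

The order produced by sort -nr is preserved, because slicing keeps the fields in their original sequence.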
One could go further and read the du output directly into python, convert the sizes to integers, and sort on size. It is simpler, though, just to do this with sort -nr in the shell.
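For completeness, a minimal sketch of what sorting on the python side might look like. The function name sort_du_output is hypothetical, and the sketch assumes each line of du output is a size, some whitespace, and then the name, with no newlines inside names.

```python
def sort_du_output(du_text):
    """Return the names found in `du -sk` style text, largest first."""
    entries = []
    for line in du_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only once, so spaces inside a name are kept.
        size, name = line.split(None, 1)
        entries.append((int(size), name))
    # Sort on the integer size, largest first.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [name for _, name in entries]

# Possible usage (not run here; assumes du is on the PATH):
# import subprocess
# proc = subprocess.Popen("du -sk [a-z]*/ 2>/dev/null", shell=True,
#                         stdout=subprocess.PIPE, universal_newlines=True)
# dir_list = sort_du_output(proc.communicate()[0])
```

Unlike the split()[1::2] approach, this keeps names that contain spaces intact, though a name containing a newline would still break it.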
Specifying a directory
If the directories whose size you want are not in the current directory, there are two possibilities:
du -sk /some/path/[a-z]*/ 2>/dev/null | sort -nr | awk '{printf $2" "}'
and also:
cd /some/path/ && du -sk [a-z]*/ 2>/dev/null | sort -nr | awk '{printf $2" "}'
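The difference between these two is whether /some/path is included in the output: the first prints full paths such as /some/path/dir/, while the second prints names relative to /some/path.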
Using paste
du -sk [a-z]* 2>/dev/null | sort -nr | cut -f2- | paste -s -
zsh has the ability to sort its globs using globbing qualifiers. You can also define your own glob qualifiers with functions. For instance:
zdu() REPLY=$(du -s -- "$REPLY")
print -r -- [[:alpha:]]*(/nO+zdu)
would print the directories (/) whose names start with a letter (btw, [a-z] only makes sense in the C locale), sorted numerically (n) in reverse order (O) using the zdu function.
Note that when you run:
du -s a b
and a and b contain hardlinks to the same files, the disk usage of those files is counted for a but not for b. The zsh approach here avoids that, since zdu runs du separately for each directory.
If you're going to use python, I'd do the same from there: call du -s for each of the files, and sort that list there. Remember that file names can contain any character including space, tab and newline.
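A rough sketch of that idea, assuming Python 3.5 or later (for os.scandir and subprocess.run) and a du that accepts -k and --. The names du_size_kb and dirs_by_size are hypothetical. No shell is involved, so names containing spaces, tabs or newlines are passed to du unchanged.

```python
import os
import subprocess

def du_size_kb(path):
    """Size of one path in kilobytes, from its own `du -sk` call."""
    proc = subprocess.run(["du", "-sk", "--", path],
                          stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL)
    # The size is the first whitespace-separated field of the output.
    fields = proc.stdout.split(None, 1)
    return int(fields[0]) if fields else 0

def dirs_by_size(parent=".", size_of=du_size_kb):
    """Subdirectories of parent whose names start with a letter,
    largest first. size_of can be swapped out, e.g. for testing."""
    dirs = [entry.path for entry in os.scandir(parent)
            if entry.is_dir(follow_symlinks=False)
            and entry.name[:1].isalpha()]
    # One size_of call per directory, then sort in python.
    return sorted(dirs, key=size_of, reverse=True)

# Possible usage (not run here):
# print(dirs_by_size("/some/path"))
```

Printing the list as shown, rather than joining the names with spaces, keeps the result unambiguous when a name itself contains whitespace. Because du runs once per directory, hardlinked files are counted for every directory that contains them.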