How can I get the size of stdin?
tl;dr: tar -cv dir | wc -c | awk '{print $1/1000"K"}'
du doesn't count the bytes in a file itself. It asks the kernel (via stat) for the size information the filesystem already tracks, which is why it is so fast. For that reason, and because you are measuring a stream rather than a file, du doesn't work here. My guess is that 1.0K is a hardcoded size for /dev/std* in the kernel.
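To see the difference between what du reports and what is actually in a file, a small sketch may help. It assumes a POSIX-like shell with mktemp available; the scratch file and variable names are only for illustration.

```shell
#!/bin/sh
# Create a scratch file holding exactly 5 bytes (illustrative only).
tmp=$(mktemp)
printf 'hello' > "$tmp"

# wc -c reads the data and counts the bytes itself.
bytes=$(wc -c < "$tmp" | tr -d ' ')
echo "byte count: $bytes"

# du -k reports disk usage in 1024-byte units, taken from filesystem metadata.
# The figure depends on the filesystem, so it often differs from the byte count.
echo "disk usage (KiB): $(du -k "$tmp" | cut -f1)"

rm -f "$tmp"
```

On a pipe there is no such metadata to consult, so only the counting approach gives a meaningful answer.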
The solution is to use wc -c, which counts the bytes itself instead of querying the kernel:
$ tar -cv dir | wc -c
If you want output similar to du -h:
$ tar -cv dir | wc -c | awk '{print $1/1000"K"}'
The awk step turns the byte count into a rough human-readable figure. Note that it divides by 1000, whereas du -h uses powers of 1024.
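If you would rather have units that follow du -h, which uses powers of 1024, one option is a small awk helper. This is only a sketch: the function name human_size is made up for illustration, and the rounding will not match du -h exactly.

```shell
#!/bin/sh
# human_size (hypothetical helper): read a byte count on stdin and print it
# with a binary-unit suffix, dividing by 1024 until the value is below 1024.
human_size() {
    awk '{
        n = $1
        split("B K M G T", unit, " ")
        i = 1
        while (n >= 1024 && i < 5) { n /= 1024; i++ }
        if (i == 1) printf "%d%s\n", n, unit[i]
        else        printf "%.1f%s\n", n, unit[i]
    }'
}

echo 5990400 | human_size   # prints 5.7M
```

In a pipeline it would be used as: tar -c dir | wc -c | human_size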
With GNU tar you can just do:
tar --totals -c . >/dev/null
...which will render output like...
Total bytes written: 5990400 (5.8MiB, 5.5GiB/s)
...on stderr. Similarly, with any tar (or stream) you can use dd to deliver a report on byte counts. This may or may not be preferable to wc, but dd defaults to a block size of 512 bytes, which is identical to tar's block size. If your system's PIPE_BUF is large enough, you can even expand dd's block size to match tar's record size, which is 20 blocks, or 10240 bytes. Like this:
tar -c . | dd bs=bx20 >/dev/null
585+0 records in
585+0 records out
5990400 bytes (6.0 MB) copied, 0.0085661 s, 699 MB/s
This may or may not offer a more performant solution than wc.
In both the dd and tar use cases you needn't actually dispose of the stream, though. I redirect to /dev/null above, but I could just as easily have redirected to some file and still received the report on its size at the time it was written.
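As a sketch of that last point, the following writes a stream to a file with dd and pulls the byte count out of dd's report. It assumes dd prints the byte total as the first field of the last line on stderr, which holds for GNU and BSD dd as far as I know but is not guaranteed by POSIX. The variable names and temporary files are placeholders.

```shell
#!/bin/sh
out=$(mktemp)      # stands in for the archive file you would really write
report=$(mktemp)   # captures dd's statistics from stderr

# Any stream works here; replace printf with e.g. tar -c . in real use.
printf 'hello world' | LC_ALL=C dd of="$out" 2>"$report"

# Assumption: the last line of the report starts with the byte total.
bytes=$(tail -n 1 "$report" | awk '{ print $1 }')
echo "wrote $bytes bytes"

rm -f "$out" "$report"
```

With GNU tar the same idea needs no parsing of dd output, since --totals prints its summary on stderr regardless of where the archive goes.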
I'd suggest:
tar cf - dir | wc -c
A simple c (no leading - is required) tells tar to create an archive, f specifies an output file, and - denotes that it be stdout. (Note that if you want just the size and there are many files beneath dir, you may prefer to omit tar's v for performance reasons.)