What is the maximum value for the `bs` argument of `dd`?
The POSIX specification for `dd` doesn't specify a maximum explicitly, but there are some limits:
- the datatype used to store the given value can be expected to be `size_t`, since that's the type of the number of bytes to read passed to the `read` function; `read` is also specified to have a limit of `SSIZE_MAX`;
- under Linux, `read` only transfers up to 2,147,479,552 bytes anyway.
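That figure is the kernel's MAX_RW_COUNT, defined as INT_MAX rounded down to a page boundary; assuming the usual 4 KiB pages, you can reproduce it in the shell:
$ echo $(( (2**31 - 1) & ~4095 ))   # INT_MAX & PAGE_MASK
2147479552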
On a 64-bit platform, `size_t` is 64 bits wide; in addition, it's unsigned, so `dd` will fail when given values greater than 2⁶⁴ – 1:
$ dd if=/dev/zero of=/dev/null bs=18446744073709551616
dd: invalid number: ‘18446744073709551616’
On Linux on 64-bit x86, `SSIZE_MAX` is 0x7fffffffffffffffL (run `echo SSIZE_MAX | gcc -include limits.h -E -` to check), and that's the input limit:
$ dd if=/dev/zero of=/dev/null bs=9223372036854775808
dd: invalid number: ‘9223372036854775808’: Value too large for defined data type
$ dd if=/dev/zero of=/dev/null bs=9223372036854775807
dd: memory exhausted by input buffer of size 9223372036854775807 bytes (8.0 EiB)
Once you find a value which is accepted, the next limit is the amount of memory which can be allocated, since `dd` needs to allocate a buffer before it can read into it.
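You can see the allocation limit in action without waiting for a huge copy by capping `dd`'s address space first; a sketch, assuming bash and GNU `dd` (the 1 GiB cap and 2 GiB buffer are arbitrary values):
$ ( ulimit -v 1048576; dd if=/dev/zero of=/dev/null bs=2G count=1 )   # cap virtual memory at 1 GiB (ulimit -v takes KiB), then request a 2 GiB buffer
dd should then fail with a "memory exhausted" error like the one above.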
Once you find a value which can be allocated, you'll hit the `read` limit (on Linux and other systems with similar limits), unless you use GNU `dd` and specify `iflag=fullblock`:
$ dd if=/dev/zero of=ddtest bs=4294967296 count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 38.3037 s, 56.1 MB/s
(`dd` copied just under 2³¹ bytes, i.e. the Linux limit mentioned above, not even half of what I asked for).
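With GNU `dd`, adding `iflag=fullblock` makes `dd` retry the read until the buffer is full (or EOF is reached), so the same copy should transfer all 4 GiB, assuming the 4 GiB buffer can be allocated:
$ dd if=/dev/zero of=ddtest bs=4294967296 count=1 iflag=fullblock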
As explained in the Q&A linked above, you'll need `fullblock` to reliably copy all the input data in any case, for any value of `bs` greater than 1.
Regardless of its maximum value, you have a bigger problem there; from the POSIX spec:
The `dd` utility shall copy the specified input file to the specified output file with possible conversions using specific input and output block sizes. It shall read the input one block at a time, using the specified input block size; it shall then process the block of data actually returned, *which could be smaller than the requested block size*.
(emphasis added)
As I wrote in the past, `dd` is an extremely stupid tool: in your case, it essentially boils down to
#include <stdlib.h>
#include <unistd.h>

char *buf = malloc(bs);
for (int i = 0; i < count; ++i) {
    /* read(2) may return fewer than bs bytes: a "short read" */
    ssize_t len = read(STDIN_FILENO, buf, bs);
    if (len == 0) break;  /* EOF */
    write(STDOUT_FILENO, buf, len);
}
free(buf);
`bs` is just the argument `dd` uses to perform the `read(2)` syscall, but `read(2)` is allowed to perform a "short read", i.e. to return fewer bytes than requested. Indeed, that's what it does if it has some bytes available right now, even if they aren't all you asked for; this is typical if the input file is a tty, a pipe or a socket (so you are particularly at risk with your CGI...). Just try:
$ dd bs=1000 count=1
asd
asd
0+1 records in
0+1 records out
4 bytes copied, 1.75356 s, 0.0 kB/s
Here I typed in `asd` and pressed Enter; `dd` read it (performing a single `read(STDIN_FILENO, buf, 1000)`) and wrote it out; it did one `read` as requested, so it exited. It doesn't look like it copied 1000 bytes.
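For comparison, GNU `dd`'s `iflag=fullblock` makes it keep reading until the full 1000 bytes have arrived (or EOF is hit, e.g. with Ctrl-D on a terminal) before writing anything:
$ dd bs=1000 count=1 iflag=fullblock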
Ultimately, plain "standard" `dd` is way too stupid a tool for most needs; you can wrangle it into doing what you need by either:
- using `bs=1` and using `count` for the number of bytes; this is guaranteed to copy the number of bytes you need (if available before EOF), but it's quite inefficient, as it performs one syscall per byte;
- adding the `fullblock` flag; this makes sure that `dd` accumulates a full input block before writing it out. Notice however that this is nonstandard (GNU dd has it; I don't know about others).
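For example, both of the following copy exactly 1000 bytes from standard input (assuming 1000 bytes arrive before EOF); the second requires GNU `dd`:
$ dd bs=1 count=1000                    # portable, but one read(2)/write(2) pair per byte
$ dd bs=1000 count=1 iflag=fullblock    # one full 1000-byte block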
In the end, if you are going for non-POSIX extensions anyway, my suggestion is to just use `head -c`: it will do the Right Thing, with sensible buffering and no particular size limits, ensuring correctness and good performance.
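For instance, to copy exactly 1000 bytes from standard input to a file (the file name is a placeholder):
$ head -c 1000 > output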
The maximum depends on the system (including its allocation policies) and the currently available memory.
Instead of trying to read everything at once (you could exhaust memory, slow things down because of swapping, and you would have to add checks to see whether it actually worked...), you could read reasonably sized blocks with `dd`.
Let's say you want to read those bytes and put them into a file. In bash you could run something like this (the total number of bytes is in `$total`):
block=65535
count=$(expr $total / $block)
rest=$(expr $total % $block)
# skip the final dd when the size is an exact multiple of the block size
(dd bs=$block count=$count; [ "$rest" -gt 0 ] && dd bs=$rest count=1) > filename
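A quick sanity check of the arithmetic, using /dev/zero as input and a placeholder file name: with total=1000000 this performs 15 full 65535-byte reads plus one final 16975-byte read (reads from /dev/zero never come up short; on a pipe or socket the fullblock caveats above still apply):
$ total=1000000 block=65535
$ count=$(expr $total / $block)   # 15
$ rest=$(expr $total % $block)    # 16975
$ (dd bs=$block count=$count; dd bs=$rest count=1) < /dev/zero > filename
$ wc -c < filename
1000000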