Read the middle of a large file

This is slow because of the small block size (bs=1). Using a recent GNU dd (coreutils v8.16+), the simplest way is to use the skip_bytes and count_bytes options:

in_file=1tb

start=12345678901
end=19876543212
block_size=4096

copy_size=$(( $end - $start ))

dd if="$in_file" iflag=skip_bytes,count_bytes,fullblock bs="$block_size" \
  skip="$start" count="$copy_size"

Update

The fullblock option was added above, as per @Gilles' answer. At first I thought it might be implied by count_bytes, but this is not the case.

The issue mentioned in that answer is a potential problem: if dd's read/write calls are interrupted for any reason, data will be lost. This is not likely in most cases (the odds are reduced somewhat since we are reading from a file and not a pipe).


Using dd without the skip_bytes and count_bytes options is more difficult:

in_file=1tb

start=12345678901
end=19876543212
block_size=4096

copy_full_size=$(( end - start ))                           # total bytes to copy (for reference)
copy1_size=$(( block_size - (start % block_size) ))         # unaligned bytes up to the next block boundary
copy2_start=$(( start + copy1_size ))                       # first block-aligned byte
copy2_skip=$(( copy2_start / block_size ))                  # whole blocks to skip
copy2_blocks=$(( (end - copy2_start) / block_size ))        # whole blocks to copy
copy3_start=$(( (copy2_skip + copy2_blocks) * block_size )) # first byte after the aligned part
copy3_size=$(( end - copy3_start ))                         # trailing unaligned bytes

{
  dd if="$in_file" bs=1 skip="$start" count="$copy1_size"                     # leading partial block, byte by byte
  dd if="$in_file" bs="$block_size" skip="$copy2_skip" count="$copy2_blocks"  # aligned bulk copy
  dd if="$in_file" bs=1 skip="$copy3_start" count="$copy3_size"               # trailing partial block
}

You could also experiment with different block sizes, but the gains won't be very dramatic; a rough way to measure is sketched below. See also: Is there a way to determine the optimal value for the bs parameter to dd?
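
For instance, something like the following loop, reusing the variables from the first snippet (of=/dev/null means nothing is written, and the page cache will skew repeated runs):

for bs in 4K 64K 1M 16M; do
  echo "bs=$bs" >&2
  time dd if="$in_file" iflag=skip_bytes,count_bytes,fullblock \
    bs="$bs" skip="$start" count="$copy_size" of=/dev/null
done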


bs=1 tells dd to read and write one byte at a time. There is an overhead for each read and write call, which makes this slow. Use a larger block size for decent performance.
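
To see that overhead directly, you can count the system calls involved. This is just an illustration, assuming strace is available:

strace -c dd if=/dev/zero of=/dev/null bs=1 count=4096   # ~4096 read() + ~4096 write() calls
strace -c dd if=/dev/zero of=/dev/null bs=4096 count=1   # one read() + one write() for the same data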

When you copy a whole file, at least under Linux, I've found that cp and cat are faster than dd, even if you specify a large block size.
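
If you want to check this on your own machine, a rough comparison looks like the following (big_file is a placeholder; drop the page cache between runs, or the later commands read from memory):

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
time cat big_file > /dev/null
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
time dd if=big_file of=/dev/null bs=1M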

To copy only part of a file, you can pipe tail into head. This requires GNU coreutils or some other implementation that has head -c to copy a specified number of bytes (tail -c is in POSIX but head -c isn't). A quick benchmark on Linux shows this to be slower than dd, presumably because of the pipe.

tail -c +$((12345678901+1)) 1tb | head -c $((19876543212-12345678901))

The problem with dd is that it is not reliable: it can copy partial data. As far as I know, dd is safe when reading and writing to a regular file — see When is dd suitable for copying data? (or, when are read() and write() partial) — but only as long as it isn't interrupted by a signal. With GNU coreutils, you can use the fullblock flag, but this is not portable.

Another problem with dd is that it can be hard to find a block count that works, because both the number of skipped bytes and the number of transferred bytes need to be a multiple of the block size. You can use multiple calls to dd: one to copy the first partial block, one to copy the bulk of aligned blocks and one to copy the last partial block — see Graeme's answer for a shell snippet. But don't forget that when you run the script, unless you're using the fullblock flag, you need to pray that dd will copy all the data. dd returns a nonzero status if a copy is partial, so it's easy to detect the error (a sketch follows), but there's no practical way to repair it.
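
For example, a minimal way to fail fast on a partial copy, reusing the variables from Graeme's snippet (out_file is a placeholder):

{
  dd if="$in_file" bs=1 skip="$start" count="$copy1_size" &&
    dd if="$in_file" bs="$block_size" skip="$copy2_skip" count="$copy2_blocks" &&
    dd if="$in_file" bs=1 skip="$copy3_start" count="$copy3_size"
} > "$out_file" || { echo "partial or failed copy" >&2; exit 1; }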

POSIX has nothing better to offer at the shell level. My advice would be to write a small special-purpose C program (depending on exactly what you implement, you can call it dd_done_right or tail_head or mini-busybox).


With dd:

dd if=1tb skip=12345678901 count=$((19876543212-12345678901)) bs=1M iflag=skip_bytes,count_bytes

Alternatively with losetup:

losetup --find --show --offset 12345678901 --sizelimit $((19876543212-12345678901)) 1tb

And then use dd, cat, etc. on the loop device, as sketched below.
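
For example, a minimal sketch (losetup generally requires root, and extracted_part is just an illustrative name):

loop=$(losetup --find --show --offset 12345678901 \
  --sizelimit $((19876543212-12345678901)) 1tb)
cat "$loop" > extracted_part
losetup -d "$loop"   # detach the loop device when done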
