Split: how to split into different percentages?

The commands below work for percentages above 50% if you only want two output files: split writes pieces of the given size, so a smaller percentage would produce more than two files. It's a quick and dirty approach.

1) split 70% based on lines

split -l $(( $(wc -l < filename) * 70 / 100 )) filename

2) split 70% based on bytes

split -b $(( $(wc -c < filename) * 70 / 100 )) filename
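A quick demo of why the percentage has to be above 50%: split -l N writes N lines per piece, so 70% of a 10-line file gives two pieces, but 30% gives four. The scratch directory, the seq-generated sample file and the p70_/p30_ prefixes are just assumptions for the demo.

```shell
# Work in a scratch directory so nothing real gets overwritten.
cd "$(mktemp -d)" || exit 1
seq 10 > filename                 # 10-line sample file

# 70%: 7 lines per piece -> p70_aa (7 lines), p70_ab (3 lines)
split -l $(( $(wc -l < filename) * 70 / 100 )) filename p70_
# 30%: 3 lines per piece -> p30_aa .. p30_ad (3, 3, 3 and 1 lines)
split -l $(( $(wc -l < filename) * 30 / 100 )) filename p30_

ls p70_* | wc -l                  # 2 pieces
ls p30_* | wc -l                  # 4 pieces
```

The byte-based version behaves the same way with -b.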

You could use csplit to split into two pieces at any percentage, e.g. the first piece gets the first 20% of lines and the second piece the remaining 80%:

csplit infile $(( $(wc -l < infile) * 2 / 10 + 1))

$(wc -l < infile) : total number of lines
* 2 / 10 : the percentage (20%) as a fraction
+1 : add one because csplit splits up to, but not including, line N
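Worked through on a 10-line sample (an assumption for the demo): 10 * 2 / 10 + 1 = 3, so csplit cuts just before line 3. The multiplication has to come first; $(( 2 / 10 * 10 )) would be 0, because shell arithmetic is integer-only.

```shell
cd "$(mktemp -d)" || exit 1
seq 10 > infile                            # 10-line sample

n=$(( $(wc -l < infile) * 2 / 10 + 1 ))    # 10 * 2 / 10 + 1 = 3
csplit -s infile "$n"                      # -s: don't print piece sizes

wc -l < xx00                               # lines 1-2  (the 20%)
wc -l < xx01                               # lines 3-10 (the 80%)
```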

csplit can only split on lines, though.
Basically, once you have the line number via $(( $(wc -l < file) * 2 / 10 )), you can use any line-oriented tool:

sed 1,$(( $(wc -l < infile) * 2 / 10))'{
w 20-infile
d
}' infile > 80-infile
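For instance, awk (swapped in here as another line-oriented tool; the sed and head versions are the ones shown in this thread) can do the same split once the line number is in a variable. The variable name n and the sample file are assumptions for illustration.

```shell
cd "$(mktemp -d)" || exit 1
seq 10 > infile                          # 10-line sample

n=$(( $(wc -l < infile) * 2 / 10 ))      # last line of the first piece

# Lines 1..n go to 20-infile, the rest to stdout (80-infile).
# If n is 0, awk never opens 20-infile, so that file is not created.
awk -v n="$n" 'NR <= n { print > "20-infile"; next } { print }' infile > 80-infile
```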

or, even cooler:

{ head -n$(( $(wc -l < infile) * 2 / 10)) > 20-infile; cat > 80-infile; } <infile

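though some head implementations don't comply with the standard (POSIX requires a utility that stops before EOF on a seekable file to leave the file offset just past the last byte it processed), so this won't work on all setups...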
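A quick way to check whether the local head is one of the compliant ones: if it leaves the offset right after the last line it printed, the two pieces rejoin into the original; if it reads ahead without seeking back, lines go missing from 80-infile. The sample file and the messages are made up for the check.

```shell
cd "$(mktemp -d)" || exit 1
seq 10 > infile                          # 10-line sample

{ head -n$(( $(wc -l < infile) * 2 / 10 )) > 20-infile; cat > 80-infile; } <infile

# A head that over-reads leaves cat with less (or nothing) to copy.
if cat 20-infile 80-infile | cmp -s - infile; then
    echo "head is fine: pieces rejoin to the original"
else
    echo "head over-read: use the sed or csplit version instead"
fi
```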


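P=70    # percentage of the bytes that go into file1
{   BS=$(($(wc -c <file) * $P / 100))   # byte offset of the split
    dd count=1 bs="$BS" >file1; cat     # dd: first $BS bytes; cat: the rest
} <file >file2 2>/dev/null              # 2>/dev/null hides dd's transfer stats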

This should work for this simple case because you're only splitting once, so split is probably a little overkill. As long as the file is seekable, dd will do only a single read() on stdin, and cat then begins its own read() wherever dd left off.
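A self-contained run of the dd approach that checks the pieces add back up to the original. P=70 and the seq-generated sample (292 bytes) are assumptions for the demo.

```shell
cd "$(mktemp -d)" || exit 1
seq 100 > file                  # 292-byte sample input
P=70                            # percentage of bytes for file1

{   BS=$(($(wc -c <file) * $P / 100))   # 292 * 70 / 100 = 204
    dd count=1 bs="$BS" >file1; cat     # dd takes 204 bytes, cat the rest
} <file >file2 2>/dev/null

wc -c < file1                   # 204
wc -c < file2                   # 88
cat file1 file2 | cmp -s - file && echo "pieces rejoin to the original"
```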

If the file is large, count=1 bs=$big_ol_num can get a little unwieldy, since dd has to allocate a buffer of that size, but the copy can be blocked out with some extra, yet simple, shell math.
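One way the blocked-out version could look (a sketch, not necessarily the math the answer had in mind): copy whole blocks of a fixed size with one dd, then the remainder with bs=1, so no single buffer has to hold the whole first piece. The 64 KiB block size B, P=70 and the sample input are assumptions.

```shell
cd "$(mktemp -d)" || exit 1
seq 100000 > file                # ~575 KiB sample input
P=70                             # percentage of bytes for file1
B=65536                          # block size for the bulk copy

{   BS=$(($(wc -c <file) * $P / 100))        # total bytes wanted in file1
    # Whole blocks first, then the leftover BS % B bytes one at a time;
    # count=0 copies nothing, so this also works when BS < B.
    { dd bs="$B" count=$((BS / B)); dd bs=1 count=$((BS % B)); } >file1
    cat                                      # everything after byte BS
} <file >file2 2>/dev/null
```

bs=1 means up to B - 1 one-byte reads for the tail, which is why B is kept moderate. Like the single-dd version, this relies on reads from a regular file returning full blocks.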

A non-seekable input, like a pipe, can skew dd's results because a single read() may return fewer bytes than bs asks for, though this can be handled as well with GNU dd's iflag=fullblock.
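A sketch of the pipe case with GNU dd: iflag=fullblock keeps dd reading until the block is full, and it never requests more than the remaining part of the block, so cat still receives everything after it. The catch is that a pipe can't be measured with wc and then re-read, so here the byte count comes from a separate run of the same deterministic generator; that, P=70 and the seq input are assumptions.

```shell
cd "$(mktemp -d)" || exit 1
P=70
# The pipe can't be measured and replayed, so size it from a separate run.
BS=$(( $(seq 100000 | wc -c) * P / 100 ))

seq 100000 | {
    dd bs="$BS" count=1 iflag=fullblock >file1 2>/dev/null   # GNU dd only
    cat >file2                                               # the remainder
}
```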
