Split: how to split into different percentages?
The commands below work for percentages above 50% if you want to split into exactly two files. With a smaller chunk size, split keeps cutting and produces more than two pieces. It's a quick and dirty approach.
1) split 70% based on lines
split -l $(( $(wc -l < filename) * 70 / 100 )) filename
2) split 70% based on bytes
split -b $(( $(wc -c < filename) * 70 / 100 )) filename
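To see why the percentage has to be above 50%, here is a small scratch run. It assumes `seq` and `mktemp -d` are available, as they are on GNU and BSD systems. At 70% the chunk size covers more than half the file, so you get two pieces. At 30%, split keeps cutting 3-line chunks and produces four.

```shell
#!/bin/sh
# Scratch demo: split a 10-line file at 70% and at 30% by lines.
cd "$(mktemp -d)" || exit 1
seq 1 10 > filename

# 70%: chunk size 7 -> two pieces, xaa (7 lines) and xab (3 lines)
split -l $(( $(wc -l < filename) * 70 / 100 )) filename
ls x* | wc -l      # 2 files

# 30%: chunk size 3 -> four pieces (3+3+3+1 lines), using prefix "y"
split -l $(( $(wc -l < filename) * 30 / 100 )) filename y
ls y* | wc -l      # 4 files
```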
You could use csplit to split into two pieces using any percentage. For example, the first piece gets the first 20% of lines and the second piece gets the remaining 80%:
csplit infile $(( $(wc -l < infile) * 2 / 10 + 1))
$(wc -l < infile): total number of lines
2 / 10: the percentage (20%)
+1: add one line because csplit splits up to, but not including, line N
csplit can only split based on lines, though.
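Here is a quick check of the arithmetic on a 10-line sample file. This is a sketch that again assumes `seq` and `mktemp` are available. Since 10 * 2 / 10 + 1 = 3, csplit puts lines 1-2 in xx00 and lines 3-10 in xx01. The POSIX `-s` flag only silences csplit's byte-count output.

```shell
#!/bin/sh
# Scratch demo: first 20% of a 10-line file to xx00, remaining 80% to xx01.
cd "$(mktemp -d)" || exit 1
seq 1 10 > infile
# 10 * 2 / 10 + 1 = 3, so csplit cuts just before line 3
csplit -s infile $(( $(wc -l < infile) * 2 / 10 + 1 ))
wc -l xx00 xx01    # 2 and 8 lines
```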
Basically, as long as you have the line number via $(( $(wc -l < infile) * 2 / 10 )), you can use any line-oriented tool:
sed 1,$(( $(wc -l < infile) * 2 / 10))'{
w 20-infile
d
}' infile > 80-infile
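Below is the same sed command with the percentage pulled out into a variable. `P` and `N` are illustrative names, not anything sed requires. The sketch assumes N is at least 1. With a tiny file, N can come out as 0, and sed then treats the range `1,0` as matching line 1 only.

```shell
#!/bin/sh
# Scratch demo of the sed approach: 10-line file, P=20 (percent).
cd "$(mktemp -d)" || exit 1
seq 1 10 > infile
P=20
N=$(( $(wc -l < infile) * P / 100 ))   # lines for the first piece: 2
# Lines 1..N are written to 20-infile and deleted; the rest go to stdout.
sed "1,${N}"'{
w 20-infile
d
}' infile > 80-infile
```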
or, even cooler:
{ head -n$(( $(wc -l < infile) * 2 / 10)) > 20-infile; cat > 80-infile; } <infile
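though some head implementations don't comply with the standard, so this won't work on all setups... POSIX requires a utility that stops reading a seekable file before end-of-file to leave the file offset just past the last byte it processed.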
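If your head doesn't leave the offset in the right place, there is a portable fallback that uses a different technique. Instead of sharing one file offset, it reads the file twice. It pairs head with `tail -n +K`, which starts output at line K. The sketch below assumes the input is a regular file that can be opened twice.

```shell
#!/bin/sh
# Portable fallback: read the file twice instead of sharing one file offset.
cd "$(mktemp -d)" || exit 1
seq 1 10 > infile
N=$(( $(wc -l < infile) * 2 / 10 ))         # 2 lines for the first piece
head -n "$N" infile > 20-infile             # lines 1..N
tail -n +"$(( N + 1 ))" infile > 80-infile  # lines N+1..end
```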
P=70   # percentage for the first piece
{ BS=$(($(wc -c <file) * $P / 100))
dd count=1 bs="$BS" >file1; cat
} <file >file2 2>/dev/null
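Here is a scratch run of the dd snippet on a 111-byte sample (`seq 1 40`) with P=70. Since 111 * 70 / 100 = 77, file1 should get the first 77 bytes and file2 the remaining 34. Concatenating the two should reproduce the input.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
seq 1 40 > file                        # 111 bytes of sample input
P=70
{ BS=$(($(wc -c <file) * $P / 100))    # 77 bytes
  dd count=1 bs="$BS" >file1; cat      # cat continues where dd stopped
} <file >file2 2>/dev/null
cat file1 file2 | cmp - file && echo "pieces match the original"
```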
...should work for this simple case because you're only splitting once, so split is probably a little overkill. So long as the file is seekable, dd will do only a single read() on stdin. For a regular file, that read() returns the full bs bytes. Since dd and cat share the same stdin, cat begins its read() at whatever point dd leaves it.
If the file is large, count=1 bs=$big_ol_num could get a little unwieldy, because dd allocates a buffer of bs bytes. It can be blocked out with some extra, yet simple, shell math.
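The following is a sketch of one way to do that block math. It copies the first BS bytes as Q whole blocks of a fixed size plus one remainder block, then lets cat take the rest. `BLK`, `Q` and `R` are hypothetical names, and 64 KiB is an arbitrary block size. The `[ ... ] &&` guards skip any dd that would have nothing to copy, which also avoids an invalid bs=0. Like the original, this relies on regular-file reads returning full blocks.

```shell
#!/bin/sh
# Sketch: first P% copied as whole BLK-sized blocks plus one remainder block.
cd "$(mktemp -d)" || exit 1
seq 1 100000 > file                     # sample input, 588895 bytes
P=70
BLK=65536                               # arbitrary block size (64 KiB)
{ BS=$(( $(wc -c < file) * P / 100 ))   # bytes for file1
  Q=$(( BS / BLK ))                     # number of whole blocks
  R=$(( BS % BLK ))                     # leftover bytes (< BLK)
  { [ "$Q" -gt 0 ] && dd bs="$BLK" count="$Q"
    [ "$R" -gt 0 ] && dd bs="$R" count=1
  } > file1
  cat                                   # the rest of stdin goes to file2
} < file > file2 2>/dev/null
```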
A non-seekable input, such as a pipe, might skew dd's results, because a read() on a pipe can return fewer bytes than requested. GNU dd's iflag=fullblock handles this as well.
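This sketch shows the pipe case and assumes GNU dd. When reading from a pipe, wc can't measure the input without consuming it, so the total size has to be known some other way. Here it is hard-coded as `SIZE` (an illustrative name) for the 111 bytes of `seq 1 40`. With iflag=fullblock, dd keeps reading until it has a full 77-byte block before writing it. cat then picks up the rest of the pipe.

```shell
#!/bin/sh
# GNU dd only: iflag=fullblock re-reads until a full bs-sized block arrives.
cd "$(mktemp -d)" || exit 1
SIZE=111                      # assumed known: byte count of `seq 1 40`
P=70
BS=$(( SIZE * P / 100 ))      # 77 bytes for the first piece
seq 1 40 | { dd count=1 bs="$BS" iflag=fullblock > file1; cat > file2; } 2>/dev/null
```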