split file into N pieces with same name but different target directories
#!/bin/bash
# assuming the file is in the same folder as the script
INPUT=large_file.txt
# assuming the folder called "output" is in the same folder
# as the script and contains folders that follow the pattern
# prog01 prog02 ... prog30
# create them with: mkdir -p output/prog{01..30}
OUTPUT_FOLDER=output
OUTPUT_FILE_FORMAT=myfile
# split
# -n l/30 -> 30 files, without cutting a line in half
#            (plain -n 30 splits by bytes and may break lines)
# --numeric-suffixes=1 -> file name suffixes start from 01
# $OUTPUT_FILE_FORMAT -> output file names start with this prefix
split -n l/30 --numeric-suffixes=1 "$INPUT" "$OUTPUT_FILE_FORMAT"
# move all files to their respective directories
for i in {01..30}
do
    mv "$OUTPUT_FILE_FORMAT$i" "$OUTPUT_FOLDER/prog$i/myfile.txt"
done
echo "done :)"
exit
The split command alone is enough for this task. However, this solution requires your folder names to be zero-padded: prog01, not prog1.
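If the folders are named prog1 ... prog30 without zero padding, the same idea can be adapted by converting each suffix to a plain decimal number before building the directory name. The following is only a sketch under some assumptions: split_to_dirs is a hypothetical helper name, GNU split is available, and the chunks are staged in a temporary directory created with mktemp.

```shell
#!/bin/bash
# Sketch: split a file into N line-aligned chunks and move chunk k to
# <outdir>/prog<k>/myfile.txt, where <k> is NOT zero-padded.
# split_to_dirs is a hypothetical helper name; requires GNU split.
split_to_dirs() {
    local input=$1 outdir=$2 n=$3
    local tmp f num
    tmp=$(mktemp -d) || return 1
    # -n l/N: N chunks without breaking lines
    # -a 3 with --numeric-suffixes=1: suffixes 001, 002, ...
    split -n "l/$n" -a 3 --numeric-suffixes=1 "$input" "$tmp/chunk" || return 1
    for f in "$tmp"/chunk*; do
        # 10# forces base 10, so "008" is read as 8 and not as invalid octal
        num=$((10#${f##*chunk}))
        mkdir -p "$outdir/prog$num"
        mv "$f" "$outdir/prog$num/myfile.txt"
    done
    rmdir "$tmp"
}
```

It would be called as: split_to_dirs large_file.txt output 30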
The awk-only solution (N here equals 30 files):
awk 'BEGIN{ cmd="wc -l <sourcefile.txt"; cmd|getline l; close(cmd); l=int((l+29)/30) }
(NR-1)%l==0{trgt=sprintf("prog%d/myfile.txt",++c)}{print >trgt}' sourcefile.txt
Or let the shell count the lines in sourcefile.txt and pass the per-file line count to awk, as suggested by jthill:
awk '(NR-1)%l==0{trgt=sprintf("prog%d/myfile.txt",++c)}{print >trgt}' \
    l=$(( ($(wc -l <sourcefile.txt)+29)/30 )) sourcefile.txt
split + bash solution:
lines=$(echo "t=$(wc -l < ./sourcefile.txt); d=30; if(t%d) t/d+1 else t/d" | bc)
split -l "$lines" --numeric-suffixes=1 ./sourcefile.txt "myfile.txt"
for f in myfile.txt[0-9]*; do
    # construct the directory name; 10# forces base 10 so that the
    # suffixes 08 and 09 are not rejected as invalid octal numbers
    dir_n="prog$((10#${f#*txt}))"
    mv "$f" "$dir_n/myfile.txt"
done
This assumes that you already have folders called prog1 to prog30 (as you mentioned).
lines - the integer number of lines per output file
t - the total number of lines in ./sourcefile.txt
d=30 - the divisor (the number of pieces)
--numeric-suffixes=1 - split option that tells it to use numeric suffixes starting at 1
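The per-file line count is just the ceiling of t/d, so it can also be computed with shell arithmetic instead of bc. A minimal sketch, assuming bash; lines_per_chunk is a hypothetical helper name, not part of the answers above:

```shell
#!/bin/bash
# lines_per_chunk (hypothetical helper): ceiling of total/parts using
# integer arithmetic only; adding parts-1 before dividing rounds up.
lines_per_chunk() {
    local total=$1 parts=$2
    echo $(( (total + parts - 1) / parts ))
}

lines_per_chunk 100 30   # prints 4 (100/30 rounded up)
lines_per_chunk 90 30    # prints 3 (divides evenly)
```

With a real file this could be used as: lines=$(lines_per_chunk "$(wc -l < ./sourcefile.txt)" 30)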