Splitting a large txt file into 200 smaller txt files on a regex using shell script in BASH
awk '/[0-9]+ of [0-9]+ DOCUMENTS/{g++} { print $0 > g".txt"}' file
OS X users will need gawk, as the built-in awk will produce an error like: awk: illegal statement at source line 1
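If installing gawk is not an option, a variant that avoids the unparenthesized g".txt" redirect target may be worth trying, since that construct is one that awk implementations are known to parse differently. The following is only a sketch: the function name split_docs is made up for illustration, and it assumes that any lines before the first "N of M DOCUMENTS" marker can be discarded (the one-liner above would write them to a file named ".txt" instead). It also closes each output file before opening the next, which should keep awk from running out of file handles when there are many pieces.

```shell
# split_docs FILE: hypothetical helper; writes 1.txt, 2.txt, ... into the current directory.
split_docs() {
    awk '
        /[0-9]+ of [0-9]+ DOCUMENTS/ {
            if (out != "") close(out)   # release the previous output file
            out = (++g) ".txt"          # build the next file name in a plain variable
        }
        out != "" { print > out }       # assumption: lines before the first marker are skipped
    ' "$1"
}
```

Usage would be: split_docs file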
Ruby (1.9+)
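#!/usr/bin/env ruby
# Write lines to 1.txt, 2.txt, ...; start a new file at each "N of M DOCUMENTS" line.
g = 1
f = File.open("#{g}.txt", "w")
File.foreach("file") do |line|
  if line =~ /\d+ of \d+ DOCUMENTS/
    f.close
    g += 1
    f = File.open("#{g}.txt", "w")
  end
  f.print line
end
f.close  # close the last output file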
As suggested in other solutions, you could use csplit for that:
csplit csplit.test '/^\.\.\./' '{*}' && sed -i '/^\.\.\./d' xx*
I haven't found a better way to get rid of the leftover separator lines in the split files.
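One possible alternative to the sed pass, assuming a GNU csplit recent enough to support --suppress-matched (BSD/macOS csplit does not have it, as far as I know): that option drops the matched separator lines while splitting. A minimal sketch, with split_on_dots as a made-up wrapper name:

```shell
# split_on_dots FILE: hypothetical wrapper; requires GNU csplit (coreutils).
# Writes xx00, xx01, ... into the current directory, without the "..." separator lines.
split_on_dots() {
    csplit --quiet --suppress-matched --elide-empty-files "$1" '/^\.\.\./' '{*}'
}
```

Usage would be: split_on_dots csplit.test
Here --elide-empty-files avoids an empty xx00 when the input starts with a separator, and --quiet suppresses the byte counts csplit normally prints.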