Splitting a large txt file into 200 smaller txt files on a regex using shell script in BASH

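awk '/[0-9]+ of [0-9]+ DOCUMENTS/{g++} {print $0 > (g ".txt")}' file

The output file name is parenthesized because the unparenthesized form, print $0 > g".txt", is ambiguous: the awk built into OS X rejects it with an error like awk: illegal statement at source line 1, while gawk accepts either form.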
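
With around 200 output files, some awk implementations can run out of open file descriptors, so a variant that closes each file before opening the next may be more robust. The following is only a sketch: sample.txt is a made-up demo input created by the script, and the doc_N.txt output names are arbitrary choices, not part of the original answer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo input; replace sample.txt with the real file.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' \
  '1 of 3 DOCUMENTS' 'alpha' \
  '2 of 3 DOCUMENTS' 'beta' 'gamma' \
  '3 of 3 DOCUMENTS' 'delta' > sample.txt

# On each header line, close the previous output file and pick the next name.
# Lines before the first header are skipped because out is still empty.
awk '
  /[0-9]+ of [0-9]+ DOCUMENTS/ {
    if (out != "") close(out)
    out = "doc_" (++n) ".txt"
  }
  out != "" { print > out }
' sample.txt

ls doc_*.txt
```

Because the redirection target is a plain variable here, the parenthesization question does not arise, and only one output file is open at any time.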

Ruby (1.9+)

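#!/usr/bin/env ruby
# Start a new numbered output file each time a "N of M DOCUMENTS" header appears.
g = 1
f = File.open("#{g}.txt", "w")
File.foreach("file") do |line|
  if line =~ /\d+ of \d+ DOCUMENTS/
    f.close
    g += 1
    f = File.open("#{g}.txt", "w")
  end
  f.print line
end
f.close  # close the last output file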
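
Since the question asks for Bash, the same loop can also be written with shell builtins only. It is likely to be slower than awk on a large file, so treat it as a sketch rather than a recommendation; sample.txt and the part_N.txt names are invented for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo input; replace sample.txt with the real file.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'preamble' '1 of 2 DOCUMENTS' 'alpha' '2 of 2 DOCUMENTS' 'beta' > sample.txt

re='[0-9]+ of [0-9]+ DOCUMENTS'   # keep the regex in a variable for [[ =~ ]]
g=1
exec 3> "part_$g.txt"             # file descriptor 3 is the current output file

# The "|| [[ -n $line ]]" part keeps a final line that lacks a trailing newline.
while IFS= read -r line || [[ -n $line ]]; do
  if [[ $line =~ $re ]]; then
    g=$((g + 1))
    exec 3> "part_$g.txt"         # reopening fd 3 closes the previous file
  fi
  printf '%s\n' "$line" >&3
done < sample.txt

exec 3>&-                         # close the last output file
```

As in the Ruby version, anything before the first header ends up in the first file, and each header line starts the next one.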

As suggested in other solutions, you could use csplit for that:

csplit csplit.test '/^\.\.\./' '{*}' && sed -i '/^\.\.\./d' xx*

I haven't found a better way to get rid of the leftover separator lines in the split files.
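
One candidate, if GNU csplit is available, is its --suppress-matched option, which drops the lines that matched the pattern and may make the sed pass unnecessary. This sketch assumes a reasonably recent GNU coreutils (the option is not in BSD csplit) and uses a made-up sample.txt with ... as the separator.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo input; replace sample.txt with the real file.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'alpha' 'beta' '...' 'gamma' '...' 'delta' > sample.txt

# -s: do not print byte counts; -z: do not keep empty output files.
# --suppress-matched: omit the lines that matched the pattern (GNU only).
csplit -s -z --suppress-matched sample.txt '/^\.\.\./' '{*}'

ls xx*
```

The '{*}' repeat count is itself a GNU extension, so this variant is no less portable than the csplit command above.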

