Randomly shuffle rows in a large text file
You can use the shuf command from GNU coreutils. The utility is pretty fast and should take less than a minute to shuffle a 1 GB file.
The command below might just work in your case, because shuf reads the complete input before opening the output file:
$ shuf -o File.txt < File.txt
Python one-liner:
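python3 -c 'import sys, random; L = sys.stdin.readlines(); random.shuffle(L); sys.stdout.writelines(L)'
This reads all the lines from standard input, shuffles them in place, then writes them back out exactly as they were read, so no extra newline is added at the end. (The Python 2 form of the last step was print "".join(L), where the trailing comma suppressed the extra newline.)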
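If you would rather have a script than a one-liner, the same read-everything-then-shuffle approach can be sketched as below. This is only a minimal sketch: shuffle_file and its seed parameter are hypothetical names made up for illustration, the whole file is assumed to fit in memory, and writing via a temporary file followed by a rename is my own choice for replacing the file, not something shuf is known to do.

```python
import os
import random
import tempfile


def shuffle_file(path, seed=None):
    """Shuffle the lines of `path` in place (hypothetical helper for illustration)."""
    # Binary mode avoids any encoding assumptions; lines are split on b"\n".
    with open(path, "rb") as src:
        lines = src.readlines()  # the whole file is held in memory

    # If the last line has no trailing newline, add one so it cannot be
    # glued onto another line after shuffling.
    if lines and not lines[-1].endswith(b"\n"):
        lines[-1] += b"\n"

    # A private Random instance keeps an optional seed from touching global state.
    random.Random(seed).shuffle(lines)

    # Write to a temporary file in the same directory, then rename it over
    # the original so a failure midway never leaves a half-written file.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.writelines(lines)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Call it as shuffle_file("File.txt"). Note that the replaced file takes on the temporary file's permissions, so restore the original mode afterwards if that matters to you.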
On OS X (macOS), shuf is not installed by default. The Homebrew coreutils package provides it under the name gshuf:
brew install coreutils
gshuf -o File.txt < File.txt