How to jump to a particular line in a huge text file?
You don't really have many options if the lines are of different lengths... you sadly need to process the line-ending characters to know when you've advanced to the next line.
You can, however, dramatically speed this up AND reduce memory usage by changing the buffering parameter of open() to something other than 0.
0 means the file reading is unbuffered, which is very slow and disk intensive. 1 means the file is line buffered, which would be an improvement. Anything above 1 (say 8 kB, i.e. 8192, or higher) reads chunks of the file into memory. You still access it through for line in open(...):, but Python only reads a chunk at a time, discarding each buffered chunk after it's processed.
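A minimal sketch of what that looks like (the filename and target line number below are placeholders):

f = open('huge_file.txt', 'r', 8192)   # 8 kB buffer instead of the default
for i, line in enumerate(f):
    if i == 12345:                     # 0-based index of the line you want
        print(line)
        break
f.close()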
I'm probably spoiled by abundant RAM, but 15 M is not huge. Reading into memory with readlines() is what I usually do with files of this size. Accessing a line after that is trivial.
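For instance (the filename and line index are placeholders):

with open('huge_file.txt') as f:
    lines = f.readlines()    # the whole file is now a list of lines in memory
print(lines[12345])          # direct access to any line, 0-based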
linecache:

The linecache module allows one to get any line from a Python source file, while attempting to optimize internally, using a cache, the common case where many lines are read from a single file. This is used by the traceback module to retrieve source lines for inclusion in the formatted traceback...
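A quick sketch of using it on an ordinary text file (the filename and line number are placeholders):

import linecache

line = linecache.getline('huge_file.txt', 12345)   # line numbers are 1-based
print(line)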
You can't jump ahead without reading in the file at least once, since you don't know where the line breaks are. You could do something like:
# Read in the file once and build a list of line offsets.
# Open in binary mode ('rb') so len(line) is a byte count that matches seek().
line_offset = []
offset = 0
file = open('huge_file.txt', 'rb')   # placeholder filename
for line in file:
    line_offset.append(offset)
    offset += len(line)
file.seek(0)

# Now, to skip to line n (with the first line being line 0), just do
file.seek(line_offset[n])