How to read lines from a mmapped file?
The most concise way to iterate over the lines of an mmap is:
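```python
import mmap

with open(STAT_FILE, "r+b") as f:  # STAT_FILE: the file path from the question
    map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)  # prot= is Unix-only
    for line in iter(map_file.readline, b""):
        pass  # whatever: process each line (a bytes object) here
```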
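Note that in Python 3 the sentinel argument of `iter()` must be of type `bytes`: `mmap.readline()` returns `bytes`, and because `b""` never equals `""`, a `str` sentinel would make the loop run forever. In Python 2 it needs to be a `str` (i.e. `""` instead of `b""`), although from Python 2.6 on `b""` is simply another spelling of `""`.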
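To see the sentinel in action, here is a minimal self-contained sketch, assuming Python 3.2+ (for using `mmap` as a context manager). The helper name `read_lines_mmap` is hypothetical. It uses `access=mmap.ACCESS_READ` instead of `prot=mmap.PROT_READ` because `prot` is only available on Unix, and it returns early for empty files because `mmap` refuses to map a zero-length file.

```python
import mmap
import os


def read_lines_mmap(path):
    """Return the lines of *path* as a list of bytes objects (newlines kept)."""
    with open(path, "r+b") as f:
        # mmap raises ValueError for a zero-length file, so handle it up front.
        if os.fstat(f.fileno()).st_size == 0:
            return []
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # iter(callable, sentinel) keeps calling m.readline() until it
            # returns b"". A str sentinel ("") never equals bytes in Python 3,
            # so iter(m.readline, "") would never stop.
            return list(iter(m.readline, b""))
```

For a file containing `alpha\nbeta\ngamma` (no trailing newline), it returns `[b"alpha\n", b"beta\n", b"gamma"]`: the last line is returned without a newline, and the next call yields `b""`, which ends the iteration.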
I modified your example like this:
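```python
import mmap

with open(STAT_FILE, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
        line = m.readline()
        if line == b"":  # Python 3: bytes sentinel; Python 2 used ''
            break
        print(line.rstrip().decode())  # decode() assumes UTF-8 text
    m.close()
```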
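The same `while` loop can be wrapped in a generator so callers receive decoded text lines. This is a sketch assuming Python 3.2+ and UTF-8 input; `iter_stripped_lines` and its `encoding` parameter are hypothetical names. Comparing the raw line with `b""` before stripping matters: a blank line comes back as `b"\n"`, so only end-of-file produces `b""`.

```python
import mmap
import os


def iter_stripped_lines(path, encoding="utf-8"):
    """Yield each line of *path* as str with trailing whitespace removed."""
    with open(path, "r+b") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            while True:
                line = m.readline()
                if line == b"":  # only end-of-file yields an empty bytes object
                    break
                yield line.rstrip().decode(encoding)
```

A file containing `a\n\nb\n` yields `"a"`, `""` and `"b"`; the blank line in the middle does not stop the loop.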
Suggestions:

- Do not call a variable `map`; that shadows the built-in function `map()`.
- Open the file in `r+b` mode, as in the Python example on the `mmap` help page. It states: "In either case you must provide a file descriptor for a file opened for update." See http://docs.python.org/library/mmap.html#mmap.mmap.
- It's better not to use `UPPER_CASE_WITH_UNDERSCORES` for global variable names. According to Global Variable Names in PEP 8 (https://www.python.org/dev/peps/pep-0008/#global-variable-names), global variables follow about the same convention as functions (`lowercase_with_underscores`); all-uppercase names are reserved for constants.
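To illustrate the first suggestion, here is a small hypothetical function (`shadowing_demo` is not from the question) that binds an mmap object to the name `map` and then tries to call the built-in. Python 3 is assumed.

```python
import mmap
import tempfile


def shadowing_demo():
    """Return the error raised when `map` is shadowed by an mmap object."""
    with tempfile.TemporaryFile() as f:
        f.write(b"data\n")
        f.flush()
        map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # shadows built-in map()
        try:
            list(map(str, [1, 2]))  # intended: the built-in map()
        except TypeError as exc:
            return str(exc)
        finally:
            map.close()
    return None
```

Calling `shadowing_demo()` returns a `TypeError` message saying the `mmap.mmap` object is not callable, because inside the function `map` no longer refers to the built-in.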
Hope this helps.
Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.
Times are in seconds; the five values in each cell are listed in ascending order.

| | Normal file access | mmap file access |
|---|---|---|
| real | 2.410 2.414 2.428 2.478 2.490 | 1.885 1.899 1.925 1.940 1.954 |
| sys | 0.052 0.052 0.064 0.080 0.152 | 0.088 0.108 0.108 0.116 0.120 |
| user | 2.232 2.276 2.292 2.304 2.320 | 1.696 1.732 1.736 1.744 1.752 |
These timings exclude the `print` statement. Based on these numbers, memory-mapped file access is clearly faster: the mean real time drops from about 2.44 s to about 1.92 s, roughly 21% less.
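If you want to repeat the comparison, the sketch below times line-by-line reading with `readline()` on a normal file object and on an mmap. It is a hypothetical harness (`time_plain_readline` and `time_mmap_readline` are illustrative names) that assumes Python 3.3+ for `time.perf_counter()` and a non-empty file. Results depend on the OS, the Python version, and whether the file is already in the page cache, so they will not necessarily match the numbers above.

```python
import mmap
import time


def time_plain_readline(path):
    """Count lines using file.readline(); return (line_count, seconds)."""
    start = time.perf_counter()
    count = 0
    with open(path, "rb") as f:
        for _ in iter(f.readline, b""):
            count += 1
    return count, time.perf_counter() - start


def time_mmap_readline(path):
    """Count lines using mmap.readline(); return (line_count, seconds)."""
    start = time.perf_counter()
    count = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for _ in iter(m.readline, b""):
            count += 1
    return count, time.perf_counter() - start
```

Call both functions several times on the same large file and compare the second element of each result; running them alternately reduces the chance that one side benefits unfairly from a warm page cache.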
Edit 2: Using `python -m cProfile test.py` I got the following results:

```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  5432833    2.273    0.000    2.273    0.000 {method 'readline' of 'file' objects}
  5432833    1.451    0.000    1.451    0.000 {method 'readline' of 'mmap.mmap' objects}
```
If I'm reading this correctly, `mmap.readline()` is quite a bit faster: 1.451 s versus 2.273 s of total time for the same 5,432,833 calls.
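The same kind of profile can also be collected from inside a script. This sketch (Python 3; `count_lines_mmap` and `profile_report` are hypothetical helpers) uses `cProfile.Profile.runcall()` and `pstats.Stats` to print the most expensive functions sorted by internal time (`tottime`). The loop calls `m.readline()` directly from Python code so that cProfile records each call; calls made internally by `iter()` from C may not show up as separate entries.

```python
import cProfile
import io
import mmap
import pstats


def count_lines_mmap(path):
    """Count lines with an explicit mmap.readline() loop (assumes a non-empty file)."""
    count = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        while True:
            line = m.readline()
            if line == b"":
                break
            count += 1
    return count


def profile_report(func, *args, limit=10):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(limit)
    return result, out.getvalue()
```

`profile_report(count_lines_mmap, "big.txt")` returns the line count together with a report whose rows have the same `ncalls tottime percall cumtime percall` layout as the output above.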
Additionally, `not len(line)` seems to perform worse than `line == ''`, at least that's how I interpret the profiler output.
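A more direct way to compare the end-of-data checks than reading profiler output is a `timeit` micro-benchmark. The sketch below (Python 3.6+; `compare_empty_checks` is a hypothetical name) uses the Python 3 equivalent `line == b""` and also includes plain `not line` for reference. Since `not len(line)` adds a call to the built-in `len()`, it is plausible that it is slower, but the absolute numbers vary between machines and Python versions.

```python
import timeit


def compare_empty_checks(number=1_000_000):
    """Time three ways to test for an empty line; return {statement: seconds}."""
    namespace = {"line": b"some line of text\n"}  # a typical non-empty line
    statements = ('line == b""', "not len(line)", "not line")
    return {
        stmt: timeit.timeit(stmt, number=number, globals=namespace)
        for stmt in statements
    }


if __name__ == "__main__":
    for stmt, seconds in compare_empty_checks().items():
        print(f"{stmt:15} {seconds:.3f}s")
```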