How do I count and enumerate the keys in an lmdb with python?
As Sait pointed out, you can iterate over a cursor to collect all keys. However, this is somewhat inefficient, because it also loads the values. You can avoid that by calling the cursor's iternext() function with values=False.
with env.begin() as txn:
    keys = list(txn.cursor().iternext(values=False))
I ran a short benchmark of both methods on a DB with 2^20 entries, each with a 16 B key and a 1024 B value.
Retrieving the keys by iterating over the cursor (including values) took 874 ms on average over 7 runs, while the second method, which returns only the keys, took 517 ms. These results may differ depending on the size of the keys and values.
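For anyone who wants to repeat the comparison, here is a rough sketch of how such a benchmark could be set up. It is not the script that produced the numbers above: the helper names (keys_with_values, keys_only, average_seconds, run_benchmark), the entry count, and the map_size are all assumptions made for illustration, and the timing is a plain wall-clock average.

```python
import tempfile
import time


def keys_with_values(env):
    """Collect keys by iterating (key, value) pairs; values are loaded too."""
    with env.begin() as txn:
        return [key for key, _ in txn.cursor()]


def keys_only(env):
    """Collect keys with iternext(values=False), which skips the values."""
    with env.begin() as txn:
        return list(txn.cursor().iternext(values=False))


def average_seconds(func, env, runs=7):
    """Average wall-clock time of func(env) over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        func(env)
    return (time.perf_counter() - start) / runs


def run_benchmark(n_entries=10_000):
    """Build a throwaway DB (sizes are illustrative) and time both helpers."""
    # Third-party package (pip install lmdb); imported here so the helpers
    # above can be reused without triggering the import.
    import lmdb

    with tempfile.TemporaryDirectory() as path:
        env = lmdb.open(path, map_size=2**28)  # 256 MiB, an assumed size
        try:
            with env.begin(write=True) as txn:
                for i in range(n_entries):
                    # 16-byte key, 1024-byte value, as in the benchmark above
                    txn.put(b"%016d" % i, bytes(1024))
            for func in (keys_with_values, keys_only):
                print(func.__name__, average_seconds(func, env))
        finally:
            env.close()
```

Calling run_benchmark() prints one average per method. Absolute numbers will depend on the machine and on the key and value sizes, so treat them only as a relative comparison.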
A way to get the total number of keys without enumerating them individually, also counting all sub-databases:
with env.begin() as txn:
    length = txn.stat()['entries']
Test result with a hand-made database of size 1000000 on my laptop:
- the method above is instantaneous (0.0 s)
- the iteration method takes about 1 second.
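To sanity-check that the stat-based count matches a full iteration, something like the following sketch could be used. Note that it reads the count from env.stat(), the environment-level call that reports on the default database, rather than txn.stat() as above. The function names and the path argument are hypothetical.

```python
def count_from_stat(env):
    """Read the entry count of the default database from its statistics."""
    return env.stat()["entries"]


def count_by_iteration(env):
    """Count keys by walking the cursor without loading values."""
    with env.begin() as txn:
        return sum(1 for _ in txn.cursor().iternext(values=False))


def check_counts(path):
    """Open an existing LMDB at `path` (hypothetical) and return both counts."""
    import lmdb  # third-party package (pip install lmdb)

    env = lmdb.open(path, readonly=True)
    try:
        return count_from_stat(env), count_by_iteration(env)
    finally:
        env.close()
```

The two numbers returned by check_counts() should agree for the default database. The first is read from metadata, the second has to touch every key.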
Are you looking for something like this:
with env.begin() as txn:
    with txn.cursor() as curs:
        # do stuff
        # keys are bytes on Python 3, hence b'key'
        print('key is:', curs.get(b'key'))
Update:
This may not be the fastest:
with env.begin() as txn:
    myList = [key for key, _ in txn.cursor()]
    print(myList)
Disclaimer: I don't know anything about the library; I just searched its docs for "key".