How to slice a generator object or iterator?
islice is the Pythonic way
from itertools import islice

g = (i for i in range(100))
for num in islice(g, 95, None):
    print(num)
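This prints 95 through 99: islice(g, 95, None) consumes and discards the first 95 values, then yields the rest.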
You can't slice a generator object or iterator using the normal slice syntax. Instead you need to use itertools.islice, as @jonrsharpe already mentioned in his comment.
import itertools

for i in itertools.islice(x, 95):  # yields the first 95 items of x
    print(i)
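For reference, here's what happens if you try the normal slice syntax on a generator (a quick sketch):

g = (i for i in range(100))
g[95:]  # TypeError: 'generator' object is not subscriptable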
Also note that islice returns an iterator and consumes data from the underlying iterator or generator. So if you need to go back and reuse the data, you will need to convert it to a list, create a new generator object, or use the little-known itertools.tee to create a copy of your generator.
from itertools import tee

first, second = tee(f())  # two independent iterators over the same underlying generator
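For instance (a quick sketch with a throwaway generator; tee buffers whatever one copy has consumed and the other hasn't, so don't use the original iterator afterwards):

from itertools import islice, tee

g = (i * i for i in range(10))  # illustrative generator
first, second = tee(g)          # after this, use only first/second, not g
print(list(islice(first, 3)))   # [0, 1, 4]
print(list(second))             # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]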
In general, the answer is itertools.islice, but you should note that islice doesn't, and can't, actually skip values. It just grabs and throws away start values before it starts yielding values. So it's usually best to avoid islice if possible when you need to skip a lot of values and/or the values being skipped are expensive to acquire or compute. If you can find a way to not generate the values in the first place, do so. In your (obviously contrived) example, you'd just adjust the start index for the range object.
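To make that concrete, here's a minimal sketch contrasting the two approaches on the contrived example:

from itertools import islice

# islice still computes and discards the first 95 values before yielding:
g = (i for i in range(100))
for num in islice(g, 95, None):
    print(num)  # 95..99, after silently consuming 0..94

# Better here: adjust the range so the skipped values are never produced at all:
for num in range(95, 100):
    print(num)  # 95..99, no wasted work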
In the specific case of trying to run on a file object, pulling a huge number of lines (particularly reading from a slow medium) may not be ideal. Assuming you don't need specific lines, one trick you can use to avoid actually reading huge blocks of the file, while still testing some distance into the file, is to seek to a guessed offset, read out to the end of the line (to discard the partial line you probably seeked into the middle of), then islice off however many lines you want from that point. For example:
import itertools

with open('myhugefile') as f:
    # Assuming roughly 80 characters per line, this seeks to somewhere roughly
    # around the 100,000th line without reading in the data preceding it
    f.seek(80 * 100000)
    next(f)  # Throw away the partial line you probably landed in the middle of
    for line in itertools.islice(f, 100):  # Process 100 lines
        pass  # Do stuff with each line
For the specific case of files, you might also want to look at mmap, which can be used in similar ways (and is unusually useful if you're processing blocks of data rather than lines of text, possibly randomly jumping around as you go).
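As a rough illustration of the mmap approach (a minimal sketch reusing the hypothetical 'myhugefile' and the 80-characters-per-line guess from above; the block size is arbitrary):

import mmap

with open('myhugefile', 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = 80 * 100000                # guessed byte offset, as before
        start = mm.find(b'\n', offset) + 1  # skip the partial line, like next(f) above
        chunk = mm[start:start + 4096]      # random access via plain slicing
        # Do stuff with the raw bytes in chunk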
Update: From your updated question, you'll need to look at your API docs and/or data format to figure out exactly how to skip around properly. It looks like skbio offers some features for skipping using seq_num, but that's still going to read, if not process, most of the file. If the data was written out with equal sequence lengths, I'd look at the docs on Alignment; aligned data may be loadable without processing the preceding data at all, e.g. by using Alignment.subalignment to create new Alignments that skip the rest of the data for you.