python generator: unpack entire generator in parallel
No. You must call `next()` sequentially because any non-trivial generator's next state is determined by its current state.
```python
def gen(num):
    j = 0
    for i in range(num):  # xrange(num) in Python 2
        j += i            # j carries state from one next() call to the next
        yield j
```
There's no way to parallelize calls to the above generator without knowing its state at each point it yields a value. But if you knew that, you wouldn't need to run it.
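A quick check of that reasoning, written for Python 3 (so `range` replaces `xrange`). It shows that each value `gen` yields is built from the previous one, and that a generator object refuses to be advanced while it is already running, which is exactly what a second, concurrent `next()` call would try to do. The `reentrant` generator is a hypothetical helper used only to trigger that error deterministically, without threads.

```python
import itertools

def gen(num):
    # Same running-total generator as above, in Python 3 spelling.
    j = 0
    for i in range(num):
        j += i
        yield j

# Each yielded value is the previous one plus i, so the k-th value
# cannot be produced before the (k-1)-th one.
values = list(gen(6))
print(values)  # [0, 1, 3, 6, 10, 15]

# The standard library expresses the same dependency as a running fold.
assert values == list(itertools.accumulate(range(6)))

# Hypothetical helper: a generator that tries to advance itself while
# it is executing, as a concurrent next() from another caller would.
def reentrant():
    yield next(me)

me = reentrant()
try:
    next(me)
except ValueError as exc:
    print(exc)  # generator already executing
```

The same `ValueError` is what you would get from calling `next()` on one generator from two threads at once, so splitting a single generator across workers does not work.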
Assuming you want the calls to `block_parser(b)` to be performed in parallel, you could try using a `multiprocessing.Pool`:
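```python
import multiprocessing as mp

with mp.Pool() as pool:  # defaults to os.cpu_count() worker processes
    raw_blocks = block_generator(fin)
    # imap hands blocks to the workers and yields results in input order
    parsed_blocks = pool.imap(block_parser, raw_blocks)
    # consume the results before the with-block closes the pool
    data = parsedBlocksToOrderedDict(parsed_blocks)
```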
Note that:
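- If you expect that `list(parsed_blocks)` can fit entirely in memory, then using `pool.map` can be much faster than `pool.imap`, partly because `map` splits the work into chunks automatically while `imap` sends one task at a time unless you pass a larger `chunksize`.
- The items in `raw_blocks` and the return values from `block_parser` must be picklable, since `mp.Pool` transfers tasks and results between processes through queues.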
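To check the picklability requirement before handing work to the pool, you can try pickling a sample yourself. This is a sketch: `block_parser` and `is_picklable` are hypothetical names used for illustration, and the check only covers the objects you actually test.

```python
import pickle

def block_parser(block):
    # Hypothetical parser; a module-level function pickles by reference to its name.
    return block.upper()

def is_picklable(obj):
    # Return True if pickle can serialize obj, False otherwise.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

print(is_picklable(block_parser))         # True
print(is_picklable(["raw", "block"]))     # True
print(is_picklable(lambda b: b.upper()))  # False: a lambda has no importable name
print(is_picklable(x for x in "abc"))     # False: generator objects cannot be pickled
```

The last line is also why you pass the generator's *items* to the pool rather than the generator object itself.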