Read large MongoDB data
Your problem lies in the asList() call.
This forces the driver to iterate through the entire cursor (80,000 documents, a few GB), keeping all of it in memory.
batchSize(someLimit) and Cursor.batch() won't help here, because you still traverse the whole cursor no matter what the batch size is.
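For illustration, here is a minimal fragment (assuming the legacy MongoDB Java driver, where datasource is a com.mongodb.DB; the collection name is taken from your snippet): batchSize only controls how many documents each network round trip fetches, so the moment you materialize the whole result, every document is held in memory anyway.

    // batchSize changes the fetch size per round trip, not the total you end up holding
    DBCursor cursor = datasource.getCollection("mycollection")
            .find()
            .batchSize(500);               // fetch 500 docs per round trip

    List<DBObject> all = cursor.toArray(); // still pulls the entire result set into memory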
Instead you can:
1) Iterate the cursor: datasource.getCollection("mycollection").find() gives you a cursor you can step through, instead of loading everything into a List<MYClass> up front (see the sketch after this list).
2) Read documents one at a time and feed them into a buffer (say, a list).
3) Every 1000 documents or so, call the Hadoop API, clear the buffer, then start again.
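A minimal sketch of those three steps, again assuming the legacy MongoDB Java driver (com.mongodb.DB / DBCursor); writeToHadoop(...) is a hypothetical stand-in for whatever Hadoop API call you actually make:

    import com.mongodb.DB;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;

    import java.util.ArrayList;
    import java.util.List;

    public class MongoToHadoopExport {

        private static final int BUFFER_SIZE = 1000;

        public static void export(DB datasource) {
            List<DBObject> buffer = new ArrayList<>(BUFFER_SIZE);

            // 1) Iterate the cursor instead of materializing it with asList()
            DBCursor cursor = datasource.getCollection("mycollection").find();
            try {
                while (cursor.hasNext()) {
                    // 2) Read one document at a time into the buffer
                    buffer.add(cursor.next());

                    // 3) Every 1000 documents, hand the batch to Hadoop and clear the buffer
                    if (buffer.size() == BUFFER_SIZE) {
                        writeToHadoop(buffer);
                        buffer.clear();
                    }
                }
                // Flush whatever is left once the cursor is exhausted
                if (!buffer.isEmpty()) {
                    writeToHadoop(buffer);
                }
            } finally {
                cursor.close();
            }
        }

        // Hypothetical stand-in for your actual Hadoop API call
        private static void writeToHadoop(List<DBObject> batch) {
            // e.g. write the batch out to HDFS or feed it to your MapReduce job
        }
    }

This keeps at most 1000 documents in memory at any one time, regardless of how large the collection is.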