Handling very large amount of data in MyBatis

I have successfully used MyBatis streaming with the Cursor. The Cursor has been implemented on MyBatis at this PR.

From the documentation it is described as

A Cursor offers the same results as a List, except it fetches data lazily using an Iterator.

Besides, the code documentation says

Cursors are a perfect fit to handle millions of items queries that would not normally fits in memory.

Here is an example of implementation I have done and which I was able to successfully use it:

import org.mybatis.spring.SqlSessionFactoryBean;

// You have your SqlSessionFactory somehow, if using Spring you can use 
SqlSessionFactoryBean sqlSessionFactory = new SqlSessionFactoryBean();

Then you define your mapper, e.g., UserMapper with the SQL query that returns a Cursor of your target object, not a List. The whole idea is to not store all the elements in memory:

import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.cursor.Cursor;

public interface UserMapper {

    @Select(
        "SELECT * FROM users"
    )
    Cursor<User> getAll();
}

Then you write the that code that will use an open SQL session from the factory and query using your mapper:

try(SqlSession sqlSession = sqlSessionFactory.openSession()) {
    Iterator<User> iterator = sqlSession.getMapper(UserMapper.class)
                                        .getAll()
                                        .iterator();
    while (iterator.hasNext()) {
        doSomethingWithUser(iterator.next());
    }
}

handleResult receives as many records as the query gets, no pause.

When there are too many records to process I used sqlSessionFactory.getSession().getConnection(). Then as, normal JDBC, get a Statement, get the Resultset, and process one by one the records. Don't forget to close the session.


myBatis CAN stream results. What you need is a custom result handler. With this you can take each row separately and write it to your XML file. The overall scheme looks like this:

session.select(
    "mappedStatementThatFindsYourObjects",
    parametersForStatement,
    resultHandler);

Where resultHandler is an instance of a class implementing the ResultHandler interface. This interface has just one method handleResult. This method provides you with a ResultContext object. From this context you can retrieve the row currently being read and do something with it.

handleResult(ResultContext context) {
  Object result = context.getResultObject();
  doSomething(result);
}

No, mybatis does not have full capability to stream results yet.

EDIT 1: If you don't need nested result mappings then you could implement a custom result handler to stream results. on current released versions of MyBatis. (3.1.1) The current limitation is when you need to do complex result mapping. The NestedResultSetHandler does not allow custom result handlers. A fix is available, and it looks like is currently targeted for 3.2. See Issue 577.

In summary, to stream large result sets using MyBatis you'll need.

  1. Implement your own ResultSetHandler.
  2. Increase fetch size. (as noted below by Guillaume Perrot)
  3. For Nested result maps, use the fix discussed on Issue 577. This fix also resolves some memory issues with large result sets.