Bulk insert or update with Hibernate?
High-throughput data import
If you only want to import data without doing any processing or transformation, then a tool like PostgreSQL COPY is the fastest way to import it.
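For instance, here is a minimal sketch of driving COPY from Java through the PostgreSQL JDBC driver's CopyManager (the posts.csv file, the post table, and its columns are assumptions for illustration):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyImport {

    public static void main(String[] args) throws Exception {
        try (Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Reader csv = new FileReader("posts.csv")) {

            // COPY streams the file straight into the table,
            // bypassing per-row INSERT statements entirely.
            CopyManager copyManager = connection
                .unwrap(PGConnection.class)
                .getCopyAPI();

            long importedRows = copyManager.copyIn(
                "COPY post (id, title) FROM STDIN WITH (FORMAT csv)",
                csv);

            System.out.println("Imported " + importedRows + " rows");
        }
    }
}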
Batch processing
However, if you need to transform, aggregate, or correlate/merge the incoming data with existing data, then you need application-level batch processing.
In this case, you want to flush-clear-commit regularly:
int entityCount = 50;
int batchSize = 25;

EntityManager entityManager = entityManagerFactory()
    .createEntityManager();

EntityTransaction entityTransaction = entityManager
    .getTransaction();

try {
    entityTransaction.begin();

    for (int i = 0; i < entityCount; i++) {
        if (i > 0 && i % batchSize == 0) {
            // Commit the current batch, start a new transaction,
            // and clear the persistence context to keep memory usage flat.
            entityTransaction.commit();
            entityTransaction.begin();

            entityManager.clear();
        }

        Post post = new Post(
            String.format("Post %d", i + 1)
        );

        entityManager.persist(post);
    }

    entityTransaction.commit();
} catch (RuntimeException e) {
    if (entityTransaction.isActive()) {
        entityTransaction.rollback();
    }
    throw e;
} finally {
    entityManager.close();
}
Also, make sure you enable JDBC batching using the following configuration properties:
<property
    name="hibernate.jdbc.batch_size"
    value="25"
/>
<property
    name="hibernate.order_inserts"
    value="true"
/>
<property
    name="hibernate.order_updates"
    value="true"
/>
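One caveat worth knowing: Hibernate silently disables batched inserts for entities that use IDENTITY id generation, so the sketch below assumes a sequence-based identifier (the Post mapping is hypothetical, matching the earlier example; newer Hibernate versions use jakarta.persistence instead of javax.persistence):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Post {

    @Id
    // IDENTITY generation disables JDBC insert batching in Hibernate,
    // so a sequence-based identifier is used here instead.
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String title;

    protected Post() {
    }

    public Post(String title) {
        this.title = title;
    }
}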
Bulk processing
Bulk processing is suitable when all rows match pre-defined filtering criteria, so you can use a single UPDATE to change all records.
However, a bulk update that modifies millions of records can increase the size of the redo log or end up taking lots of locks on database systems that still use 2PL (Two-Phase Locking), like SQL Server.
So, while the bulk update is the most efficient way to change many records, you have to pay attention to how many records are to be changed to avoid a long-running transaction.
Also, you can combine the bulk update with optimistic locking so that other OLTP transactions won't lose the updates done by the bulk processing.
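For example, a bulk update that also bumps the version column might look like the following sketch (the Post entity, its archived, version, and createdOn attributes, and the cutoff parameter are assumptions for illustration):

// The bulk UPDATE is issued as a single SQL statement; the version column
// is incremented manually so concurrent optimistic-locking readers will
// detect that the rows have changed.
int updatedRows = entityManager.createQuery(
    "update Post p " +
    "set p.archived = true, " +
    "    p.version = p.version + 1 " +
    "where p.createdOn <= :cutoff")
.setParameter("cutoff", cutoff)
.executeUpdate();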
There are many possible bottlenecks in bulk operations. The best approach depends heavily on what your data looks like. Have a look at the Hibernate Manual section on batch processing.
At a minimum, make sure you are using the following pattern (copied from the manual):
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
If you are mapping a flat file to a very complex object graph you may have to get more creative, but the basic principle is that you have to find a balance between pushing good-sized chunks of data to the database with each flush/commit and avoiding exploding the size of the session-level cache.
Lastly, if you don't need Hibernate to handle any collections or cascading for your data to be correctly inserted, consider using a StatelessSession.
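A minimal sketch of that approach, assuming the same Customer entity as above (a StatelessSession bypasses the first-level cache, dirty checking, and cascading, so there is nothing to flush or clear):

StatelessSession statelessSession = sessionFactory.openStatelessSession();
Transaction tx = statelessSession.beginTransaction();

for ( int i = 0; i < 100000; i++ ) {
    Customer customer = new Customer(.....);
    // insert() issues the SQL INSERT directly, without a persistence context
    statelessSession.insert(customer);
}

tx.commit();
statelessSession.close();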
From Hibernate Batch Processing. For updates I used the following:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults employeeCursor = session
    .createQuery("from Employee")
    .scroll();

int count = 0;

while ( employeeCursor.next() ) {
    Employee employee = (Employee) employeeCursor.get(0);
    employee.updateEmployee();
    session.update(employee);
    if ( ++count % 50 == 0 ) {
        // flush a batch of updates and release memory
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();
But for inserts I would go with jcwayne's answer.
According to an answer to a similar question, it can be done by configuring Hibernate to insert objects using a custom stored procedure which uses your database's upsert functionality. It's not pretty, though.
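A rough sketch of that idea uses Hibernate's @SQLInsert annotation with a PostgreSQL upsert (the post table, its columns, and the parameter order are assumptions; the custom SQL's parameters must line up with the columns of the INSERT Hibernate would normally generate, which you can verify with hibernate.show_sql):

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.SQLInsert;

@Entity
// Replaces the INSERT Hibernate issues for this entity with an upsert.
@SQLInsert(sql =
    "insert into post (title, id) values (?, ?) " +
    "on conflict (id) do update set title = excluded.title")
public class Post {

    @Id
    private Long id;

    private String title;
}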