How to insert data as fast as possible with Hibernate
First of all, you should apply a fork-join approach here.
The main task parses the file and sends batches of at most 100 items to an ExecutorService. The ExecutorService
should have a number of worker threads equal to the number of available database connections. If you have 4 CPU cores, let's say the database can take 8 concurrent connections without too much context switching.
You should then configure a connection-pooling DataSource with minSize equal to maxSize, both set to 8. Try HikariCP or ViburDBCP for connection pooling.
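If you let Hibernate manage the pool through its HikariCP integration, the sizing above can be expressed with the pass-through properties below. This is a sketch assuming the `hibernate-hikaricp` module is on the classpath; adjust the values to your own environment.

```properties
# Route connection management through HikariCP
hibernate.connection.provider_class=org.hibernate.hikaricp.internal.HikariCPConnectionProvider

# minSize == maxSize == number of worker threads (8 in this example)
hibernate.hikari.minimumIdle=8
hibernate.hikari.maximumPoolSize=8
```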
Then you need to configure JDBC batching. If you're using MySQL, the IDENTITY generator will disable batching. If you're using a database that supports sequences, make sure you also use the enhanced identifier generators (they are the default option in Hibernate 5.x).
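As a sketch, the batching setup boils down to two configuration properties plus a SEQUENCE-based identifier. The entity and sequence names here are made up for illustration:

```java
// In hibernate.cfg.xml / persistence.xml:
//   hibernate.jdbc.batch_size = 50
//   hibernate.order_inserts   = true

@Entity
public class Record {

    // SEQUENCE (not IDENTITY) keeps batching enabled, because the id
    // can be obtained before the INSERT statement is executed.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "record_seq")
    @SequenceGenerator(name = "record_seq", sequenceName = "record_seq",
                       allocationSize = 50)
    private Long id;

    private String payload;
}
```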
This way the entity insert process is parallelized and decoupled from the main parsing thread. The main thread should wait for the ExecutorService
to finish processing all tasks prior to shutting down.
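The whole pipeline described above can be sketched as follows. The file reading and the actual database insert are stubbed out (a counter stands in for the persistence work), so the class names and the `BATCH_SIZE`/`WORKERS` values are illustrative, not prescriptive:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchImport {

    static final int BATCH_SIZE = 100;  // items per submitted task
    static final int WORKERS = 8;       // match the number of DB connections

    // Counter standing in for the real persistence work
    static final AtomicInteger persisted = new AtomicInteger();

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(WORKERS);

        List<String> batch = new ArrayList<>(BATCH_SIZE);
        for (int i = 0; i < 1000; i++) {      // stands in for reading the file line by line
            batch.add("line-" + i);
            if (batch.size() == BATCH_SIZE) {
                submit(executor, batch);
                batch = new ArrayList<>(BATCH_SIZE);
            }
        }
        if (!batch.isEmpty()) {
            submit(executor, batch);          // last, possibly partial, batch
        }

        // The main thread waits for all tasks before shutting down
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);

        System.out.println(persisted.get());  // 1000
    }

    static void submit(ExecutorService executor, List<String> batch) {
        List<String> copy = List.copyOf(batch);  // don't share the mutable list
        executor.submit(() -> {
            // Each worker would open its own Session here and insert its batch.
            persisted.addAndGet(copy.size());
        });
    }
}
```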
Actually, it is hard to make suggestions without real profiling to find out what's making your code slow or inefficient.
However, there are several things we can see from your code:
You are using StringBuilder inefficiently:

```java
wholeDocument.append("\n" + line);
```

should be written as

```java
wholeDocument.append("\n").append(line);
```

because what you originally wrote is translated by the compiler to

```java
wholeDocument.append(new StringBuilder("\n").append(line).toString());
```

You can see how many unnecessary `StringBuilder`s you have created. :)

Considerations in using Hibernate
I am not sure how you manage your `session` or how you implemented your `commit()`. I assume you have done it right, but there are still more things to consider:

- Have you properly set up the batch size in Hibernate (`hibernate.jdbc.batch_size`)? By default, the JDBC batch size is around 5. You may want to set it to a bigger value so that, internally, Hibernate will send inserts in bigger batches.
- Given that you do not need the entities in the first-level cache for later use, you may want to do an intermittent session `flush()` + `clear()` to:
  - trigger the batch inserts mentioned in the previous point
  - clear out the first-level cache
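The `flush()` + `clear()` pattern looks roughly like this. It is a sketch, not your actual code: `session` is an open Hibernate `Session`, `items` is a hypothetical list of entities, and `batchSize` should match `hibernate.jdbc.batch_size`:

```java
int batchSize = 50;  // keep in sync with hibernate.jdbc.batch_size
for (int i = 0; i < items.size(); i++) {
    session.persist(items.get(i));
    if (i > 0 && i % batchSize == 0) {
        session.flush();  // triggers the batched INSERTs
        session.clear();  // evicts the entities from the first-level cache
    }
}
```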
Switch away from Hibernate for this feature.
Hibernate is cool, but it is not a panacea for everything. In this feature you are just saving records into the DB based on text file content: you neither need any entity behavior, nor do you need the first-level cache for later processing, so there is not much reason to use Hibernate here given the extra processing and space overhead. Simply doing JDBC with manual batch handling is going to save you a lot of trouble.
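For comparison, a plain-JDBC version is a handful of lines. The table and column names below are made up for illustration, and `url`, `user`, `password`, and `lines` are assumed to exist in the surrounding code:

```java
try (Connection con = DriverManager.getConnection(url, user, password);
     PreparedStatement ps = con.prepareStatement(
             "INSERT INTO document_line (content) VALUES (?)")) {
    con.setAutoCommit(false);
    int count = 0;
    for (String line : lines) {
        ps.setString(1, line);
        ps.addBatch();
        if (++count % 100 == 0) {
            ps.executeBatch();  // send the accumulated batch to the DB
        }
    }
    ps.executeBatch();          // flush the remaining rows
    con.commit();
}
```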