Hibernate 3 Batch Operations

Usually, only a limited number of persistent objects should be kept in the cache of a Session object. After a Session has finished processing its transaction, it must be closed promptly to release the memory occupied by the Session cache.
Batch processing refers to processing large amounts of data in one transaction. The following program updates, in a single transaction, the AGE field of every record in the CUSTOMERS table whose age is greater than zero:
Transaction tx = session.beginTransaction();
Iterator customers = session.createQuery("from Customer c where c.age>0")
        .list()
        .iterator();
while (customers.hasNext()) {
    Customer customer = (Customer) customers.next();
    customer.setAge(customer.getAge() + 1);
}

tx.commit();
session.close();

If there are 10,000 records in the CUSTOMERS table whose age is greater than zero, Hibernate will load 10,000 Customer objects into memory at once. When the tx.commit() method is executed, the cache is flushed and Hibernate executes 10,000 update statements against the CUSTOMERS table:
update CUSTOMERS set AGE=? …. where ID=i;
update CUSTOMERS set AGE=? …. where ID=j;
……
update CUSTOMERS set AGE=? …. where ID=k;

The above approach to batch updating has two disadvantages:
(1) It takes up a lot of memory: all 10,000 Customer objects must first be loaded into memory and then updated one by one.
(2) The number of update statements executed is too large: each update statement can update only one Customer object, so 10,000 Customer objects require 10,000 update statements. Such frequent database access greatly reduces the performance of the application.

Generally speaking, batch operations at the application layer should be avoided as much as possible; they are better performed directly at the database layer, for example by executing batch update or delete SQL statements directly in the database. If the logic of a batch operation is complex, it can be implemented in a stored procedure that runs directly in the database.
Not all database systems support stored procedures, however. For example, MySQL at the time of writing did not support stored procedures, so batch updates or deletions cannot be performed there through stored procedures.
Of course, batch operations can also be performed at the application layer, mainly in the following ways:
(1) Batch operations are performed through Session.
(2) Batch operations are performed through StatelessSession.
(3) Batch operations are performed through HQL.
(4) Batch operations are performed directly through the JDBC API.

1. Batch operations through Session

The save() and update() methods of Session store the processed objects in the Session cache. If a large number of persistent objects are processed through a single Session object, objects that have already been processed and will not be accessed again should be removed from the cache promptly. The specific approach is to call the flush() method immediately after processing an object or a small batch of objects, so that the pending SQL statements are executed, and then call the clear() method to empty the cache.

Batch operations through Session are subject to the following constraints:
(1) The JDBC batch size needs to be set in the Hibernate configuration file. A reasonable value is usually between 10 and 50, for example:
hibernate.jdbc.batch_size=20
When performing batch operations as described in this section, make sure that the number of SQL statements sent to the database in each batch matches the batch_size property.
(2) If the object uses the "identity" identifier generator, Hibernate cannot perform bulk insert operations at the JDBC layer.
(3) When performing batch operations, it is recommended to turn off Hibernate's second-level cache. (The companion volume to this book, Mastering Hibernate: Advanced Edition, provides a detailed introduction to the second-level cache.) The Session cache is Hibernate's first-level cache, which is usually a transaction-scoped cache, that is, each transaction has its own first-level cache. The cache external to the SessionFactory is Hibernate's second-level cache, which is an application-scoped cache, that is, all transactions share the same second-level cache. Hibernate's first-level cache is always available, whereas the second-level cache is turned off by default. The second-level cache can also be turned off explicitly in the Hibernate configuration file as follows:
hibernate.cache.use_second_level_cache=false
hibernate.cache.use_second_level_cache=false
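
These two properties are usually set in hibernate.properties or hibernate.cfg.xml. As one possible alternative, the following minimal sketch (assuming a hibernate.cfg.xml is already on the classpath) shows how the same properties could be set programmatically through the Configuration API:

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// build a SessionFactory with JDBC batching enabled and the second-level cache disabled
Configuration cfg = new Configuration()
        .configure() // loads hibernate.cfg.xml from the classpath (assumed to exist)
        .setProperty("hibernate.jdbc.batch_size", "20")
        .setProperty("hibernate.cache.use_second_level_cache", "false");
SessionFactory sessionFactory = cfg.buildSessionFactory();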

1. Batch inserting data
The following code inserts a total of 100,000 CUSTOMERS records into the database, 20 records per batch:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { // the number of objects in a single batch is 20
        session.flush(); // flush the cache: execute the SQL that inserts 20 records in one batch
        session.clear(); // remove the Customer objects from the cache
    }
}

tx.commit();
session.close();

In the above program, each time session.flush() is executed, 20 records are inserted into the database in one batch. The subsequent session.clear() call then removes the 20 newly saved Customer objects from the cache.
To ensure that the above program runs smoothly, the following constraints must be observed:
In the Hibernate configuration file, the hibernate.jdbc.batch_size property should also be set to 20.
Turn off the second-level cache. If the second-level cache is used, every Customer object created in the first-level cache (that is, the Session cache) must first be copied into the second-level cache before being saved to the database, which causes a large amount of unnecessary overhead.
The identifier generator of the Customer class cannot be "identity".

2. Batch updating data

When updating data in batches, it is obviously not advisable to load all the objects into the Session cache at once and then update them one by one in the cache. To solve this problem, you can use a scrollable result set, org.hibernate.ScrollableResults; Query's scroll() method returns a ScrollableResults object. The following code demonstrates batch updating of Customer objects; at the beginning it loads all Customer objects through a ScrollableResults object:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults customers = session.createQuery("from Customer")
        .scroll(ScrollMode.FORWARD_ONLY);
int count = 0;
while ( customers.next() ) {
    Customer customer = (Customer) customers.get(0);
    customer.setAge(customer.getAge() + 1); // update the age property of the Customer object
    if ( ++count % 20 == 0 ) { // the number of objects in a single batch is 20
        session.flush(); // flush the cache: execute the SQL that updates 20 records in one batch
        session.clear(); // remove the Customer objects from the cache
    }
}

tx.commit();
session.close();

In the above code, the ScrollableResults object returned by Query's scroll() method does not actually contain any Customer objects; it only holds a cursor for locating the CUSTOMERS records in the database. Only when the program navigates to a particular element of the ScrollableResults object does Hibernate load the corresponding Customer object from the database.
To ensure that the above program runs smoothly, the following constraints must be observed:
In the Hibernate configuration file, the hibernate.jdbc.batch_size property should also be set to 20.
Turn off the second-level cache. If the second-level cache has already been enabled in the configuration file, it can also be ignored in the program as follows:
ScrollableResults customers = session.createQuery("from Customer")
        // ignore the second-level cache
        .setCacheMode(CacheMode.IGNORE)
        .scroll(ScrollMode.FORWARD_ONLY);

2. Batch operations through StatelessSession

A Session has a cache that keeps objects in memory synchronized with the corresponding data in the database; objects in the Session cache are persistent objects. When performing batch operations, however, keeping a large number of objects in the Session cache consumes a great deal of memory. As an alternative, the stateless StatelessSession can be used for batch operations.
The following code uses a StatelessSession to perform a batch update:
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();

ScrollableResults customers = session.getNamedQuery("GetCustomers")
        .scroll(ScrollMode.FORWARD_ONLY);
while ( customers.next() ) {
    Customer customer = (Customer) customers.get(0);
    customer.setAge(customer.getAge() + 1); // update the age property of the Customer object in memory
    session.update(customer); // immediately execute an update statement to update the corresponding CUSTOMERS record
}

tx.commit();
session.close();

In form, the usage of StatelessSession is somewhat similar to that of Session. Compared with Session, StatelessSession has the following differences:
(1) StatelessSession has no cache; objects that have been loaded, saved or updated through a StatelessSession are in the detached state.
(2) StatelessSession does not interact with Hibernate's second-level cache.
(3) When the insert(), update() or delete() method of StatelessSession is called, the corresponding SQL statement is executed immediately, rather than merely being scheduled for later execution (a batch-insert sketch illustrating this is given after this list).
(4) StatelessSession does not perform automatic dirty checking on the objects it loads. Therefore, in the program above, after the age property of a Customer object has been modified in memory, the update() method of StatelessSession must still be called to update the corresponding data in the database.
(5) StatelessSession does not perform any cascading operations on associated objects. For example, when a Customer object is saved through a StatelessSession, the associated Order objects are not saved by cascade.
(6) Operations performed through a StatelessSession can be captured by an Interceptor, but they are ignored by Hibernate's event handling system.
(7) When the Customer object with OID 1 is loaded twice through the same StatelessSession object, two Customer objects with different memory addresses are obtained, for example:
StatelessSession session = sessionFactory.openStatelessSession();
Customer c1 = (Customer) session.get(Customer.class, new Long(1));
Customer c2 = (Customer) session.get(Customer.class, new Long(1));
System.out.println(c1 == c2); // prints false
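
To complement the update example above, the following is only a sketch of how batch insertion might be performed with a StatelessSession; as in the earlier Session-based example, the Customer constructor arguments are omitted:

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....); // constructor arguments omitted, as in the earlier example
    session.insert(customer); // executes the insert statement immediately; nothing is kept in a cache
}

tx.commit();
session.close();

Because StatelessSession keeps no cache, there is no need to call flush() or clear() periodically as in the Session-based version.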

3. Batch operations through HQL

HQL (Hibernate Query Language) in Hibernate 3 can be used not only to retrieve data but also to update, delete and insert data in batches. These batch operations are actually performed directly in the database; the data being processed is not stored in the Session cache and therefore does not occupy memory.
The Query.executeUpdate() method is very similar to PreparedStatement.executeUpdate() in the JDBC API: the former executes HQL statements for updating, deleting and inserting, while the latter executes SQL statements for updating, deleting and inserting.

1. Batch updating data
The following code demonstrates batch updating of Customer objects through HQL:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

String hqlUpdate =
        "update Customer c set c.name = :newName where c.name = :oldName";
int updatedEntities = session.createQuery( hqlUpdate )
        .setString( "newName", "Mike" )
        .setString( "oldName", "Tom" )
        .executeUpdate();

tx.commit();
session.close();
The SQL statement that the above code sends to the database is:
update CUSTOMERS set NAME="Mike" where NAME="Tom"

2. Batch deleting data
Session's delete() method can delete only one object at a time and is not suitable for batch deletion. The following code demonstrates batch deletion of Customer objects through HQL:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

String hqlDelete = "delete Customer c where c.name = :oldName";
int deletedEntities = session.createQuery( hqlDelete )
        .setString( "oldName", "Tom" )
        .executeUpdate();
tx.commit();
session.close();
The SQL statement that the above code submits to the database is:
delete from CUSTOMERS where NAME="Tom"

3. Batch inserting data
The HQL syntax for inserting data is:
insert into EntityName properties_list select_statement
Here EntityName is the name of a persistent class, properties_list is the list of properties of the persistent class, and select_statement is a subquery.
HQL only supports insert statements of the form "insert into ... select ...", not the form "insert into ... values ...".
The following example shows how to insert data in batches through HQL. Suppose there are two classes, DelinquentAccount and Customer, both of which have id and name properties; the corresponding tables are DELINQUENT_ACCOUNTS and CUSTOMERS, and DelinquentAccount.hbm.xml and Customer.hbm.xml are their mapping files. The following code copies data from the CUSTOMERS table into the DELINQUENT_ACCOUNTS table:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

String hqlInsert = "insert into DelinquentAccount (id, name) select c.id, c.name from Customer c where c.id>1";
int createdEntities = session.createQuery( hqlInsert )
        .executeUpdate();
tx.commit();
session.close();

The SQL statement that the above code submits to the database is:
insert into DELINQUENT_ACCOUNTS(ID,NAME) select ID,NAME from CUSTOMERS where ID>1

4. Batch operations directly through the JDBC API

When SQL insert, update and delete statements are executed through the JDBC API, the data involved in the SQL statements is not loaded into memory and therefore does not occupy memory space.
The following program executes a SQL statement for a batch update directly through the JDBC API:
Transaction tx = session.beginTransaction();
// obtain the database connection used by this Session
java.sql.Connection con = session.connection();
// execute the SQL statement for the batch update through the JDBC API
PreparedStatement stmt = con.prepareStatement(
        "update CUSTOMERS set AGE=AGE+1 where AGE>0");
stmt.executeUpdate();

tx.commit();

The above program obtains the database connection used by the Session through the Session's connection() method, and then uses that connection to create a PreparedStatement object and execute the SQL statement. Note that the application still declares the transaction boundaries through Hibernate's Transaction interface.
It is worth noting that although Session's connection() method still exists in Hibernate 3, it has been deprecated and its use is discouraged. Hibernate 3 provides an alternative: the org.hibernate.jdbc.Work interface represents an operation that accesses the database directly through the JDBC API, and its execute() method performs that operation:
public interface Work {
    // an operation that accesses the database directly through the JDBC API
    public void execute(Connection connection) throws SQLException;
}
Session's doWork(Work work) method executes the operation specified by the Work object, that is, it calls the Work object's execute() method, passing the database connection currently used by the Session to execute().

The following program demonstrates how to perform a batch operation through the Work interface and Session's doWork() method:
Transaction tx = session.beginTransaction();
// define an anonymous class that implements the Work interface
Work work = new Work() {
    public void execute(Connection connection) throws SQLException {
        // execute the SQL statement for the batch update through the JDBC API
        PreparedStatement stmt = connection.prepareStatement(
                "update CUSTOMERS set AGE=AGE+1 where AGE>0");
        stmt.executeUpdate();
    }
};

// execute the work
session.doWork(work);
tx.commit();

When SQL statements are executed through the PreparedStatement interface of the JDBC API, the data involved in the SQL statements is not loaded into the Session cache and therefore does not occupy memory space.
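
The example above sends a single SQL statement that updates all qualifying rows at once. When many parameterized statements have to be sent, the JDBC batching methods PreparedStatement.addBatch() and executeBatch() can also be used inside a Work implementation. The following is only a sketch; the array of ids is hypothetical and stands for whatever keys the application actually needs to update:

Transaction tx = session.beginTransaction();
session.doWork(new Work() {
    public void execute(Connection connection) throws SQLException {
        PreparedStatement stmt = connection.prepareStatement(
                "update CUSTOMERS set AGE=AGE+1 where ID=?");
        long[] ids = {1L, 2L, 3L}; // hypothetical ids; in a real program they would come from the application
        for (long id : ids) {
            stmt.setLong(1, id);
            stmt.addBatch();   // queue the parameterized statement instead of executing it at once
        }
        stmt.executeBatch();   // send all queued statements to the database in a single batch
        stmt.close();
    }
});
tx.commit();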

 

Reprinted from http://blog.csdn.net/weiling_shen/article/details/44960303
