A Hibernate Optimization in Practice

Background

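The project uses Hibernate as its ORM. At startup, all data has to be loaded into memory to serve as a cache and an index. Everything was fine during day-to-day development, but once production data was imported for testing, the data volume jumped and startup became very slow: a single startup took more than an hour. Most of that time was spent in Hibernate talking to the database, so we set out to get more efficiency from Hibernate.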

Candidate approaches

After some investigation, we narrowed it down to the following two options:

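1. Optimize the SQL that Hibernate issues: the log was full of SQL printed by Hibernate, much of it emitted while resolving associations. Hand-written SQL could fetch the same data in a single statement and then assemble the objects, whereas Hibernate ran many separate queries, which hurt efficiency. Reducing the SQL that Hibernate generates is therefore one way to attack the problem.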

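2. Use Hibernate's second-level cache: the second-level cache is fairly mature by now and can serve as a lazy-loading layer for the data, so that not everything has to be put into memory at startup.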

Because option 2 requires substantial code changes and would mean giving up the current query index, we settled on option 1.

Starting the optimization

The original code (simplified and rewritten):

List<Order> orders = session.createQuery("from Order").list();

Once all the objects to be cached have been loaded here, the index is built with operations similar to the following:

StringBuilder display = new StringBuilder();
for (Order order : orders) {
	display.append(order.getId()).append(": ");
	Customer customer = order.getCustomer();
	display.append(customer.getName());
	if (customer instanceof VipCustomer) {
		display.append("(vip): ");
		List<Category> categories =((VipCustomer)customer).getCategories();
		for (Category category : categories) {
			display.append(category.getItems().size()).append(";");
		}
	}
	display.append(" [");
	Set<Item> items = order.getItems();
	for (Item item : items) {
		display.append(item.getProduct()+", ");
		item.getCategories().size();
	}
	display.append("]\n");
			
}
System.out.println(display);

This produces the following SQL:

select order0_.id as id3_, order0_.CUSTOMER_ID as CUSTOMER2_3_ from T_ORDER order0_

select customer0_.id as id2_0_, customer0_.name as name2_0_, customer0_.interest as interest2_0_, customer0_.DTYPE as DTYPE2_0_ from Customer customer0_ where customer0_.id=?

select categories0_.VIP_ID as VIP3_2_1_, categories0_.id as id1_, categories0_.id as id1_0_, categories0_.name as name1_0_, categories0_.VIP_ID as VIP3_1_0_ from Category categories0_ where categories0_.VIP_ID=?

select categories0_.VIP_ID as VIP3_2_1_, categories0_.id as id1_, categories0_.id as id1_0_, categories0_.name as name1_0_, categories0_.VIP_ID as VIP3_1_0_ from Category categories0_ where categories0_.VIP_ID=?

(the statements above repeat N times)

select items0_.ORDER_ID as ORDER1_3_1_, items0_.ITEM_ID as ITEM2_1_, item1_.id as id0_0_, item1_.product as product0_0_, item1_.packageType as packageT4_0_0_, item1_.greeting as greeting0_0_, item1_.DTYPE as DTYPE0_0_ from Order_TO_ITEM items0_ inner join Item item1_ on items0_.ITEM_ID=item1_.id where items0_.ORDER_ID=?

select categories0_.ITEM_ID as ITEM2_0_2_, categories0_.CATEGORY_ID as CATEGORY1_2_, category1_.id as id1_0_, category1_.name as name1_0_, category1_.VIP_ID as VIP3_1_0_, vipcustome2_.id as id2_1_, vipcustome2_.name as name2_1_, vipcustome2_.interest as interest2_1_ from CATEGORY_TO_ITEM categories0_ inner join Category category1_ on categories0_.CATEGORY_ID=category1_.id left outer join Customer vipcustome2_ on category1_.VIP_ID=vipcustome2_.id where categories0_.ITEM_ID=?

(the statements above repeat N times)

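Most of this log comes from calling getters on the objects associated with Order, and on the objects associated with those in turn. The associations go about three or four levels deep.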
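To make the cost of this access pattern concrete, here is a minimal sketch in plain Java that only counts queries. It does not use Hibernate at all: the class NPlusOneSketch, its methods, and the assumption that every order has its own customer are all hypothetical and exist purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the database: it only counts the queries it receives.
// Every name here is hypothetical; no Hibernate API is used.
class NPlusOneSketch {
    // order id -> customer name (assumes every order has its own customer)
    private final Map<Integer, String> customerByOrder = new LinkedHashMap<>();
    int queryCount = 0;

    NPlusOneSketch(int orderCount) {
        for (int id = 1; id <= orderCount; id++) {
            customerByOrder.put(id, "customer-" + id);
        }
    }

    // Like "select ... from T_ORDER": one query returning every order id.
    List<Integer> findOrderIds() {
        queryCount++;
        return new ArrayList<>(customerByOrder.keySet());
    }

    // Like "select ... from Customer where id=?": one query per order.
    String findCustomer(int orderId) {
        queryCount++;
        return customerByOrder.get(orderId);
    }

    // Like "from T_ORDER inner join Customer": one query for everything.
    Map<Integer, String> findOrdersWithCustomers() {
        queryCount++;
        return new LinkedHashMap<>(customerByOrder);
    }

    // Lazy style: 1 query for the orders plus 1 per order for its customer.
    static int lazyQueryCount(int orderCount) {
        NPlusOneSketch db = new NPlusOneSketch(orderCount);
        for (int orderId : db.findOrderIds()) {
            db.findCustomer(orderId);
        }
        return db.queryCount;
    }

    // Join style: the same data arrives in a single query.
    static int joinedQueryCount(int orderCount) {
        NPlusOneSketch db = new NPlusOneSketch(orderCount);
        db.findOrdersWithCustomers();
        return db.queryCount;
    }

    public static void main(String[] args) {
        System.out.println("lazy:   " + lazyQueryCount(1000) + " queries");
        System.out.println("joined: " + joinedQueryCount(1000) + " queries");
    }
}
```

With only one level of association the lazy style already costs N+1 round trips. Each further level multiplies the count, which is roughly what the log above shows.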

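To optimize the SQL, we introduced HQL's inner join fetch and left join fetch, which must be distinguished from the ordinary inner join and left join. In Chinese they seem to be called 迫切内连接 ("eager inner join") and 迫切左外连接 ("eager left outer join"); where the names come from is unclear.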

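List<Object[]> rows = session.createQuery("from Order newOrder inner join newOrder.customer").list();

With a plain join, each element of the returned list is an Object[] holding two kinds of objects: the Order and the inner-joined Customer. The same applies to left join.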

List<Order> orders = session.createQuery("from Order newOrder inner join fetch newOrder.customer").list();

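Written this way, the list contains only one kind of object: Order, with its customer already loaded. Compare the SQL before and after the change: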

-- Before the change

select order0_.id as id3_, order0_.CUSTOMER_ID as CUSTOMER2_3_ from T_ORDER order0_

select customer0_.id as id2_0_, customer0_.name as name2_0_, customer0_.interest as interest2_0_, customer0_.DTYPE as DTYPE2_0_ from Customer customer0_ where customer0_.id=?


-- After the change

select order0_.id as id3_0_, customer1_.id as id1_1_, order0_.CUSTOMER_ID as CUSTOMER2_3_0_, customer1_.name as name1_1_, customer1_.interest as interest1_1_, customer1_.DTYPE as DTYPE1_1_ from T_ORDER order0_ inner join Customer customer1_ on order0_.CUSTOMER_ID=customer1_.id

What used to take two separate queries is now merged into one. So we kept going:

List<Order> orders = session.createQuery("from Order newOrder " +
		"inner join fetch newOrder.customer customer " +
		"left join fetch newOrder.items").list();

These two SQL statements disappeared:

select items0_.ORDER_ID as ORDER1_3_1_, items0_.ITEM_ID as ITEM2_1_, item1_.id as id0_0_, item1_.product as product0_0_, item1_.packageType as packageT4_0_0_, item1_.greeting as greeting0_0_, item1_.DTYPE as DTYPE0_0_ from Order_TO_ITEM items0_ inner join Item item1_ on items0_.ITEM_ID=item1_.id where items0_.ORDER_ID=?

select categories0_.ITEM_ID as ITEM2_0_2_, categories0_.CATEGORY_ID as CATEGORY1_2_, category1_.id as id1_0_, category1_.name as name1_0_, category1_.VIP_ID as VIP3_1_0_, vipcustome2_.id as id2_1_, vipcustome2_.name as name2_1_, vipcustome2_.interest as interest2_1_ from CATEGORY_TO_ITEM categories0_ inner join Category category1_ on categories0_.CATEGORY_ID=category1_.id left outer join Customer vipcustome2_ on category1_.VIP_ID=vipcustome2_.id where categories0_.ITEM_ID=?

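However, the list now contains the same Order several times, because the join with the items collection returns one row per order/item pair. So the code was changed to: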

List<Order> orders = new ArrayList<Order>(new HashSet<Order>(session.createQuery("from Order newOrder " +
		"inner join fetch newOrder.customer customer " +
		"left join fetch newOrder.items").list()));

A bit clumsy, but very effective, and it keeps the code change to a minimum.
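One side effect of going through a HashSet is that the original row order is lost. If order matters, a LinkedHashSet can be used in the same way. The sketch below is plain Java with a hypothetical Order class and a hypothetical helper distinctInOrder. It relies on the fact that, within one session, the duplicate rows refer to the same Order instance, so the default identity-based equals is enough to collapse them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration; not taken from the project's code.
class DedupSketch {
    static final class Order {
        final int id;
        Order(int id) { this.id = id; }
    }

    // Drops repeated elements but keeps the first-seen order of the rest.
    static <T> List<T> distinctInOrder(List<T> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        Order o1 = new Order(1);
        Order o2 = new Order(2);
        // Order 1 has three items and order 2 has one, so the joined
        // result has four rows that point at only two Order instances.
        List<Order> joined = Arrays.asList(o1, o1, o1, o2);
        List<Order> orders = distinctInOrder(joined);
        System.out.println(joined.size() + " rows -> " + orders.size() + " orders");
    }
}
```

Another option that may be worth trying is HQL's distinct keyword ("select distinct newOrder from Order newOrder ..."), which asks Hibernate to remove the duplicate roots itself.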

Next, these two batches of SQL need to be dealt with:

select items0_.CATEGORY_ID as CATEGORY2_0_1_, items0_.ITEM_ID as ITEM1_1_, item1_.id as id2_0_, item1_.product as product2_0_ from CATEGORY_TO_ITEM items0_ inner join Item item1_ on items0_.ITEM_ID=item1_.id where items0_.CATEGORY_ID=?

select categories0_.ITEM_ID as ITEM1_2_2_, categories0_.CATEGORY_ID as CATEGORY2_2_, category1_.id as id0_0_, category1_.name as name0_0_, category1_.VIP_ID as VIP3_0_0_, vipcustome2_.id as id1_1_, vipcustome2_.name as name1_1_, vipcustome2_.interest as interest1_1_ from CATEGORY_TO_ITEM categories0_ inner join Category category1_ on categories0_.CATEGORY_ID=category1_.id left outer join Customer vipcustome2_ on category1_.VIP_ID=vipcustome2_.id where categories0_.ITEM_ID=?

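These two statements come from the third- and fourth-level associations. Because categories is a property of the VipCustomer class, and not every Customer referenced by an Order is a VipCustomer (the class diagram and code are attached at the end), inner join fetch or left join fetch cannot be used directly here.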

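Instead, the @Fetch annotation is used to control the SQL issued when the getter triggers loading. The items property of the Category class is changed like this: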

@ManyToMany(cascade=CascadeType.ALL)
@JoinTable(name="CATEGORY_TO_ITEM",joinColumns=@JoinColumn(name="CATEGORY_ID"),inverseJoinColumns=@JoinColumn(name="ITEM_ID"))
@Fetch(FetchMode.SUBSELECT)
private List<Item> items;

The categories property of the Item class is changed like this:

@ManyToMany(cascade=CascadeType.REFRESH)
@JoinTable(name="CATEGORY_TO_ITEM",joinColumns=@JoinColumn(name="ITEM_ID"),inverseJoinColumns=@JoinColumn(name="CATEGORY_ID"))
@Fetch(FetchMode.SUBSELECT)
private List<Category> categories;

As a result, the large number of SQL statements collapses into two:

select categories0_.ITEM_ID as ITEM1_2_2_, categories0_.CATEGORY_ID as CATEGORY2_2_, category1_.id as id0_0_, category1_.name as name0_0_, category1_.VIP_ID as VIP3_0_0_, vipcustome2_.id as id1_1_, vipcustome2_.name as name1_1_, vipcustome2_.interest as interest1_1_ from CATEGORY_TO_ITEM categories0_ inner join Category category1_ on categories0_.CATEGORY_ID=category1_.id left outer join Customer vipcustome2_ on category1_.VIP_ID=vipcustome2_.id where categories0_.ITEM_ID in (select item1_.id from CATEGORY_TO_ITEM items0_ inner join Item item1_ on items0_.ITEM_ID=item1_.id where items0_.CATEGORY_ID in (select categories0_.id from Category categories0_ where categories0_.VIP_ID=?))


select categories0_.ITEM_ID as ITEM1_2_2_, categories0_.CATEGORY_ID as CATEGORY2_2_, category1_.id as id0_0_, category1_.name as name0_0_, category1_.VIP_ID as VIP3_0_0_, vipcustome2_.id as id1_1_, vipcustome2_.name as name1_1_, vipcustome2_.interest as interest1_1_ from CATEGORY_TO_ITEM categories0_ inner join Category category1_ on categories0_.CATEGORY_ID=category1_.id left outer join Customer vipcustome2_ on category1_.VIP_ID=vipcustome2_.id where categories0_.ITEM_ID in (select item3_.id from T_ORDER order0_ inner join Customer customer1_ on order0_.CUSTOMER_ID=customer1_.id inner join Order_TO_ITEM items2_ on order0_.id=items2_.ORDER_ID inner join Item item3_ on items2_.ITEM_ID=item3_.id)

FetchMode has three enum values:

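@Fetch(FetchMode.JOIN): loads the association with a left join. In this project, however, this setting produced even more SQL statements, and lazy loading seemed to stop working altogether.
@Fetch(FetchMode.SELECT): no change from the original behavior (N+1 SQL statements).
@Fetch(FetchMode.SUBSELECT): loads the association with an in (...) query.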
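The difference between the SELECT and SUBSELECT styles can be sketched with the same kind of query counting. This is plain Java with hypothetical names (SubselectSketch and its methods), not Hibernate's actual implementation: one method loads a collection per parent id, the other loads the collections of all parents through a single in (...) style lookup.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; it only counts simulated queries.
class SubselectSketch {
    // category id -> item ids, standing in for the CATEGORY_TO_ITEM table
    private final Map<Integer, List<Integer>> itemsByCategory = new HashMap<>();
    int queryCount = 0;

    SubselectSketch(int categoryCount) {
        for (int c = 1; c <= categoryCount; c++) {
            List<Integer> items = new ArrayList<>();
            items.add(c * 10);
            items.add(c * 10 + 1);
            itemsByCategory.put(c, items);
        }
    }

    // SELECT style: "... where CATEGORY_ID=?", one query per collection.
    List<Integer> loadItems(int categoryId) {
        queryCount++;
        return itemsByCategory.get(categoryId);
    }

    // SUBSELECT style: "... where CATEGORY_ID in (...)", one query for all.
    Map<Integer, List<Integer>> loadItemsForAll(Collection<Integer> categoryIds) {
        queryCount++;
        Map<Integer, List<Integer>> result = new HashMap<>();
        for (int id : categoryIds) {
            result.put(id, itemsByCategory.get(id));
        }
        return result;
    }

    static int selectStyleQueries(int categoryCount) {
        SubselectSketch db = new SubselectSketch(categoryCount);
        for (int c = 1; c <= categoryCount; c++) {
            db.loadItems(c);
        }
        return db.queryCount;
    }

    static int subselectStyleQueries(int categoryCount) {
        SubselectSketch db = new SubselectSketch(categoryCount);
        db.loadItemsForAll(new ArrayList<>(db.itemsByCategory.keySet()));
        return db.queryCount;
    }

    public static void main(String[] args) {
        System.out.println("select style:    " + selectStyleQueries(500) + " queries");
        System.out.println("subselect style: " + subselectStyleQueries(500) + " queries");
    }
}
```

The single lookup returns more data per round trip, but the number of round trips no longer grows with the number of parent objects.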

Problem solved

Startup now completes within a few minutes.

Attached figure:

The source code can be run without a database.