Lucene
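Lucene is a powerful full-text search library. Hibernate Search combines Lucene core with JPA/Hibernate ORM: when we add or modify data through JPA, the entity is indexed in Lucene automatically, and when we search, the Lucene core search engine runs the query and returns JPA entity objects.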
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>5.9.1.Final</version>
</dependency>
Configuring Hibernate Search
In the context configuration:
@Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory() throws PropertyVetoException {
    Map<String, Object> properties = new Hashtable<>();
    properties.put("javax.persistence.schema-generation.database.action", "none");

    /* Let Hibernate ORM use Hibernate Search. This uses standalone Lucene (no Solr)
     * and keeps the index on the local file system.
     * Whenever Hibernate ORM in this WAR writes to the database, Hibernate Search
     * indexes the affected content and writes it to the index files.
     * This setup does not suit a Tomcat cluster; with a Tomcat cluster, a Solr server is needed. */
    properties.put("hibernate.search.default.directory_provider", "filesystem");
    /* This example uses ../searchIndexes; in the development environment
     * that is the top-level Eclipse directory. */
    properties.put("hibernate.search.default.indexBase", "../searchIndexes");

    properties.put("hibernate.show_sql", "true");
    properties.put("hibernate.dialect", "org.hibernate.dialect.MySQL5InnoDBDialect");

    LocalContainerEntityManagerFactoryBean factory = new LocalContainerEntityManagerFactoryBean();
    factory.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
    factory.setDataSource(this.springJpaDataSource());
    factory.setPackagesToScan("cn.wei.flowingflying.chapter23.entities");
    factory.setSharedCacheMode(SharedCacheMode.ENABLE_SELECTIVE);
    factory.setValidationMode(ValidationMode.NONE);
    factory.setJpaPropertyMap(properties);
    return factory;
}
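Because indexBase is given as a relative path, the place where the index files end up depends on the working directory of the JVM. The sketch below is not part of the original configuration. It only uses java.nio.file to show how such a path resolves, under the assumption that Hibernate Search treats the value like an ordinary relative file path. The class name IndexBaseLocation is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hibernate Search.
final class IndexBaseLocation {

    /** Resolves a (possibly relative) indexBase value against the JVM working directory. */
    static Path resolve(String indexBase) {
        return Paths.get(indexBase).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Prints the absolute directory that "../searchIndexes" refers to for this JVM.
        System.out.println(resolve("../searchIndexes"));
    }
}
```

Logging this value once at startup can make it easier to find the index when the application is started from a different directory, for example from an IDE versus a deployed Tomcat.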
Entity data and purpose of the example
This is the database table UserPrincipal_23, mapped to the entity User:

mysql> select * from UserPrincipal_23;
+--------+----------+
| UserId | Username |
+--------+----------+
|      4 | John     |
|      3 | Mike     |
|      1 | Nicholas |
|      2 | Sarah    |
+--------+----------+
The table Post_23, mapped to the entity ForumPost:
mysql> select * from Post_23;
+--------+--------+----------------+--------------------------------------+------------+
| PostId | UserId | Title          | Body                                 | Keywords   |
+--------+--------+----------------+--------------------------------------+------------+
|      1 |      3 | Title One      | Test One. Hello world! Java!         | one java   |
|      2 |      1 | Title Two      | Hello, my friend! This is title two. | two friend |
|      3 |      1 | Hello Nicholas | My name is Nicholas! Hi, Nicholas    | Nicholas   |
+--------+--------+----------------+--------------------------------------+------------+

mysql> select Post_23.*,UserPrincipal_23.Username from Post_23 left join UserPrincipal_23 on Post_23.UserId = UserPrincipal_23.UserId;
+--------+--------+----------------+--------------------------------------+------------+----------+
| PostId | UserId | Title          | Body                                 | Keywords   | Username |
+--------+--------+----------------+--------------------------------------+------------+----------+
|      1 |      3 | Title One      | Test One. Hello world! Java!         | one java   | Mike     |
|      2 |      1 | Title Two      | Hello, my friend! This is title two. | two friend | Nicholas |
|      3 |      1 | Hello Nicholas | My name is Nicholas! Hi, Nicholas    | Nicholas   | Nicholas |
+--------+--------+----------------+--------------------------------------+------------+----------+
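The example searches title, body, keywords and username.
The search therefore spans a join of two tables. On the ORM side this is mapped with @ManyToOne, which is covered later. On the Lucene side it is handled with Hibernate Search annotations.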
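To make the goal concrete, here is a toy, plain-Java version of "search four fields of the joined rows". It is only an illustration and makes no use of Lucene: there is no analyzer, no index and no relevance score, and the class ToySearch and its whole-word matching rule are assumptions made for this sketch. The data is the joined result shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy illustration only; real matching is done by Lucene through Hibernate Search.
final class ToySearch {

    // Columns: PostId, Title, Body, Keywords, Username (from the joined query above).
    static final String[][] POSTS = {
        {"1", "Title One", "Test One. Hello world! Java!", "one java", "Mike"},
        {"2", "Title Two", "Hello, my friend! This is title two.", "two friend", "Nicholas"},
        {"3", "Hello Nicholas", "My name is Nicholas! Hi, Nicholas", "Nicholas", "Nicholas"},
    };

    /** Returns the PostIds whose title, body, keywords or username contain the term as a whole word. */
    static List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> ids = new ArrayList<>();
        for (String[] post : POSTS) {
            for (int i = 1; i < post.length; i++) {
                // Split the field into lower-cased words on any non-word character.
                List<String> words = Arrays.asList(post[i].toLowerCase(Locale.ROOT).split("\\W+"));
                if (words.contains(needle)) {
                    ids.add(post[0]);
                    break; // one matching field is enough for this post
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(search("Nicholas"));
        System.out.println(search("java"));
    }
}
```

Searching for "Nicholas" finds posts 2 and 3. Post 2 matches only through the username of its author, which is why the username from the joined table has to be part of the search.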
The indexed entity
@Entity
@Table(name = "UserPrincipal_23")
public class User {
    private long id;
    private String username;

    @Basic
    @Field // marks this property as an indexed (searchable) field in Lucene
    public String getUsername() { return username; }
    ... ...
}
The main entity
@Entity
@Table(name = "Post_23")
/* [1] @Indexed: marks this class for full-text search with Hibernate Search, which then
 *     creates or updates the Lucene document for the entity automatically.
 *     The document ID is marked with @DocumentId; if it is omitted, the entity's @Id is used. */
@Indexed
public class ForumPost {
    private long id;
    private User user; // the tables are linked through the foreign key UserId
    private String title;
    private String body;
    private String keywords;

    @Id
    @Column(name = "PostId")
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    /* [2] Document ID: Hibernate Search creates and updates a document for this entity automatically,
     *     and @DocumentId marks the property used as the document ID.
     *     Here it is placed on the @Id property as the unique identifier; if it were left out,
     *     the @Id property would be used anyway. */
    @DocumentId
    public long getId() { ... }

    @ManyToOne(fetch = FetchType.EAGER, optional = false)
    @JoinColumn(name = "UserId")
    /* [4] Index the associated entity (User in this example): @IndexedEmbedded tells Hibernate Search
     *     that this property is another entity whose indexed properties should be indexed as part of
     *     this document, here the @Field String username in User. It is somewhat like a cascade setting. */
    @IndexedEmbedded
    public User getUser() { ... }

    @Basic
    @Field // [3] this property is included in full-text search
    public String getTitle() { ... }

    @Lob
    @Field // [3] this property is included in full-text search
    public String getBody() { ... }

    @SuppressWarnings("deprecation")
    @Basic
    /* Deprecated. Index-time boosting will not be possible anymore starting from Lucene 7.
     * You should use query-time boosting instead, for instance by calling boostedTo(float)
     * when building queries with the Hibernate Search query DSL.
     * @Boost: relevance weighting */
    @Field(boost = @Boost(2.0F))
    public String getKeywords() { ... }
    ... ...
}
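As a rough mental model, the annotations above ask for one document per ForumPost with the fields id, title, body, keywords and user.username, the last one contributed by @IndexedEmbedded. The sketch below only mimics that field layout with a plain map. It is a simplification under the assumption that default field names are used; the real index stores analyzed tokens rather than raw strings, and the class DocumentSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the field names; this is not how Hibernate Search builds documents.
final class DocumentSketch {

    /** Flattens one post and its author into field-name/value pairs. */
    static Map<String, String> toFields(long id, String title, String body,
                                        String keywords, String username) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", Long.toString(id));    // from @DocumentId
        fields.put("title", title);             // from @Field on getTitle()
        fields.put("body", body);               // from @Field on getBody()
        fields.put("keywords", keywords);       // from @Field on getKeywords()
        fields.put("user.username", username);  // from @IndexedEmbedded plus @Field in User
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(toFields(1, "Title One", "Test One. Hello world! Java!", "one java", "Mike"));
    }
}
```

The dotted name user.username is the one to use later when listing the fields to query.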
The search code
As before, we provide a SearchResult class to hold an entity together with its relevance score.
public class SearchResult<T> {
    private final T entity;
    private final double relevance;
    ......
}
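The elided part of SearchResult is presumably a constructor and accessors. A minimal sketch follows; the constructor shape matches how the class is called in the repository code (an entity plus a score), while the getter names getEntity and getRelevance are assumptions.

```java
// Sketch of the full class; getter names are assumed, not taken from the original.
final class SearchResult<T> {
    private final T entity;
    private final double relevance;

    SearchResult(T entity, double relevance) {
        this.entity = entity;
        this.relevance = relevance;
    }

    /** The matched entity. */
    T getEntity() { return entity; }

    /** The relevance score reported for this match. */
    double getRelevance() { return relevance; }

    public static void main(String[] args) {
        // A Float score is unboxed and widened to double when passed to the constructor.
        SearchResult<String> result = new SearchResult<>("Hello Nicholas", Float.valueOf(1.5f));
        System.out.println(result.getEntity() + " scored " + result.getRelevance());
    }
}
```

Keeping both fields final makes the result an immutable value object, which is convenient when it is passed on to a view layer.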
Defining the repository interfaces
public interface SearchableRepository<T> {
    Page<SearchResult<T>> search(String query, Pageable pageable);
}

public interface ForumPostRepository extends JpaRepository<ForumPost, Long>, SearchableRepository<ForumPost> {
}
Hibernate Search works with Lucene documents. Its API resembles the JPA API, and the Lucene API can also be used directly. Let's look at the actual code:
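import java.util.ArrayList;
import java.util.List;
import javax.annotation.PostConstruct;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.apache.lucene.search.Query;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.FullTextQuery;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;
import org.springframework.beans.FatalBeanException;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageImpl;
import org.springframework.data.domain.Pageable;
import org.springframework.orm.jpa.EntityManagerProxy;

public class ForumPostRepositoryImpl implements SearchableRepository<ForumPost> {
    // [1] Obtain Hibernate Search's full-text entity manager, the counterpart of the JPA EntityManager.
    //     All Lucene full-text search methods go through this FullTextEntityManager. Note that the
    //     entityManagerFactory configured in the root context already carries the Hibernate Search settings.
    @PersistenceContext EntityManager entityManager;
    EntityManagerProxy entityManagerProxy;

    // 1.1) The @PersistenceContext EntityManager injected by Spring is actually an EntityManager proxy
    //      (it delegates to a new EntityManager for each transaction). initialize() keeps a reference to that proxy.
    @PostConstruct
    public void initialize() {
        if (!(this.entityManager instanceof EntityManagerProxy))
            throw new FatalBeanException("Entity manager " + this.entityManager + " was not a proxy.");
        this.entityManagerProxy = (EntityManagerProxy) entityManager;
    }

    // 1.2) FullTextEntityManager is a real object, not a proxy, so a new one has to be created for every search
    //      (nothing does this for you silently, unlike the EntityManager proxy injected by Spring).
    //      It also has to be obtained from a real Hibernate ORM EntityManager implementation, not from the proxy.
    private FullTextEntityManager getFullTextEntityManager() {
        return Search.getFullTextEntityManager(this.entityManagerProxy.getTargetEntityManager());
    }

    // [2] The full-text search implementation based on Hibernate Search.
    @Override
    public Page<SearchResult<ForumPost>> search(String query, Pageable pageable) {
        // 2.1) Obtain the FullTextEntityManager inside the transaction
        FullTextEntityManager manager = getFullTextEntityManager();

        // 2.2) Run the search. The Hibernate Search API resembles the JPA API, since both belong to the Hibernate family.
        QueryBuilder builder = manager.getSearchFactory().buildQueryBuilder().forEntity(ForumPost.class).get();
        Query lucene = builder.keyword()
                .onFields("title", "body", "keywords", "user.username") // fields to search; note user.username
                .matching(query)                                        // the text to search for
                .createQuery();
        FullTextQuery q = manager.createFullTextQuery(lucene, ForumPost.class);
        q.setProjection(FullTextQuery.THIS, FullTextQuery.SCORE); // return the ForumPost and its relevance score

        // 2.3) Get the total number of hits
        long total = q.getResultSize();

        // 2.4) Fetch the requested page. The cast is needed where getOffset() returns long (Spring Data 2.x).
        @SuppressWarnings("unchecked")
        List<Object[]> results = q.setFirstResult((int) pageable.getOffset())
                .setMaxResults(pageable.getPageSize())
                .getResultList();

        List<SearchResult<ForumPost>> list = new ArrayList<>();
        results.forEach(o -> list.add(new SearchResult<>((ForumPost) o[0], (Float) o[1])));
        return new PageImpl<>(list, pageable, total);
    }
}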
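The last step turns each projection row, an Object[] holding the entity at index 0 and a Float score at index 1, into a typed result. The sketch below isolates that mapping in plain Java so that it can be run without Hibernate. The class ProjectionMapping and its nested Scored type are hypothetical stand-ins for the repository code and SearchResult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-alone version of the row-to-result mapping.
final class ProjectionMapping {

    /** Stand-in for SearchResult<T>. */
    static final class Scored<T> {
        final T entity;
        final double relevance;

        Scored(T entity, double relevance) {
            this.entity = entity;
            this.relevance = relevance;
        }
    }

    /** Maps rows of the form {entity, Float score} to typed results, keeping their order. */
    static <T> List<Scored<T>> map(List<Object[]> rows, Class<T> type) {
        return rows.stream()
                .map(row -> new Scored<T>(type.cast(row[0]), ((Float) row[1]).doubleValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"Hello Nicholas", 1.5f}); // the scores here are made-up sample values
        rows.add(new Object[] {"Title Two", 0.5f});
        for (Scored<String> s : map(rows, String.class)) {
            System.out.println(s.entity + " -> " + s.relevance);
        }
    }
}
```

Using Class.cast instead of an unchecked cast means that a projection with the columns in the wrong order fails immediately with a ClassCastException at the mapping step.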