Java for Web Study Notes (124): Search (6) Lucene and Hibernate Search

Lucene

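Lucene is a powerful search library. Hibernate Search combines Lucene Core with JPA/Hibernate ORM: when we add or modify data through JPA, the affected entities are indexed in Lucene automatically, and at search time the Lucene Core search engine runs the query and returns JPA entity objects.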
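To make the "index on write, search returns entities" idea concrete, here is a toy, stdlib-only sketch. It is not Hibernate Search or Lucene: ToyPost, ToyIndexedRepository and the regex tokenizer are hypothetical stand-ins. The only point is that every save also refreshes an inverted index (term to entity ids), and that a search resolves index hits back to entity objects.

```java
import java.util.*;

// Hypothetical entity standing in for ForumPost; not the real JPA class.
class ToyPost {
    final long id;
    final String title;
    ToyPost(long id, String title) { this.id = id; this.title = title; }
}

// Toy sketch: every save() also updates an inverted index, mimicking what
// Hibernate Search does automatically when JPA writes an @Indexed entity.
class ToyIndexedRepository {
    private final Map<Long, ToyPost> table = new HashMap<>();     // stands in for the database table
    private final Map<String, Set<Long>> index = new HashMap<>(); // term -> ids of posts containing it

    void save(ToyPost post) {
        ToyPost old = table.put(post.id, post);
        if (old != null) {
            // drop the terms of the previous version so the index does not go stale
            for (String term : tokenize(old.title)) {
                Set<Long> ids = index.get(term);
                if (ids != null) ids.remove(old.id);
            }
        }
        for (String term : tokenize(post.title)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(post.id);
        }
    }

    // The search returns entity objects, not raw index entries.
    List<ToyPost> search(String word) {
        List<ToyPost> hits = new ArrayList<>();
        String term = word.toLowerCase(Locale.ROOT);
        for (Long id : index.getOrDefault(term, Collections.<Long>emptySet())) {
            hits.add(table.get(id));
        }
        return hits;
    }

    // Very rough "analyzer": lower-case, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```

Saving the three sample titles and then searching for "title" would return posts 1 and 2; re-saving post 3 with a new title removes it from the "nicholas" hits.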

<dependency>
     <groupId>org.hibernate</groupId>
     <artifactId>hibernate-search-orm</artifactId>
     <version>5.9.1.Final</version>
</dependency>

Configuring Hibernate Search

In the context configuration:

@Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory() throws PropertyVetoException{
    Map<String, Object> properties = new Hashtable<>();
    properties.put("javax.persistence.schema-generation.database.action","none");
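    /* Enable Hibernate Search for Hibernate ORM, using standalone Lucene (no Solr) with the indexes stored on the local file system.
     * Whenever Hibernate ORM in this WAR writes to the database, Hibernate Search indexes the affected content and writes it to the index files.
     * This setup does not suit a Tomcat cluster; with a Tomcat cluster, a Solr server is needed instead. */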
    properties.put("hibernate.search.default.directory_provider", "filesystem");
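    /* This example uses ../searchIndexes, resolved against the working directory; in the Eclipse development environment that is Eclipse's top-level directory */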
    properties.put("hibernate.search.default.indexBase", "../searchIndexes");		
    properties.put("hibernate.show_sql", "true");
    properties.put("hibernate.dialect", "org.hibernate.dialect.MySQL5InnoDBDialect");

    LocalContainerEntityManagerFactoryBean factory = new LocalContainerEntityManagerFactoryBean();
    factory.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
    factory.setDataSource(this.springJpaDataSource());
    factory.setPackagesToScan("cn.wei.flowingflying.chapter23.entities");
    factory.setSharedCacheMode(SharedCacheMode.ENABLE_SELECTIVE);
    factory.setValidationMode(ValidationMode.NONE);
    factory.setJpaPropertyMap(properties);

    return factory;
}
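Because indexBase is relative, where the index files actually land depends on the JVM's working directory, which differs between Eclipse and a Tomcat started from its bin directory. A small stdlib sketch of that resolution, assuming Hibernate Search 5's default of giving each index a subdirectory named after the entity's fully qualified class name (unless @Indexed(index = "...") overrides it). IndexLocation is a hypothetical helper, not part of Hibernate Search.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: shows how a relative hibernate.search.default.indexBase
// such as "../searchIndexes" resolves against the JVM working directory (user.dir).
class IndexLocation {
    // Assumption: the index directory name defaults to the entity's fully
    // qualified class name in Hibernate Search 5.
    static Path indexDirFor(String workingDir, String indexBase, String entityClassName) {
        return Paths.get(workingDir)
                .resolve(indexBase)       // "../searchIndexes" is relative to the working dir
                .resolve(entityClassName) // one subdirectory per index
                .normalize();             // collapse the ".." segment
    }
}
```

For example, with a working directory of /opt/tomcat/bin the ForumPost index would end up under /opt/tomcat/searchIndexes/cn.wei.flowingflying.chapter23.entities.ForumPost. Logging System.getProperty("user.dir") at startup shows which base directory is actually in effect.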

Entity data for the example, and what the example does

This is the database table UserPrincipal_23, mapped to the entity User:
mysql> select * from UserPrincipal_23;
+--------+----------+
| UserId | Username |
+--------+----------+
|      4 | John     |
|      3 | Mike     |
|      1 | Nicholas |
|      2 | Sarah    |
+--------+----------+

The table Post_23, mapped to the entity ForumPost:

mysql> select * from Post_23;
+--------+--------+----------------+--------------------------------------+------------+
| PostId | UserId | Title          | Body                                 | Keywords   |
+--------+--------+----------------+--------------------------------------+------------+
|      1 |      3 | Title One      | Test One. Hello world! Java!         | one java   |
|      2 |      1 | Title Two      | Hello, my friend! This is title two. | two friend |
|      3 |      1 | Hello Nicholas | My name is Nicholas! Hi, Nicholas    | Nicholas   |
+--------+--------+----------------+--------------------------------------+------------+

mysql> select Post_23.*,UserPrincipal_23.Username from Post_23 left join UserPrincipal_23 on Post_23.UserId = UserPrincipal_23.UserId;
+--------+--------+----------------+--------------------------------------+------------+----------+
| PostId | UserId | Title          | Body                                 | Keywords   | Username |
+--------+--------+----------------+--------------------------------------+------------+----------+
|      1 |      3 | Title One      | Test One. Hello world! Java!         | one java   | Mike     |
|      2 |      1 | Title Two      | Hello, my friend! This is title two. | two friend | Nicholas |
|      3 |      1 | Hello Nicholas | My name is Nicholas! Hi, Nicholas    | Nicholas   | Nicholas |
+--------+--------+----------------+--------------------------------------+------------+----------+

The example searches the title, body, keywords, and username.

The example's search spans a join of two tables. On the ORM side this uses @ManyToOne, which will be covered later; on the Lucene side it uses Hibernate Search annotations.
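The join above is what gets denormalized into the index: each post's Lucene document also carries its author's username, so one full-text query can match columns from both tables. A minimal sketch with plain maps standing in for Lucene documents, assuming the field names used later in the search code (title, body, keywords, user.username); FlatDocument is a hypothetical helper, not the real indexing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the flat "document" conceptually built for one ForumPost:
// the post's own @Field properties plus the embedded User's field under the "user." prefix.
class FlatDocument {
    static Map<String, String> of(String title, String body, String keywords, String username) {
        Map<String, String> doc = new LinkedHashMap<>(); // keeps field order for readability
        doc.put("title", title);
        doc.put("body", body);
        doc.put("keywords", keywords);
        doc.put("user.username", username); // comes from the @ManyToOne User via @IndexedEmbedded
        return doc;
    }
}
```

Post 3 from the join, for instance, becomes one document whose user.username field is "Nicholas", which is why a search for "Nicholas" can hit it through either the post's own text or its author.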

The associated entity whose field is indexed

@Entity
@Table(name="UserPrincipal_23")
public class User {
    private long id;
    private String username;
	
    @Basic
    @Field // marks this property as a Lucene indexed field (searchable content)
    public String getUsername() {
        return username;
    }
    ... ...		
}

The main entity

@Entity
@Table(name="Post_23")
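/*[1] @Indexed: marks this class for full-text indexing by Hibernate Search, which automatically creates or updates the entity's Lucene document.
 * The document ID is the property annotated with @DocumentId; if that annotation is absent, the entity's @Id is used. */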
@Indexed 
public class ForumPost {
    private long id;
    private User user; // linked through the foreign key column UserId
    private String title;
    private String body;
    private String keywords;

    @Id
    @Column(name="PostId")
    @GeneratedValue(strategy=GenerationType.IDENTITY)
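    /*[2] Document ID: Hibernate Search creates and updates a document for this entity automatically, and @DocumentId marks its ID.
     * Here it is placed on the @Id property as the unique identifier; without it, @Id would be used anyway. */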
    @DocumentId
    public long getId() { ... }

    @ManyToOne(fetch=FetchType.EAGER,optional=false)
    @JoinColumn(name="UserId")
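    /*[4] @IndexedEmbedded: the indexed fields of the associated entity (here the @Field String username in User) are included
     * in this ForumPost's document under the prefix "user.", so they can be searched as "user.username". Somewhat like a cascade setting. */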
    @IndexedEmbedded 
    public User getUser() { ... }

    @Basic
    @Field // [3] this property takes part in full-text search
    public String getTitle() { ... }

    @Lob
    @Field // [3] this property takes part in full-text search
    public String getBody() { ... }
	
    @SuppressWarnings("deprecation")
    @Basic
    /* Deprecated. Index-time boosting will not be possible anymore starting from Lucene 7.
     * You should use query-time boosting instead, for instance by calling boostedTo(float)
     * when building queries with the Hibernate Search query DSL.
     * @Boost: relevance weighting (matches in keywords count roughly twice as much) */
    @Field(boost = @Boost(2.0F))
    public String getKeywords() { ... }
    ... ...	
}
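The effect of @Boost(2.0F) on keywords can be illustrated with a deliberately simplified scorer. This is only a sketch: real Lucene scoring (TF-IDF or BM25, with length normalization) is more involved, and BoostedScorer is a hypothetical class. It just shows that a field weight multiplies that field's contribution, which is the same idea whether the weight is applied at index time (@Boost) or at query time (boostedTo(float), as the deprecation note recommends).

```java
import java.util.*;

// Toy relevance scoring to illustrate field boosting; not Lucene's actual formula.
class BoostedScorer {
    private final Map<String, Float> fieldWeights = new LinkedHashMap<>();

    // Registers a field to search together with its weight (1.0 = no boost).
    BoostedScorer weight(String field, float boost) {
        fieldWeights.put(field, boost);
        return this;
    }

    // score = sum over fields of (field weight x occurrences of the term in that field)
    double score(Map<String, String> doc, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        double score = 0;
        for (Map.Entry<String, Float> e : fieldWeights.entrySet()) {
            String text = doc.getOrDefault(e.getKey(), "");
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.equals(needle)) score += e.getValue();
            }
        }
        return score;
    }
}
```

With title and body at weight 1 and keywords at weight 2, post 1 ("Test One. Hello world! Java!" / "one java") scores 3.0 for "java" instead of the 2.0 it would get without the boost.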

Search code

As before, we provide SearchResult to hold an entity together with its relevance score.

public class SearchResult<T> {
	private final T entity;
	private final double relevance;	
	......
}
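The body of SearchResult is elided above. A minimal sketch of what such a holder could look like, assuming a (T, double) constructor and plain getters (both assumptions, since only the two fields are shown), plus a hypothetical fromRows helper that maps projection rows shaped like [entity, Float score], as returned after setProjection(FullTextQuery.THIS, FullTextQuery.SCORE), onto typed results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical completion of the elided SearchResult; constructor and getters are assumptions.
class SearchResult<T> {
    private final T entity;
    private final double relevance;

    SearchResult(T entity, double relevance) {
        this.entity = entity;
        this.relevance = relevance;
    }

    T getEntity() { return entity; }
    double getRelevance() { return relevance; }

    // Hypothetical helper: row[0] is the entity, row[1] the Lucene score (a Float).
    static <E> List<SearchResult<E>> fromRows(List<Object[]> rows, Class<E> type) {
        List<SearchResult<E>> out = new ArrayList<>();
        for (Object[] row : rows) {
            out.add(new SearchResult<>(type.cast(row[0]), ((Float) row[1]).doubleValue()));
        }
        return out;
    }
}
```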

Setting up the repository interfaces

public interface SearchableRepository<T>{
    Page<SearchResult<T>> search(String query, Pageable pageable);
}
public interface ForumPostRepository extends JpaRepository<ForumPost, Long>,SearchableRepository<ForumPost>{
}
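search(String, Pageable) returns one page of hits plus the total count. The paging arithmetic behind it is small enough to sketch with the JDK alone: a zero-based page number and a page size give the offset later passed to setFirstResult, and the total hit count gives the number of pages. PageMath is hypothetical, not Spring Data's PageRequest or PageImpl, though as far as I know PageRequest also computes the offset as page × size.

```java
import java.util.List;

// Hypothetical paging helper mirroring what the repository does with
// setFirstResult/setMaxResults and what PageImpl derives from the total.
final class PageMath {
    private PageMath() {}

    // Offset of the first row of a zero-based page.
    static long offset(int page, int size) {
        return (long) page * size;
    }

    // Number of pages needed to show `total` hits; 0 hits -> 0 pages.
    static int totalPages(long total, int size) {
        return (int) ((total + size - 1) / size);
    }

    // In-memory equivalent of setFirstResult(offset).setMaxResults(size).
    static <T> List<T> slice(List<T> all, int page, int size) {
        int from = (int) Math.min(offset(page, size), all.size());
        int to = Math.min(from + size, all.size());
        return all.subList(from, to);
    }
}
```

With the three sample posts and a page size of 2, page 0 holds two hits, page 1 holds one, and totalPages is 2.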

Hibernate Search works with Lucene documents, and its API resembles the JPA API; you can also use the Lucene API directly. Here is the implementation:

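import java.util.ArrayList;
import java.util.List;
import javax.annotation.PostConstruct;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.apache.lucene.search.Query;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.FullTextQuery;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;
import org.springframework.beans.FatalBeanException;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageImpl;
import org.springframework.data.domain.Pageable;
import org.springframework.orm.jpa.EntityManagerProxy;

public class ForumPostRepositoryImpl implements SearchableRepository<ForumPost>{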
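    //[1] Get Hibernate Search's full-text entity manager, analogous to JPA's EntityManager; all Lucene full-text
    // search methods go through this FullTextEntityManager. Note that the entityManagerFactory configured in the root context includes the Hibernate Search settings.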
    @PersistenceContext EntityManager entityManager;
    EntityManagerProxy entityManagerProxy;

    // 1.1) The EntityManager that Spring injects via @PersistenceContext is actually an EntityManager proxy
    // (which supplies a separate EntityManager for each transaction); initialize() captures that proxy.
    @PostConstruct
    public void initialize(){
        if(!(this.entityManager instanceof EntityManagerProxy))
            throw new FatalBeanException("Entity manager " + this.entityManager + " was not a proxy.");
        this.entityManagerProxy = (EntityManagerProxy) entityManager;
    }

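    // 1.2) FullTextEntityManager is a real, non-proxy object, so we create a new one for every search
    // (nothing does this silently for us the way Spring's injected EntityManager proxy does). Moreover, a FullTextEntityManager
    // must be obtained from a real Hibernate ORM EntityManager implementation, not from the proxy.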
    private FullTextEntityManager getFullTextEntityManager(){
        return Search.getFullTextEntityManager(this.entityManagerProxy.getTargetEntityManager());
    }

    //[2] Full-text search implemented with Hibernate Search.
    @Override
    public Page<SearchResult<ForumPost>> search(String query, Pageable pageable) {
        // 2.1) Get the FullTextEntityManager within the transaction
        FullTextEntityManager manager = getFullTextEntityManager();

        // 2.2) Run the search. The Hibernate Search API resembles the JPA API, since both come from the Hibernate project.
        QueryBuilder builder = manager.getSearchFactory().buildQueryBuilder().forEntity(ForumPost.class).get();
        Query lucene = builder.keyword()
               .onFields("title", "body", "keywords", "user.username") // fields to search; note user.username
               .matching(query) // the text to search for
               .createQuery();

        FullTextQuery q = manager.createFullTextQuery(lucene, ForumPost.class);
        q.setProjection(FullTextQuery.THIS,FullTextQuery.SCORE); // return each ForumPost with its relevance score

        // 2.3) Get the total number of hits
        long total = q.getResultSize(); 

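        // 2.4) Get the hits for the requested page.
        // getOffset() returns a long in Spring Data 2.x, but setFirstResult() takes an int.
        @SuppressWarnings("unchecked")
        List<Object[]> results = q.setFirstResult((int) pageable.getOffset())
                                  .setMaxResults(pageable.getPageSize())
                                  .getResultList();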
        List<SearchResult<ForumPost>> list = new ArrayList<>();
        results.forEach(o -> list.add(
                                   new SearchResult<>((ForumPost)o[0], (Float)o[1])) );

        return new PageImpl<>(list, pageable, total);
    }
}
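The unwrapping in 1.1) and 1.2) is easier to see with a plain JDK dynamic proxy. The sketch below is a hypothetical stand-in, not Spring's code: Session, RealSession, TargetAware and SessionProxies are invented names, and getTarget() plays the role that EntityManagerProxy.getTargetEntityManager() plays above. The proxy forwards every call to whatever target is current, but code that needs the concrete class (as, per the comments above, Search.getFullTextEntityManager needs a real Hibernate EntityManager) has to be handed the unwrapped target.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Hypothetical stand-ins for EntityManager, its implementation, and EntityManagerProxy.
interface Session { String name(); }

interface TargetAware { Session getTarget(); }

class RealSession implements Session {
    private final String name;
    RealSession(String name) { this.name = name; }
    public String name() { return name; }
}

final class SessionProxies {
    private SessionProxies() {}

    // Builds a proxy that forwards each call to the target the supplier returns right now,
    // similar in spirit to Spring resolving the transaction-bound EntityManager per call.
    static Session create(Supplier<Session> currentTarget) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("getTarget")) {
                return currentTarget.get(); // unwrap: hand out the real object
            }
            return method.invoke(currentTarget.get(), args); // delegate everything else
        };
        return (Session) Proxy.newProxyInstance(
                Session.class.getClassLoader(),
                new Class<?>[] {Session.class, TargetAware.class},
                handler);
    }

    // Code that insists on the concrete class rejects the proxy itself.
    static boolean isConcrete(Session s) {
        return s instanceof RealSession;
    }
}
```

This mirrors the design of the repository: keep the injected proxy (safe to hold in a singleton) and unwrap it on every call, rather than caching one unwrapped target that would outlive its transaction.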

Related links: my articles on Professional Java for Web Applications


Reposted from blog.csdn.net/flowingflying/article/details/80498669