Spring Boot 2.0 Integrating ES 5: Article Content Search in Practice

From: Spring For All Community, "Spring Boot 2.0 Integrating ES 5 Article Content Search Practice": http://www.spring4all.com/article/396

Contents of this chapter

  1. Article Content Search Ideas
  2. Search Content Segmentation
  3. Search Query
  4. Filtering Conditions
  5. Pagination and Sorting Conditions
  6. Summary

Reading time: 8 minutes

Excerpt: Those who set out to write a masterpiece with an extraordinary opening often cannot persist through the first chapter

1. Article Content Search Ideas

The previous article covered how to integrate ES 5 with Spring Boot 2.0; this one covers the practical side: how to implement content search for articles, Q&A posts, and similar content. The idea is simple:

  • Use "phrase match" queries and set a minimum should-match value
  • Get the phrases by running the search input through the IK analyzer
  • Narrow results with filter clauses
  • Page and sort with Pageable

If you call the search directly on the raw input, the results are often unsatisfying, because content search cares about how coherent the matched text is. The approach here is admittedly basic, and suggestions for a better search strategy are welcome: segment the input into a set of phrases, then use those phrases for exact phrase matching.

Installing the IK analyzer plugin in ES is easy. Step one: download the release matching your ES version from https://github.com/medcl/elasticsearch-analysis-ik/releases. Step two: create a new folder named ik under the elasticsearch-5.5.3/plugins directory and copy the unzipped contents of elasticsearch-analysis-ik-5.5.3.zip into elasticsearch-5.5.3/plugins/ik. Finally, restart ES.

2. Search Content Segmentation

With IK installed, how do we call it?

Step one: on my side the search content is passed in as a comma-separated string, so it is split on commas first.

Step two: add the original search term itself to the list, because some terms (such as "will") disappear after IK segmentation; adding the term back works around that problem.

Step three: use an AnalyzeRequestBuilder to obtain the list of tokens returned by IK segmentation.

Step four: tidy up the segmentation results: if the tokens are all multi-character words, keep them all; if there is a mix of words and single characters, keep only the words; if there are only single characters, keep the characters.
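The code in this section references a SearchConstant helper class that is not shown in the original article. A minimal sketch of it, with assumed values for the split token, the IK tokenizer name, the single-character length threshold, the minimum-should-match setting, and the filter field names used later in the filter section, might look like this:

    /**
     * Hypothetical sketch of the constants used by the search code;
     * the real values in the original project may differ.
     */
    public final class SearchConstant {

        // Index the analyze request runs against (assumed index name)
        public static final String INDEX_NAME = "content";

        // Delimiter used to split the incoming search content
        public static final String STRING_TOKEN_SPLIT = ",";

        // IK analyzer in fine-grained (ik_max_word) mode
        public static final String TOKENIZER_IK_MAX = "ik_max_word";

        // Tokens longer than this are treated as words, otherwise as single characters
        public static final int SEARCH_TERM_LENGTH = 1;

        // At least one should clause must match
        public static final String MINIMUM_SHOULD_MATCH = "1";

        // Filter field names (assumed)
        public static final String TYPE_NAME = "type";
        public static final String CATEGORY_NAME = "category";

        private SearchConstant() {
        }
    }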

The core implementation code is as follows:

    /**
     * Segment the search content
     */
    protected List<String> handlingSearchContent(String searchContent) {

        List<String> searchTermResultList = new ArrayList<>();
        // Split on commas to get the list of search terms
        List<String> searchTermList = Arrays.asList(searchContent.split(SearchConstant.STRING_TOKEN_SPLIT));

        // Run every search term through the IK analyzer and collect the results
        searchTermList.forEach(searchTerm -> {
            // Add the search term itself to the list; this also handles terms like "will" that IK drops
            searchTermResultList.add(searchTerm);
            // Add the IK segmentation results for this term
            searchTermResultList.addAll(getIkAnalyzeSearchTerms(searchTerm));
        });

        return searchTermResultList;
    }

    /**
     * Call ES to get the IK segmentation result
     */
    protected List<String> getIkAnalyzeSearchTerms(String searchContent) {
        AnalyzeRequestBuilder ikRequest = new AnalyzeRequestBuilder(elasticsearchTemplate.getClient(),
                AnalyzeAction.INSTANCE, SearchConstant.INDEX_NAME, searchContent);
        ikRequest.setTokenizer(SearchConstant.TOKENIZER_IK_MAX);
        List<AnalyzeResponse.AnalyzeToken> ikTokenList = ikRequest.execute().actionGet().getTokens();

        // Collect the token strings
        List<String> searchTermList = new ArrayList<>();
        ikTokenList.forEach(ikToken -> {
            searchTermList.add(ikToken.getTerm());
        });

        return handlingIkResultTerms(searchTermList);
    }

    /**
     * Example segmentation of 洗发水: (洗发、发水、洗、发、水)
     * - all tokens are words: keep them all
     * - mix of words and single characters: keep only the words
     * - all single characters: keep the characters
     */
    private List<String> handlingIkResultTerms(List<String> searchTermList) {
        boolean isPhrase = false;
        boolean isWord = false;
        for (String term : searchTermList) {
            if (term.length() > SearchConstant.SEARCH_TERM_LENGTH) {
                isPhrase = true;
            } else {
                isWord = true;
            }
        }

        // Mixed words and single characters: keep only the words
        if (isWord && isPhrase) {
            List<String> phraseList = new ArrayList<>();
            searchTermList.forEach(term -> {
                if (term.length() > SearchConstant.SEARCH_TERM_LENGTH) {
                    phraseList.add(term);
                }
            });
            return phraseList;
        }

        return searchTermList;
    }
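To illustrate with the example from the Javadoc above: if IK segments 洗发水 into 洗发, 发水, 洗, 发 and 水, the result mixes words and single characters, so handlingIkResultTerms keeps only the words 洗发 and 发水; handlingSearchContent then returns the original term 洗发水 plus those two tokens.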

3. Search Query

Construct an enum that lists the fields to search. The code of ContentSearchTermEnum is as follows:

import lombok.AllArgsConstructor;

@AllArgsConstructor
public enum ContentSearchTermEnum {

    // Title
    TITLE("title"),
    // Body content
    CONTENT("content");

    /**
     * Name of the field to search
     */
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

}

Loop over the search fields, adding a "phrase match" should clause for each search term and field, then set the minimum should match to 1. The core code is as follows:

    /**
     * Build the match query conditions
     */
    private void buildMatchQuery(BoolQueryBuilder queryBuilder, List<String> searchTermList) {
        for (String searchTerm : searchTermList) {
            for (ContentSearchTermEnum searchTermEnum : ContentSearchTermEnum.values()) {
                queryBuilder.should(QueryBuilders.matchPhraseQuery(searchTermEnum.getName(), searchTerm));
            }
        }
        queryBuilder.minimumShouldMatch(SearchConstant.MINIMUM_SHOULD_MATCH);
    }
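For a single search term such as 洗发水, the query DSL that gets logged in section 5 comes out roughly like the following (trimmed to the essential structure; the real output also carries defaults such as boost and slop):

    {
      "bool": {
        "should": [
          { "match_phrase": { "title":   { "query": "洗发水" } } },
          { "match_phrase": { "content": { "query": "洗发水" } } }
        ],
        "minimum_should_match": "1"
      }
    }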

4. Filtering Conditions

Search alone is not always enough; sometimes the requirement is to search within a certain category, for example an e-commerce site searching for products under a particular brand. For that you construct filters, the counterpart of the OR and AND conditions in a SQL WHERE clause. In ES, filtering is added with the filter method. The code is as follows:

    /**
     * Build the filter conditions
     */
    private void buildFilterQuery(BoolQueryBuilder boolQueryBuilder, Integer type, String category) {
        // Filter by content type
        if (type != null) {
            BoolQueryBuilder typeFilterBuilder = QueryBuilders.boolQuery();
            typeFilterBuilder.should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, type).lenient(true));
            boolQueryBuilder.filter(typeFilterBuilder);
        }

        // Filter by content category
        if (!StringUtils.isEmpty(category)) {
            BoolQueryBuilder categoryFilterBuilder = QueryBuilders.boolQuery();
            categoryFilterBuilder.should(QueryBuilders.matchQuery(SearchConstant.CATEGORY_NAME, category).lenient(true));
            boolQueryBuilder.filter(categoryFilterBuilder);
        }
    }

Here type is the top-level category and category the sub-category, so filtering works at both levels. But what if you need to search within type = 1 OR type = 2? The implementation is simple:

typeFilterBuilder
        .should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, 1).lenient(true))
        .should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, 2).lenient(true));

Chaining the two should clauses gives an OR, the counterpart of SQL's OR. The counterpart of SQL's AND is achieved with two separate BoolQueryBuilders, each added as its own filter.
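As a sketch combining the two (field names come from the SearchConstant sketch above, and the sample values are made up): the OR sits inside one BoolQueryBuilder, while the AND comes from adding each builder as its own filter clause:

    // OR: type = 1 or type = 2, expressed as two should clauses in one bool query
    BoolQueryBuilder typeFilterBuilder = QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, 1).lenient(true))
            .should(QueryBuilders.matchQuery(SearchConstant.TYPE_NAME, 2).lenient(true));

    // A separate bool query for the category condition
    BoolQueryBuilder categoryFilterBuilder = QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery(SearchConstant.CATEGORY_NAME, "spring-boot").lenient(true));

    // Both filter clauses must match: (type = 1 OR type = 2) AND category = "spring-boot"
    boolQueryBuilder.filter(typeFilterBuilder).filter(categoryFilterBuilder);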

5. Pagination and Sorting Conditions

The pagination and sorting code is straightforward:

    @Override
    public PageBean searchContent(ContentSearchBean contentSearchBean) {

        Integer pageNumber = contentSearchBean.getPageNumber();
        Integer pageSize = contentSearchBean.getPageSize();

        PageBean<ContentEntity> resultPageBean = new PageBean<>();
        resultPageBean.setPageNumber(pageNumber);
        resultPageBean.setPageSize(pageSize);

        // Build the search phrases
        String searchContent = contentSearchBean.getSearchContent();
        List<String> searchTermList = handlingSearchContent(searchContent);

        // Build the match query conditions
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        buildMatchQuery(boolQueryBuilder, searchTermList);

        // Build the filter conditions
        buildFilterQuery(boolQueryBuilder, contentSearchBean.getType(), contentSearchBean.getCategory());

        // Build the pagination and sorting conditions
        Pageable pageable = PageRequest.of(pageNumber, pageSize);
        if (!StringUtils.isEmpty(contentSearchBean.getOrderName())) {
            pageable = PageRequest.of(pageNumber, pageSize, Sort.Direction.DESC, contentSearchBean.getOrderName());
        }
        SearchQuery searchQuery = new NativeSearchQueryBuilder().withPageable(pageable)
                .withQuery(boolQueryBuilder).build();

        // Run the search
        LOGGER.info("\n ContentServiceImpl.searchContent() [" + searchContent
                + "] \n DSL  = \n " + searchQuery.getQuery().toString());
        Page<ContentEntity> contentPage = contentRepository.search(searchQuery);

        resultPageBean.setResult(contentPage.getContent());
        resultPageBean.setTotalCount((int) contentPage.getTotalElements());
        // Ceiling division so an exact multiple of pageSize does not add an extra empty page
        resultPageBean.setTotalPage(((int) contentPage.getTotalElements() + pageSize - 1) / pageSize);
        return resultPageBean;
    }

A Pageable object carries the paging parameters and can also specify the sort field and sort direction (DESC or ASC).
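For example, a hypothetical ascending sort on a createTime field (the field name is made up for illustration) would be built like this:

    // Page numbers are zero-based; sort ascending on an assumed "createTime" field
    Pageable pageable = PageRequest.of(0, 10, Sort.Direction.ASC, "createTime");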

6. Summary

The idea here is fairly simple. If you have a cleverer implementation, you are welcome to share and discuss it.

For more ES articles, stay tuned for the rest of the tutorial series!
