SpringCloud: Auto-completion with Elasticsearch

When the user types characters into the search box, we should suggest search terms related to what has been typed so far.

Suggesting complete entries based on the letters the user has typed is called auto-completion.

Because the suggestions have to be inferred from pinyin letters, we need pinyin analysis.

1. Pinyin analyzer

To complete entries based on letters, documents must be analyzed into pinyin. A pinyin analysis plugin for elasticsearch is available on GitHub. Address: https://github.com/medcl/elasticsearch-analysis-pinyin

It is installed the same way as the IK analyzer:

① Download and decompress

② Upload it to the plugins directory of elasticsearch on the virtual machine

③ Restart elasticsearch

④ Test

For detailed installation steps, refer to the installation process of the IK analyzer.

Test it as follows:

POST /_analyze
{
  "text": "我爱北京天安门",
  "analyzer": "pinyin"
}

In the result, each Chinese character is converted to pinyin and returned as a separate token.

2. Custom analyzer

The default pinyin analyzer converts each Chinese character into pinyin separately, but what we want is one joined pinyin token for each term. To get that, we configure the pinyin plugin as part of a custom analyzer.

An analyzer in elasticsearch consists of three parts:

  • character filters: process the text before the tokenizer, e.g. deleting or replacing characters
  • tokenizer: cuts the text into terms according to certain rules. For example, keyword does not split the text at all; ik_smart is another example
  • token filters: further process the terms output by the tokenizer, e.g. case conversion, synonym handling, pinyin conversion

When a document is analyzed, its text passes through these three parts in turn.
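The three stages can be imitated in a few lines of plain Java. This is only a sketch to show the order of processing: the class name, the replacement rule and the whitespace tokenizer are assumptions made for illustration and are not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// In-memory imitation of the three analyzer stages; not Elasticsearch code.
class AnalyzerPipelineSketch {

    // Stage 1, character filter: works on the raw text before tokenization.
    // Here it replaces '&' with " and " (an arbitrary example rule).
    static String charFilter(String text) {
        return text.replace("&", " and ");
    }

    // Stage 2, tokenizer: cuts the text into terms. This one splits on whitespace.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Stage 3, token filter: post-processes each term. This one lowercases.
    static List<String> tokenFilter(List<String> terms) {
        return terms.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // The stages always run in this order.
    static List<String> analyze(String text) {
        return tokenFilter(tokenize(charFilter(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Rock&Roll Hotel")); // [rock, and, roll, hotel]
    }
}
```

In a real analyzer the pinyin conversion sits in the third stage, which is why it is declared as a filter in the settings.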

The syntax for declaring a custom analyzer is as follows:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { // custom analyzers
        "my_analyzer": { // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": { // custom token filters
        "py": { // filter name
          "type": "pinyin", // filter type, here pinyin
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
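To make the filter options more concrete, here is a rough Java sketch of the difference between per-character pinyin and joined pinyin for the two-character term 如家 (written with Unicode escapes in the code). The lookup table and method names are hypothetical, and the sketch only imitates the idea behind keep_full_pinyin, keep_joined_full_pinyin and limit_first_letter_length; it is not the plugin's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Imitates what some pinyin filter options control; the table is made up.
class PinyinOptionsSketch {

    // Hypothetical lookup table, just enough for the example term "ru jia".
    static final Map<Character, String> PINYIN = Map.of('\u5982', "ru", '\u5bb6', "jia");

    // keep_full_pinyin style: one pinyin token per character.
    static List<String> perCharacter(String term) {
        List<String> tokens = new ArrayList<>();
        for (char c : term.toCharArray()) {
            tokens.add(PINYIN.getOrDefault(c, String.valueOf(c)));
        }
        return tokens;
    }

    // keep_joined_full_pinyin style: the pinyin of the whole term as one token.
    static String joined(String term) {
        return String.join("", perCharacter(term));
    }

    // First-letter abbreviation, cut off at a maximum length
    // in the spirit of limit_first_letter_length.
    static String firstLetters(String term, int limit) {
        StringBuilder sb = new StringBuilder();
        for (String pinyin : perCharacter(term)) {
            if (sb.length() >= limit) {
                break;
            }
            sb.append(pinyin.charAt(0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String term = "\u5982\u5bb6";
        System.out.println(perCharacter(term));     // [ru, jia]
        System.out.println(joined(term));           // rujia
        System.out.println(firstLetters(term, 16)); // rj
    }
}
```

With keep_full_pinyin set to false and keep_joined_full_pinyin set to true, as in the settings above, the joined form is the one that is kept.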


Summary:

How to use the pinyin analyzer?

  • ① Download the pinyin analyzer

  • ② Unzip it into the plugins directory of elasticsearch

  • ③ Restart

How to define a custom analyzer?

  • ① Configure it in settings when creating the index library; it can contain three parts:

  • character filter

  • tokenizer

  • filter

What should you watch out for with the pinyin analyzer?

  • To avoid matching homophones, do not use the pinyin analyzer at search time. Use it only when indexing, and set search_analyzer to an analyzer without the pinyin filter
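The homophone problem can be reproduced with a toy inverted index. The sketch below is an assumption-laden illustration in plain Java: the two sample words (both pronounced "shizi", one meaning lion and one meaning louse), the lookup table and all method names are made up, and the index is a simple map rather than anything Elasticsearch does internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index showing why the pinyin filter should not run at search time.
class HomophoneSketch {

    static final String LION = "\u72ee\u5b50";   // pronounced "shizi", means lion
    static final String LOUSE = "\u8671\u5b50";  // also pronounced "shizi", means louse

    // Hypothetical pinyin lookup for the two sample terms.
    static final Map<String, String> PINYIN = Map.of(LION, "shizi", LOUSE, "shizi");

    // Analyzer with the pinyin filter: keeps the original term and adds its pinyin.
    static Set<String> analyzeWithPinyin(String term) {
        Set<String> tokens = new HashSet<>();
        tokens.add(term);
        String pinyin = PINYIN.get(term);
        if (pinyin != null) {
            tokens.add(pinyin);
        }
        return tokens;
    }

    // Analyzer without the pinyin filter: the term is kept as typed.
    static Set<String> analyzePlain(String term) {
        return Set.of(term);
    }

    // Index time: every document is analyzed with the pinyin filter.
    static Map<String, Set<String>> buildIndex(List<String> docs) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String doc : docs) {
            for (String token : analyzeWithPinyin(doc)) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(doc);
            }
        }
        return index;
    }

    // Search time: a document matches if it contains any query token.
    static Set<String> search(Map<String, Set<String>> index, Set<String> queryTokens) {
        Set<String> hits = new TreeSet<>();
        for (String token : queryTokens) {
            hits.addAll(index.getOrDefault(token, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = buildIndex(List.of(LION, LOUSE));
        // Pinyin at search time: the query for "lion" also matches "louse".
        System.out.println(search(index, analyzeWithPinyin(LION)).size()); // 2
        // No pinyin at search time: only the exact word matches.
        System.out.println(search(index, analyzePlain(LION)).size());      // 1
        // Typing the letters still finds both, which is what completion needs.
        System.out.println(search(index, analyzePlain("shizi")).size());   // 2
    }
}
```

The last call shows that nothing is lost by dropping the pinyin filter at search time: a user who types letters still matches the pinyin tokens written at index time.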

3. Autocomplete query

elasticsearch provides the Completion Suggester query to implement auto-completion. This query matches terms that begin with the user's input and returns them. To make completion queries efficient, there are some constraints on the field types in the document:

  • Fields that take part in a completion query must be of type completion.

  • The content of the field is generally an array of the entries to be completed.

For example, take an index library like this (if the test index from the previous section still exists, delete it first, because the name is reused):

// Create the index library
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}

Then insert the following data:

// Sample data
POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test/_doc
{
  "title": ["Nintendo", "switch"]
}

The query DSL is as follows:

// Auto-completion query
GET /test/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", // keyword typed by the user
      "completion": {
        "field": "title", // field to run the completion query on
        "skip_duplicates": true, // skip duplicate suggestions
        "size": 10 // return the top 10 results
      }
    }
  }
}
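What the query does with the sample data can be imitated in memory. The following Java sketch is a simplification under stated assumptions: it compares prefixes while ignoring case (as an analyzer that lowercases would), returns entries in insertion order, and knows nothing about scoring or the data structure elasticsearch actually uses. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// In-memory imitation of a prefix completion over array-valued fields.
class CompletionSketch {

    // Returns up to `size` distinct entries that start with `prefix`, ignoring case.
    static List<String> suggest(List<List<String>> docs, String prefix, int size) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        List<String> options = new ArrayList<>();
        for (List<String> inputs : docs) {
            for (String input : inputs) {
                if (options.size() >= size) {
                    return options;
                }
                boolean matches = input.toLowerCase(Locale.ROOT).startsWith(wanted);
                // options.contains(...) plays the role of skip_duplicates.
                if (matches && !options.contains(input)) {
                    options.add(input);
                }
            }
        }
        return options;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("Sony", "WH-1000XM3"),
                List.of("SK-II", "PITERA"),
                List.of("Nintendo", "switch"));
        System.out.println(suggest(docs, "s", 10)); // [Sony, SK-II, switch]
    }
}
```

Each array element is a separate completion entry, which is why a document can be found through any of its values.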

4. Implement auto-completion for the hotel search box

So far, our hotel index library has no pinyin analyzer configured, so we need to change the configuration of the index library. But as we know, an existing index library cannot be modified in place; it can only be deleted and recreated.

In addition, we need to add a field for auto-completion, suggestion, and put brand, city and so on into it as auto-completion prompts.

So, to summarize, the things we need to do include:

  1. Modify the structure of the hotel index library and configure a custom pinyin analyzer

  2. Modify the name and all fields of the index library to use the custom analyzer

  3. Add a new field suggestion to the index library, of type completion, using a custom analyzer

  4. Add a suggestion field to the HotelDoc class, containing brand and business

  5. Re-import the data into the hotel index library

4.1. Modify the hotel mapping structure

The code is as follows:

// Hotel data index library
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}

4.2. Modify the HotelDoc entity

A field for auto-completion needs to be added to HotelDoc. Its content can be the hotel brand, city, business district and similar information. As required for completion fields, it is best stored as an array of these values.

So we add a suggestion field of type List<String> to HotelDoc and put information such as brand and business into it.

The code is as follows:

package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // Assemble suggestion
        if (this.business.contains("/")) {
            // business holds several values and needs to be split
            String[] arr = this.business.split("/");
            // Add the elements
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            Collections.addAll(this.suggestion, arr);
        } else {
            this.suggestion = Arrays.asList(this.brand, this.business);
        }
    }
}
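The assembly logic in the constructor can also be pulled out into a small helper that is easy to test on its own. The version below is a variation, not the code above: the class and method names are hypothetical, the sample values are invented, and it additionally skips null or empty values, whereas the constructor above would throw a NullPointerException if business were null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that builds the suggestion list from brand and business.
class SuggestionBuilderSketch {

    static List<String> buildSuggestion(String brand, String business) {
        List<String> suggestion = new ArrayList<>();
        if (brand != null && !brand.isEmpty()) {
            suggestion.add(brand);
        }
        if (business != null && !business.isEmpty()) {
            // split("/") returns the whole string as a single element when
            // there is no slash, so one branch covers both cases.
            Collections.addAll(suggestion, business.split("/"));
        }
        return suggestion;
    }

    public static void main(String[] args) {
        System.out.println(buildSuggestion("BrandA", "North/South")); // [BrandA, North, South]
        System.out.println(buildSuggestion("BrandA", "Center"));      // [BrandA, Center]
        System.out.println(buildSuggestion("BrandA", null));          // [BrandA]
    }
}
```

Whether missing values should be skipped or rejected depends on how clean the hotel data is; the sketch simply chooses to skip them.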

4.3. Reimport

Re-run the data import written earlier, and you can see that the new hotel data contains suggestion.

4.4. Java API for the autocomplete query

We have seen the DSL for the auto-completion query, but not yet the corresponding Java API. Here is an example:

    @Test
    void testSuggest() throws IOException {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
                "suggestions",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix("h")
                        .skipDuplicates(true)
                        .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        System.out.println("response = " + response);
    }
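To see how the builder calls line up with the DSL from section 3, the request body can be written out by hand. The sketch below builds an equivalent JSON string in plain Java; the class and method names are hypothetical, the exact bytes the client sends may differ, and the body uses the prefix key where the earlier DSL example used text.

```java
// Hand-built JSON body mirroring the SuggestBuilder call; for illustration only.
class SuggestDslSketch {

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // name   -> first argument of addSuggestion
    // field  -> argument of completionSuggestion
    // prefix -> argument of prefix
    // size   -> argument of size
    static String buildBody(String name, String field, String prefix, int size) {
        return "{\"suggest\":{\"" + escape(name) + "\":{"
                + "\"prefix\":\"" + escape(prefix) + "\","
                + "\"completion\":{"
                + "\"field\":\"" + escape(field) + "\","
                + "\"skip_duplicates\":true,"
                + "\"size\":" + size
                + "}}}}";
    }

    public static void main(String[] args) {
        System.out.println(buildBody("suggestions", "suggestion", "h", 10));
    }
}
```

The suggestion name ("suggestions" here) is the key used later to look up the result in the response.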

The result of an auto-completion query has its own structure. The parsing code is as follows:

    @Test
    void testSuggest() throws IOException {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
                "suggestions",
                SuggestBuilders.completionSuggestion("suggestion")
                        .prefix("h")
                        .skipDuplicates(true)
                        .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        //System.out.println("response = " + response);
        Suggest suggest = response.getSuggest();
        // 4.1 Get the completion result by suggestion name
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2 Get the options and iterate over them
        for (CompletionSuggestion.Entry.Option option : suggestions.getOptions()) {
            // 4.3 Get the text of an option, i.e. the completed term
            String string = option.getText().string();
            System.out.println(string);
        }
    }


4.5. Implement auto-completion for the search box

1) Add a new endpoint to HotelController in the cn.itcast.hotel.web package to receive the new requests:

@GetMapping("suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) {
    return hotelService.getSuggestions(prefix);
}

2) Add the method to IHotelService in the cn.itcast.hotel.service package:

List<String> getSuggestions(String prefix);

3) Implement the method in cn.itcast.hotel.service.impl.HotelService:

@Override
public List<String> getSuggestions(String prefix) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
            "suggestions",
            SuggestBuilders.completionSuggestion("suggestion")
            .prefix(prefix)
            .skipDuplicates(true)
            .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the result
        Suggest suggest = response.getSuggest();
        // 4.1 Get the completion result by suggestion name
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2 Get the options
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        // 4.3 Iterate and collect the text of each option
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        return list;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}


Origin blog.csdn.net/qq_37726813/article/details/130311714