When the user types characters in the search box, we should suggest search terms related to those characters, as shown in the figure:
This feature, suggesting complete entries based on the letters the user has typed, is called autocomplete.
Because the suggestions have to be inferred from pinyin letters, we need pinyin tokenization.
1. Pinyin analyzer
To complete entries based on letters, documents must be tokenized by pinyin. A pinyin analysis plugin for elasticsearch happens to be available on GitHub.
Address: https://github.com/medcl/elasticsearch-analysis-pinyin
Installation works the same way as for the IK analyzer: three steps, followed by a test:
① Download and decompress the plugin
② Upload it to the plugin directory of elasticsearch on the virtual machine
③ Restart elasticsearch
④ Test
For detailed installation steps, refer to the installation process of the IK analyzer.
The test usage is as follows:
POST /_analyze
{
  "text": "我爱北京天安门",
  "analyzer": "pinyin"
}
result:
2. Custom analyzer
By default, the pinyin analyzer converts each Chinese character into pinyin separately, but what we want is one pinyin string for each term. To get that, we need to configure the pinyin plugin ourselves and build a custom analyzer around it.
An analyzer in elasticsearch consists of three parts:
- character filters: process the text before the tokenizer, e.g. deleting or replacing characters.
- tokenizer: cuts the text into tokens (terms) according to certain rules. For example, keyword does not split the text at all; ik_smart is another tokenizer.
- tokenizer filter: further processes the tokens output by the tokenizer, e.g. case conversion, synonym handling, pinyin conversion.
When a document is analyzed, its text passes through these three parts in turn:
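To make the order of the three stages concrete, here is a minimal plain-Java sketch. It does not use any elasticsearch classes: the class name AnalyzerPipelineSketch and the three stand-in stages (punctuation stripping, whitespace splitting, lowercasing) are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Illustrative only: plain-Java stand-ins for the three analyzer stages.
// None of these names come from elasticsearch.
class AnalyzerPipelineSketch {

    // Stage 1, character filter: rewrites the raw text before tokenization.
    static final UnaryOperator<String> STRIP_PUNCTUATION =
            text -> text.replaceAll("[!?,.]", "");

    // Stage 2, tokenizer: cuts the text into tokens (here: on whitespace).
    static final Function<String, List<String>> WHITESPACE_TOKENIZER = text -> {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    };

    // Stage 3, token filter: post-processes each token (here: lowercasing).
    static final UnaryOperator<String> LOWERCASE =
            token -> token.toLowerCase(Locale.ROOT);

    // Runs the three stages in order: character filter, tokenizer, token filter.
    static List<String> analyze(String text) {
        String filtered = STRIP_PUNCTUATION.apply(text);
        List<String> result = new ArrayList<>();
        for (String token : WHITESPACE_TOKENIZER.apply(filtered)) {
            result.add(LOWERCASE.apply(token));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [hello, elasticsearch, world]
        System.out.println(analyze("Hello, Elasticsearch World!"));
    }
}
```

In a real custom analyzer the same roles are filled by configured components, for example ik_max_word as the tokenizer and a pinyin token filter as the last stage.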
The syntax for declaring a custom analyzer is as follows:
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { // custom analyzers
        "my_analyzer": { // analyzer name
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": { // custom tokenizer filters
        "py": { // filter name
          "type": "pinyin", // filter type, here pinyin
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
Test:
Summary:
How do we use the pinyin analyzer?
- ① Download the pinyin analyzer
- ② Unzip it and put it in the plugin directory of elasticsearch
- ③ Restart
How do we customize an analyzer?
- ① Configure it in settings when creating the index; it can contain three parts
- ② character filter
- ③ tokenizer
- ④ filter
What should we watch out for with the pinyin analyzer?
- To avoid matching homophones, do not use the pinyin analyzer at search time
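The homophone problem can be illustrated with a toy inverted index in plain Java. This is only a sketch under stated assumptions: the class HomophoneSketch, the hard-coded PINYIN table, and the two sample words are made up for the example and merely stand in for what the pinyin filter would emit at index time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy inverted index. The PINYIN table is a
// hypothetical stand-in for the output of the pinyin filter.
class HomophoneSketch {

    static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put("狮子", "shizi"); // lion
        PINYIN.put("虱子", "shizi"); // louse
    }

    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Index time: store the original word and its pinyin (like keep_original).
    void indexDocument(int docId, String word) {
        index.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
        String pinyin = PINYIN.get(word);
        if (pinyin != null) {
            index.computeIfAbsent(pinyin, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search time: optionally expand the query word into pinyin as well.
    Set<Integer> search(String word, boolean usePinyinAtSearchTime) {
        Set<Integer> hits =
                new TreeSet<>(index.getOrDefault(word, Collections.emptySet()));
        String pinyin = PINYIN.get(word);
        if (usePinyinAtSearchTime && pinyin != null) {
            hits.addAll(index.getOrDefault(pinyin, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        HomophoneSketch sketch = new HomophoneSketch();
        sketch.indexDocument(1, "狮子");
        sketch.indexDocument(2, "虱子");
        System.out.println(sketch.search("狮子", true));   // prints [1, 2]
        System.out.println(sketch.search("狮子", false));  // prints [1]
        System.out.println(sketch.search("shizi", false)); // prints [1, 2]
    }
}
```

With pinyin expansion at search time, a query for 狮子 also returns the document about 虱子. Without it, Chinese input matches exactly, while a user who types the letters shizi still finds both documents through the pinyin terms stored at index time. That is the effect of pairing "analyzer": "my_analyzer" with "search_analyzer": "ik_smart" in the mapping.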
3. Autocomplete query
elasticsearch provides the Completion Suggester query to implement autocomplete. This query matches terms that begin with the user's input and returns them. To keep completion queries efficient, the fields involved are subject to some constraints:
- Fields participating in a completion query must be of type completion.
- The content of the field is generally an array of the terms to be completed.
For example, an index like this:
// Create the index
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}
Then insert the following data:
// Sample data
POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test/_doc
{
  "title": ["Nintendo", "switch"]
}
The query DSL statement is as follows:
// Autocomplete query
GET /test/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", // keyword typed by the user
      "completion": {
        "field": "title", // field used for completion
        "skip_duplicates": true, // skip duplicate suggestions
        "size": 10 // return the first 10 results
      }
    }
  }
}
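To show what prefix completion with skip_duplicates and size amounts to, here is a small in-memory imitation in plain Java. It is a sketch, not how elasticsearch implements the Completion Suggester: the class CompletionSketch, the case-insensitive comparison, and the first-seen ordering are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: an in-memory imitation of prefix completion.
// It is not how elasticsearch implements the Completion Suggester.
class CompletionSketch {

    // Each inner list plays the role of one document's "title" array.
    static final List<List<String>> DOCS = Arrays.asList(
            Arrays.asList("Sony", "WH-1000XM3"),
            Arrays.asList("SK-II", "PITERA"),
            Arrays.asList("Nintendo", "switch"));

    // Returns up to `size` entries that start with `prefix`, ignoring case
    // (an assumption of this sketch) and skipping duplicates.
    static List<String> suggest(String prefix, int size) {
        String needle = prefix.toLowerCase(Locale.ROOT);
        Set<String> unique = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> doc : DOCS) {
            for (String entry : doc) {
                if (entry.toLowerCase(Locale.ROOT).startsWith(needle)) {
                    unique.add(entry);
                }
            }
        }
        List<String> all = new ArrayList<>(unique);
        return new ArrayList<>(all.subList(0, Math.min(size, all.size())));
    }

    public static void main(String[] args) {
        System.out.println(suggest("s", 10)); // prints [Sony, SK-II, switch]
    }
}
```

The real suggester orders its options by its own scoring, so the order shown here should not be read as the order elasticsearch would return.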
4. Implement autocomplete for the hotel search box
Our hotel index does not yet have a pinyin analyzer configured, so we need to change the index configuration. But as we know, an index cannot be modified in this way; it can only be deleted and then recreated.
In addition, we need to add a field for autocomplete and put brand, business, city, and so on into it as completion prompts.
To summarize, we need to:
- Modify the structure of the hotel index and set up a custom pinyin analyzer
- Modify the name and all fields of the index to use the custom analyzer
- Add a new suggestion field of type completion to the index, using a custom analyzer
- Add a suggestion field to the HotelDoc class, containing brand and business
- Reimport the data into the hotel index
4.1. Modify the hotel mapping structure
The code is as follows:
// Hotel data index
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "text_analyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "text_analyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}
4.2. Modify the HotelDoc entity
A field needs to be added to HotelDoc for autocomplete. Its content can be the hotel brand, city, business district, and similar information. As completion fields require, it is best stored as an array of these values.
So we add a suggestion field of type List<String> to HotelDoc, and then put brand, city, business, and similar information into it.
The code is as follows:
package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    private Object distance;
    private Boolean isAD;
    private List<String> suggestion;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
        // Assemble suggestion
        if (this.business.contains("/")) {
            // business holds several values and must be split
            String[] arr = this.business.split("/");
            // Add the elements
            this.suggestion = new ArrayList<>();
            this.suggestion.add(this.brand);
            Collections.addAll(this.suggestion, arr);
        } else {
            this.suggestion = Arrays.asList(this.brand, this.business);
        }
    }
}
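The constructor above assumes that business is never null. As a hedged alternative, the assembly of suggestion could be pulled into a small helper that also tolerates missing values. The class SuggestionBuilderSketch, the method buildSuggestion, and the sample values are hypothetical and not part of the project; this is a sketch of one possible approach.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: SuggestionBuilderSketch and buildSuggestion are
// hypothetical names, not part of the project shown above.
class SuggestionBuilderSketch {

    // Combines the brand with every "/"-separated part of business,
    // skipping values that are null or blank.
    static List<String> buildSuggestion(String brand, String business) {
        List<String> suggestion = new ArrayList<>();
        if (brand != null && !brand.trim().isEmpty()) {
            suggestion.add(brand.trim());
        }
        if (business != null) {
            for (String part : business.split("/")) {
                if (!part.trim().isEmpty()) {
                    suggestion.add(part.trim());
                }
            }
        }
        return suggestion;
    }

    public static void main(String[] args) {
        // Made-up sample values, used only to show the splitting behavior.
        // prints [Home Inn, Jiangwan, Wujiaochang]
        System.out.println(buildSuggestion("Home Inn", "Jiangwan/Wujiaochang"));
        // prints [Home Inn]
        System.out.println(buildSuggestion("Home Inn", null));
    }
}
```

Because split simply returns the whole string when no "/" is present, the single-value and multi-value cases share one code path.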
4.3. Reimport
Re-run the data import function written earlier. You can see that the new hotel data contains suggestion:
4.4. Autocomplete query Java API
We have not yet seen the Java API that corresponds to the autocomplete query DSL shown earlier. Here is an example:
@Test
void testSuggest() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Build the DSL
    request.source().suggest(new SuggestBuilder().addSuggestion(
        "suggestions",
        SuggestBuilders.completionSuggestion("suggestion")
            .prefix("h")
            .skipDuplicates(true)
            .size(10)
    ));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    System.out.println("response = " + response);
}
The autocomplete response has its own structure. The parsing code is as follows:
@Test
void testSuggest() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Build the DSL
    request.source().suggest(new SuggestBuilder().addSuggestion(
        "suggestions",
        SuggestBuilders.completionSuggestion("suggestion")
            .prefix("h")
            .skipDuplicates(true)
            .size(10)
    ));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    //System.out.println("response = " + response);
    Suggest suggest = response.getSuggest();
    // 4.1 Get the completion result by suggestion name
    CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
    // 4.2 Get the options and iterate over them
    for (CompletionSuggestion.Entry.Option option : suggestions.getOptions()) {
        // 4.3 Get the option's text, i.e. the completed term
        String string = option.getText().string();
        System.out.println(string);
    }
}
4.5. Implement autocomplete for the search box
1) Add a new endpoint to HotelController in the cn.itcast.hotel.web package to receive the new request:
@GetMapping("suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) {
    return hotelService.getSuggestions(prefix);
}
2) Add the method to IHotelService in the cn.itcast.hotel.service package:
List<String> getSuggestions(String prefix);
3) Implement the method in cn.itcast.hotel.service.impl.HotelService:
@Override
public List<String> getSuggestions(String prefix) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Build the DSL
        request.source().suggest(new SuggestBuilder().addSuggestion(
            "suggestions",
            SuggestBuilders.completionSuggestion("suggestion")
                .prefix(prefix)
                .skipDuplicates(true)
                .size(10)
        ));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the result
        Suggest suggest = response.getSuggest();
        // 4.1 Get the completion result by suggestion name
        CompletionSuggestion suggestions = suggest.getSuggestion("suggestions");
        // 4.2 Get the options
        List<CompletionSuggestion.Entry.Option> options = suggestions.getOptions();
        // 4.3 Iterate and collect the completed terms
        List<String> list = new ArrayList<>(options.size());
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
        return list;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}