Elasticsearch in Practice (2): Chinese character and pinyin autocompletion and automatic spelling correction with Spring Boot

Series index

Elasticsearch in Practice (1): Unified search with Spring Boot and Elasticsearch
Elasticsearch in Practice (2): Chinese character and pinyin autocompletion and automatic spelling correction with Spring Boot
Elasticsearch in Practice (3): Search recommendations with Spring Boot and Elasticsearch
Elasticsearch in Practice (4): Index aggregation and drill-down analysis with Spring Boot and Elasticsearch
Elasticsearch in Practice (5): Event-tracking logs and trending search terms for an e-commerce platform with Spring Boot and Elasticsearch

1. Install the pinyin analysis plugin

1. Download address

Source code: https://github.com/medcl/elasticsearch-analysis-pinyin
Releases: https://github.com/medcl/elasticsearch-analysis-pinyin/releases
This article uses version 7.4.0. The plugin version must match your Elasticsearch version exactly: https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.4.0/elasticsearch-analysis-pinyin-7.4.0.zip

2. Download and install

# Assumes /mydata/elasticsearch/plugins is mounted as the container's plugins directory
mkdir -p /mydata/elasticsearch/plugins/elasticsearch-analysis-pinyin-7.4.0
cd /mydata/elasticsearch/plugins/elasticsearch-analysis-pinyin-7.4.0
# Download
wget https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.4.0/elasticsearch-analysis-pinyin-7.4.0.zip

# Unzip, then delete the archive
unzip elasticsearch-analysis-pinyin-7.4.0.zip
rm -f elasticsearch-analysis-pinyin-7.4.0.zip
# Restart Elasticsearch (558eded797f9 is the author's container ID; use your own)
docker restart 558eded797f9

3. Analyzer configuration

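When creating an index, you can define a custom analyzer in the index settings and assign it to a field in the mapping. Here ik_pinyin_analyzer chains the ik_smart tokenizer (provided by the IK analysis plugin) with the pinyin token filter, and the searchkey completion field uses it: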

{
    "indexName": "product_completion_index",
    "map": {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 2,
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": "pinyin_filter"
                    }
                },
                "filter": {
                    "pinyin_filter": {
                        "type": "pinyin",
                        "keep_first_letter": true,
                        "keep_separate_first_letter": false,
                        "keep_full_pinyin": true,
                        "keep_original": true,
                        "limit_first_letter_length": 16,
                        "lowercase": true,
                        "remove_duplicated_term": true
                    }
                }
            }
        },
        "mapping": {
            "properties": {
                "name": {
                    "type": "text"
                },
                "searchkey": {
                    "type": "completion",
                    "analyzer": "ik_pinyin_analyzer"
                }
            }
        }
    }
}
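The pinyin_filter options determine which terms the pinyin filter emits for each token produced by ik_smart. According to the plugin's README, keep_full_pinyin emits one syllable per character, keep_first_letter emits the joined first letters (truncated to limit_first_letter_length), keep_original also keeps the original token, lowercase lowercases Latin letters, and remove_duplicated_term drops repeated terms. The Java sketch below models only those effects for a single token. It is not the plugin's code: PINYIN is a hypothetical four-character dictionary, non-Chinese characters are simply skipped, and the real plugin may emit terms in a different order or at different positions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Simplified model of the pinyin_filter options; NOT the plugin's implementation.
class PinyinFilterSketch {
    // Hypothetical mini-dictionary covering only the characters used in the examples.
    static final Map<Character, String> PINYIN = Map.of(
            '小', "xiao", '米', "mi", '手', "shou", '机', "ji");

    static List<String> filter(String token, boolean keepFullPinyin, boolean keepFirstLetter,
                               int limitFirstLetterLength, boolean keepOriginal,
                               boolean lowercase, boolean removeDuplicatedTerm) {
        List<String> out = new ArrayList<>();
        StringBuilder firstLetters = new StringBuilder();
        for (char c : token.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                continue; // non-Chinese handling is omitted in this sketch
            }
            if (keepFullPinyin) {
                out.add(py); // keep_full_pinyin: one syllable per character
            }
            firstLetters.append(py.charAt(0));
        }
        if (keepFirstLetter && firstLetters.length() > 0) {
            // keep_first_letter, truncated by limit_first_letter_length
            int len = Math.min(limitFirstLetterLength, firstLetters.length());
            out.add(firstLetters.substring(0, len));
        }
        if (keepOriginal) {
            out.add(token); // keep_original: the ik_smart token itself
        }
        if (lowercase) {
            out.replaceAll(String::toLowerCase);
        }
        if (removeDuplicatedTerm) {
            out = new ArrayList<>(new LinkedHashSet<>(out)); // first occurrence wins
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the settings above for the token 小米
        System.out.println(filter("小米", true, true, 16, true, true, true)); // prints [xiao, mi, xm, 小米]
    }
}
```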

2. Custom corpus

1. Create the index and mapping

import java.util.Map;

import org.elasticsearch.client.IndicesClient;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;

/*
 * @Description: create an index with settings, mappings and the custom pinyin analyzer.
 * The settings may be empty (the custom pinyin analyzer is defined inside the settings).
 * The mapping may be empty.
 * @Method: addIndexAndMapping
 * @Param: [commonEntity]
 * @Return: boolean
 */
@SuppressWarnings("unchecked")
public boolean addIndexAndMapping(CommonEntity commonEntity) throws Exception {
    // Create-index request for the target index
    CreateIndexRequest request = new CreateIndexRequest(commonEntity.getIndexName());
    // Parameters passed in by the caller
    Map<String, Object> map = commonEntity.getMap();
    // Walk the top-level "settings" and "mapping" entries; empty sections are skipped
    for (Map.Entry<String, Object> entry : map.entrySet()) {
        boolean nonEmptyMap = entry.getValue() instanceof Map && !((Map<?, ?>) entry.getValue()).isEmpty();
        if ("settings".equals(entry.getKey()) && nonEmptyMap) {
            request.settings((Map<String, Object>) entry.getValue());
        }
        if ("mapping".equals(entry.getKey()) && nonEmptyMap) {
            request.mapping((Map<String, Object>) entry.getValue());
        }
    }
    // Index-level client (client is this class's RestHighLevelClient field)
    IndicesClient indices = client.indices();
    // Send the request
    CreateIndexResponse response = indices.create(request, RequestOptions.DEFAULT);
    // true if the cluster acknowledged the new index
    return response.isAcknowledged();
}

Contents of the CommonEntity payload:
settings holds the index settings. It is passed through dynamically and uses the same syntax as the REST DSL.
mapping holds the field mappings. It is also passed through dynamically and follows the DSL syntax; note that the key is mapping, the name the Java code looks for, rather than the REST API's mappings.

{
    "indexName": "product_completion_index",
    "map": {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 2,
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": "pinyin_filter"
                    }
                },
                "filter": {
                    "pinyin_filter": {
                        "type": "pinyin",
                        "keep_first_letter": true,
                        "keep_separate_first_letter": false,
                        "keep_full_pinyin": true,
                        "keep_original": true,
                        "limit_first_letter_length": 16,
                        "lowercase": true,
                        "remove_duplicated_term": true
                    }
                }
            }
        },
        "mapping": {
            "properties": {
                "name": {
                    "type": "keyword"
                },
                "searchkey": {
                    "type": "completion",
                    "analyzer": "ik_pinyin_analyzer"
                }
            }
        }
    }
}
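This article does not show CommonEntity itself. The class below is a hypothetical reconstruction, inferred only from the getters and setters the code here calls (getIndexName, getMap, getList, getSuggestFileld, getSuggestValue, getSuggestCount and their setters). The project's real class may contain more fields. The misspelled suggestFileld is kept so that the existing calls still compile.

```java
import java.util.List;
import java.util.Map;

// Hypothetical reconstruction of CommonEntity, inferred from the calls in this article.
class CommonEntity {
    private String indexName;                // target index, e.g. product_completion_index
    private Map<String, Object> map;         // {"settings": {...}, "mapping": {...}}
    private List<Map<String, Object>> list;  // documents for bulkAddDoc
    private String suggestFileld;            // field to suggest on (spelling matches existing calls)
    private String suggestValue;             // text to complete or correct
    private int suggestCount;                // maximum number of suggestions

    public String getIndexName() { return indexName; }
    public void setIndexName(String indexName) { this.indexName = indexName; }

    public Map<String, Object> getMap() { return map; }
    public void setMap(Map<String, Object> map) { this.map = map; }

    public List<Map<String, Object>> getList() { return list; }
    public void setList(List<Map<String, Object>> list) { this.list = list; }

    public String getSuggestFileld() { return suggestFileld; }
    public void setSuggestFileld(String suggestFileld) { this.suggestFileld = suggestFileld; }

    public String getSuggestValue() { return suggestValue; }
    public void setSuggestValue(String suggestValue) { this.suggestValue = suggestValue; }

    public int getSuggestCount() { return suggestCount; }
    public void setSuggestCount(int suggestCount) { this.suggestCount = suggestCount; }
}
```

setMap is an assumption added so that a JSON body shaped like the payload above could be bound to the object, for example through Spring's @RequestBody.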

Alternatively, run the equivalent request directly in Kibana Dev Tools:

PUT product_completion_index
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 2,
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": "pinyin_filter"
                }
            },
            "filter": {
                "pinyin_filter": {
                    "type": "pinyin",
                    "keep_first_letter": true,
                    "keep_separate_first_letter": false,
                    "keep_full_pinyin": true,
                    "keep_original": true,
                    "limit_first_letter_length": 16,
                    "lowercase": true,
                    "remove_duplicated_term": true
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword"
            },
            "searchkey": {
                "type": "completion",
                "analyzer": "ik_pinyin_analyzer"
            }
        }
    }
}
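The Kibana request and the CommonEntity payload carry the same settings and field definitions. Structurally they differ in two places: the index name moves into the URL, and addIndexAndMapping reads the custom key mapping, whereas the create-index REST API expects mappings. The small sketch below performs that conversion and applies the same "skip empty sections" rule as addIndexAndMapping. IndexBodySketch, toRestBody and isNonEmptyMap are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: turn the CommonEntity "map" payload into the body of PUT /<index>.
class IndexBodySketch {
    static Map<String, Object> toRestBody(Map<String, Object> payload) {
        Map<String, Object> body = new LinkedHashMap<>();
        Object settings = payload.get("settings");
        Object mapping = payload.get("mapping");
        // Same rule as addIndexAndMapping: missing or empty sections are not sent
        if (isNonEmptyMap(settings)) {
            body.put("settings", settings);
        }
        if (isNonEmptyMap(mapping)) {
            body.put("mappings", mapping); // REST key is plural
        }
        return body;
    }

    static boolean isNonEmptyMap(Object value) {
        return value instanceof Map && !((Map<?, ?>) value).isEmpty();
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("settings", Map.of("number_of_shards", 1));
        payload.put("mapping", Map.of("properties", Map.of("name", Map.of("type", "keyword"))));
        System.out.println(toRestBody(payload).keySet()); // prints [settings, mappings]
    }
}
```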

2. Add documents in batches

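import java.util.Map;

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.rest.RestStatus;

/*
 * @Description: bulk-index documents. A missing index would be created automatically with
 * dynamic mappings, but searchkey would then not be a completion field, so create the index first.
 * @Method: bulkAddDoc
 * @Param: [commonEntity]
 */
public static RestStatus bulkAddDoc(CommonEntity commonEntity) throws Exception {
    // Bulk request whose default index is the target index
    BulkRequest bulkRequest = new BulkRequest(commonEntity.getIndexName());
    // One index request per document; the Map is serialized as the JSON source
    for (Map<String, Object> doc : commonEntity.getList()) {
        bulkRequest.add(new IndexRequest().source(doc, XContentType.JSON));
    }
    // Execute the bulk request (client is the class's static RestHighLevelClient)
    BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
    // Individual items can fail even though the overall request succeeded
    if (bulkResponse.hasFailures()) {
        System.err.println(bulkResponse.buildFailureMessage());
    }
    return bulkResponse.status();
}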

public static void main(String[] args) throws Exception {
    // Bulk insert
    CommonEntity commonEntity = new CommonEntity();
    commonEntity.setIndexName("product_completion_index"); // index name
    List<Map<String, Object>> list = new ArrayList<>();
    commonEntity.setList(list);
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米手机").putData("name", "小米(MI)"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米11").putData("name", "小米(MI)"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米电视").putData("name", "小米(MI)"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米9").putData("name", "小米(MI)"));
    // Duplicate searchkey; removed later by skip_duplicates
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米手机").putData("name", "小米(MI)"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米手环").putData("name", "小米(MI)"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米笔记本").putData("name", "小米(MI)"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "小米摄像头").putData("name", "小米(MI)"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "adidas男鞋").putData("name", "adidas男鞋"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "adidas女鞋").putData("name", "adidas女鞋"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "adidas外套").putData("name", "adidas外套"));
    list.add(new CommonMap<String, Object>().putData("searchkey", "adidas裤子").putData("name", "adidas裤子"));
    bulkAddDoc(commonEntity);
}
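The CommonMap class used in main is a project helper that this article does not show. If you do not have it, the same 12 documents can be built with plain JDK maps. CorpusSketch, doc(...) and distinctSearchkeys(...) are hypothetical names introduced for this sketch. Note that 小米手机 appears twice, which is what later lets skip_duplicates show its effect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Builds the article's corpus with plain JDK collections instead of CommonMap.
class CorpusSketch {
    // One document: the completion input plus the display name
    static Map<String, Object> doc(String searchkey, String name) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("searchkey", searchkey); // completion field analyzed by ik_pinyin_analyzer
        m.put("name", name);           // field later used by the phrase suggester
        return m;
    }

    static List<Map<String, Object>> corpus() {
        List<Map<String, Object>> list = new ArrayList<>();
        String[] xiaomi = {"小米手机", "小米11", "小米电视", "小米9", "小米手机",
                "小米手环", "小米笔记本", "小米摄像头"};
        for (String key : xiaomi) {
            list.add(doc(key, "小米(MI)"));
        }
        String[] adidas = {"adidas男鞋", "adidas女鞋", "adidas外套", "adidas裤子"};
        for (String key : adidas) {
            list.add(doc(key, key)); // name equals searchkey for these entries
        }
        return list;
    }

    static Set<Object> distinctSearchkeys(List<Map<String, Object>> docs) {
        Set<Object> keys = new LinkedHashSet<>();
        for (Map<String, Object> d : docs) {
            keys.add(d.get("searchkey"));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = corpus();
        // prints 12 docs, 11 distinct
        System.out.println(docs.size() + " docs, " + distinctSearchkeys(docs).size() + " distinct");
    }
}
```

With this helper, commonEntity.setList(CorpusSketch.corpus()) could stand in for the list-building lines in main.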

3. Query results

GET product_completion_index/_search

3. Product search: Chinese character and pinyin autocompletion

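1. Concepts

Term suggester: analyzes the input text into terms and suggests corrections for each term separately.
Phrase suggester: builds on the term suggester but takes the relationship between terms into account and suggests corrections for the whole phrase.
Completion suggester: designed for autocompletion (search-as-you-type). It is backed by an in-memory FST, which makes prefix lookups fast.
Context suggester: a completion suggester whose suggestions can be filtered or boosted by context, such as a category or a geo location.

The request below asks the completion suggester for up to 20 suggestions for the prefix 小米 on the searchkey field, with duplicates skipped: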

GET product_completion_index/_search
{
    "from": 0,
    "size": 100,
    "suggest": {
        "czbk-suggest": {
            "prefix": "小米",
            "completion": {
                "field": "searchkey",
                "size": 20,
                "skip_duplicates": true
            }
        }
    }
}
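To make the roles of prefix, size and skip_duplicates in the request above concrete, here is a small in-memory model of a completion lookup. It is only an illustration: Elasticsearch answers completion requests from an FST and ranks suggestions by weight (score), whereas this sketch scans a sorted list and returns matches in plain lexicographic (UTF-16) order, so the ordering can differ from Elasticsearch's. CompletionSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// In-memory model of prefix completion with "size" and "skip_duplicates".
class CompletionSketch {
    static List<String> suggest(List<String> inputs, String prefix, int size, boolean skipDuplicates) {
        // TreeSet removes duplicates; otherwise every input is kept
        Collection<String> source = skipDuplicates ? new TreeSet<String>(inputs) : inputs;
        List<String> sorted = new ArrayList<>(source);
        Collections.sort(sorted);
        List<String> out = new ArrayList<>();
        for (String s : sorted) {
            if (out.size() == size) {
                break;              // "size": cap the number of suggestions
            }
            if (s.startsWith(prefix)) {
                out.add(s);         // "prefix": only entries starting with the input
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> searchkeys = List.of("小米手机", "小米11", "小米电视", "小米9", "小米手机",
                "小米手环", "小米笔记本", "小米摄像头", "adidas男鞋", "adidas女鞋", "adidas外套", "adidas裤子");
        System.out.println(suggest(searchkeys, "小米", 5, true));
        System.out.println(suggest(searchkeys, "小米", 5, false)); // 小米手机 may appear twice
    }
}
```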

2. Chinese character autocompletion in Java

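import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.ScoreSortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;
import org.springframework.util.CollectionUtils;

/*
 * @Description: autocompletion, suggests likely words or phrases from what the user has typed
 * @Method: cSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.util.List<java.lang.String>
 * >>>>>>>>>>>> Outline >>>>>>>>>>>>
 * 1. Define the search request
 * 2. Define the source builder (sort by score)
 * 3. Define the completion suggestion builder (parameters from the caller)
 * 4. Add the completion builder to the source builder
 * 5. Attach the source builder to the search request
 * 6. Extract the suggested values from the response
 */
public static List<String> cSuggest(CommonEntity commonEntity) throws Exception {
    // Result list
    List<String> suggestList = new ArrayList<>();
    // Search request against the corpus index
    SearchRequest searchRequest = new SearchRequest(commonEntity.getIndexName());
    // Source builder, sorted by score
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
    // Completion suggestion on the given field
    CompletionSuggestionBuilder completionSuggestionBuilder = new CompletionSuggestionBuilder(commonEntity.getSuggestFileld());
    // Prefix typed by the user
    completionSuggestionBuilder.prefix(commonEntity.getSuggestValue());
    // Drop duplicate suggestions
    completionSuggestionBuilder.skipDuplicates(true);
    // Maximum number of suggestions
    completionSuggestionBuilder.size(commonEntity.getSuggestCount());
    // "common-suggest" is the name under which the results appear in the response; any fixed name works
    searchSourceBuilder.suggest(new SuggestBuilder().addSuggestion("common-suggest", completionSuggestionBuilder));
    searchRequest.source(searchSourceBuilder);
    // Execute the search (client is the class's static RestHighLevelClient)
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the completion suggestion by name
    CompletionSuggestion completionSuggestion = suggestResponse.getSuggest().getSuggestion("common-suggest");
    List<CompletionSuggestion.Entry.Option> optionsList = completionSuggestion.getEntries().get(0).getOptions();
    // Collect the suggestion texts
    if (!CollectionUtils.isEmpty(optionsList)) {
        optionsList.forEach(item -> suggestList.add(item.getText().toString()));
    }
    return suggestList;
}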

public static void main(String[] args) throws Exception {
    // Autocompletion
    CommonEntity suggestEntity = new CommonEntity();
    suggestEntity.setIndexName("product_completion_index"); // index name
    suggestEntity.setSuggestFileld("searchkey"); // field to complete on
    suggestEntity.setSuggestValue("小米"); // prefix typed by the user
    suggestEntity.setSuggestCount(5); // number of suggestions to return

    System.out.println(cSuggest(suggestEntity));
    // Result: [小米11, 小米9, 小米手机, 小米手环, 小米摄像头]
    // Duplicates (小米手机 was indexed twice) are removed automatically
}

3. Pinyin autocompletion in Java

CommonEntity suggestEntity = new CommonEntity();
suggestEntity.setIndexName("product_completion_index"); // index name
suggestEntity.setSuggestFileld("searchkey"); // field to complete on
suggestEntity.setSuggestCount(5); // number of suggestions to return

// (1) Autocompletion: full pinyin
suggestEntity.setSuggestValue("xiaomi"); // prefix typed by the user
System.out.println(cSuggest(suggestEntity));
// Result: [小米11, 小米9, 小米摄像头, 小米电视, 小米笔记本]

// (2) Autocompletion: full pinyin separated by a space
suggestEntity.setSuggestValue("xiao mi");
System.out.println(cSuggest(suggestEntity));
// Result: [小米11, 小米9, 小米摄像头, 小米电视, 小米笔记本]

// (3) Autocompletion: first letters
suggestEntity.setSuggestValue("xm");
System.out.println(cSuggest(suggestEntity));
// Result: [小米11, 小米9, 小米摄像头, 小米电视, 小米笔记本]
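Roughly speaking, all three inputs reach the same 小米 entries because the pinyin filter gives each searchkey value several alternative forms (the original text, its pinyin, its first letters), and a prefix typed in any of those forms can match. The sketch below models that idea by expanding each entry into three strings. It rests on two assumptions: PINYIN is a hypothetical mini-dictionary, and both joining the full pinyin into one string and stripping whitespace from the query are simplifications. Neither reflects how the plugin or the completion suggester actually matches. PinyinCompletionSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of matching one suggestion by text, full pinyin or first letters.
class PinyinCompletionSketch {
    // Hypothetical mini-dictionary covering only the characters used in the examples.
    static final Map<Character, String> PINYIN = Map.of(
            '小', "xiao", '米', "mi", '手', "shou", '机', "ji", '电', "dian", '视', "shi");

    // Alternative forms of one suggestion: original text, joined full pinyin, first letters.
    static List<String> inputsFor(String text) {
        StringBuilder full = new StringBuilder();
        StringBuilder initials = new StringBuilder();
        for (char c : text.toCharArray()) {
            String py = PINYIN.get(c);
            if (py != null) {
                full.append(py);
                initials.append(py.charAt(0));
            } else {
                // Characters outside the dictionary are kept as-is (lowercased)
                String s = String.valueOf(c).toLowerCase();
                full.append(s);
                initials.append(s);
            }
        }
        return List.of(text, full.toString(), initials.toString());
    }

    static List<String> suggest(List<String> entries, String query, int size) {
        // Simplification: ignore whitespace and case in the query
        String q = query.replaceAll("\\s+", "").toLowerCase();
        Set<String> out = new LinkedHashSet<>(); // keeps entry order, drops duplicates
        for (String entry : entries) {
            if (out.size() == size) {
                break;
            }
            for (String input : inputsFor(entry)) {
                if (input.startsWith(q)) {
                    out.add(entry);
                    break;
                }
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        List<String> entries = List.of("小米手机", "小米电视", "adidas男鞋");
        for (String q : new String[]{"xiaomi", "xiao mi", "xm"}) {
            System.out.println(q + " -> " + suggest(entries, q, 5));
        }
    }
}
```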

4. Language processing (spelling correction)

1. Example

GET product_completion_index/_search
{
    "suggest": {
        "common-suggestion": {
            "text": "adidaas男鞋",
            "phrase": {
                "field": "name",
                "size": 13
            }
        }
    }
}


2. Spelling correction in Java

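import java.util.List;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.ScoreSortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.phrase.PhraseSuggestion;
import org.elasticsearch.search.suggest.phrase.PhraseSuggestionBuilder;
import org.springframework.util.CollectionUtils;

/*
 * @Description: spelling correction
 * @Method: pSuggest
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.lang.String
 * >>>>>>>>>>>> Outline >>>>>>>>>>>>
 * 1. Define the search request
 * 2. Define the source builder (sort by score)
 * 3. Define the phrase suggestion builder (parameters from the caller)
 * 4. Add the phrase suggestion builder to the source builder
 * 5. Attach the source builder to the search request
 * 6. Extract the corrected text from the response
 */
public static String pSuggest(CommonEntity commonEntity) throws Exception {
    // Result; stays empty when there is no correction
    String pSuggestString = "";
    // Search request against the corpus index
    SearchRequest searchRequest = new SearchRequest(commonEntity.getIndexName());
    // Source builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Sort by score
    searchSourceBuilder.sort(new ScoreSortBuilder().order(SortOrder.DESC));
    // Phrase suggestion builder (argument: the field to check against)
    PhraseSuggestionBuilder pSuggestionBuilder = new PhraseSuggestionBuilder(commonEntity.getSuggestFileld());
    // Text to correct
    pSuggestionBuilder.text(commonEntity.getSuggestValue());
    // Only the best correction is needed
    pSuggestionBuilder.size(1);
    searchSourceBuilder.suggest(new SuggestBuilder().addSuggestion("common-suggest", pSuggestionBuilder));
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    SearchResponse suggestResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    // Read the phrase suggestion by name
    PhraseSuggestion phraseSuggestion = suggestResponse.getSuggest().getSuggestion("common-suggest");
    // Options of the first (and only) entry
    List<PhraseSuggestion.Entry.Option> optionsList = phraseSuggestion.getEntries().get(0).getOptions();
    // Take the top correction and remove the spaces between the corrected terms
    if (!CollectionUtils.isEmpty(optionsList) && optionsList.get(0).getText() != null) {
        pSuggestString = optionsList.get(0).getText().string().replaceAll(" ", "");
    }
    return pSuggestString;
}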


public static void main(String[] args) throws Exception {
    CommonEntity suggestEntity = new CommonEntity();
    suggestEntity.setIndexName("product_completion_index"); // index name
    suggestEntity.setSuggestFileld("name"); // field to check against
    suggestEntity.setSuggestValue("adidaas男鞋"); // misspelled input to correct
    System.out.println(pSuggest(suggestEntity)); // Result: adidas男鞋
}
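The phrase suggester first proposes candidate corrections for individual terms (terms from the field that are within a small edit distance) and then scores whole phrases with an n-gram language model. The sketch below keeps only the first half: a term-level corrector based on Levenshtein distance. SpellCorrectSketch, its tokenizer, and the maxEdits and minWordLength parameters are hypothetical. minWordLength is similar in spirit to the direct generator's min_word_length option, in that short terms are left uncorrected. Like pSuggest, the sketch joins the corrected terms with spaces and then strips them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Term-level spelling correction by edit distance: a simplified stand-in for the phrase suggester.
class SpellCorrectSketch {
    // Dynamic-programming Levenshtein distance (insertions, deletions, substitutions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Runs of ASCII letters/digits form one token; every other non-space character is its own token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128 && Character.isLetterOrDigit(c)) {
                run.append(c);
                continue;
            }
            if (run.length() > 0) {
                tokens.add(run.toString());
                run.setLength(0);
            }
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        if (run.length() > 0) {
            tokens.add(run.toString());
        }
        return tokens;
    }

    // Replaces each unknown token (at least minWordLength long) with the closest dictionary word
    // within maxEdits; ties go to the first word in the dictionary's iteration order.
    static String correct(String text, Set<String> dictionary, int maxEdits, int minWordLength) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text.toLowerCase())) {
            String best = token;
            if (token.length() >= minWordLength && !dictionary.contains(token)) {
                int bestDist = maxEdits + 1;
                for (String word : dictionary) {
                    int d = levenshtein(token, word);
                    if (d < bestDist) {
                        bestDist = d;
                        best = word;
                    }
                }
            }
            out.add(best);
        }
        // Space-joined terms, then stripped, mirroring replaceAll(" ", "") in pSuggest
        return String.join(" ", out).replaceAll(" ", "");
    }

    public static void main(String[] args) {
        Set<String> dictionary = new TreeSet<>(List.of("adidas"));
        System.out.println(correct("adidaas男鞋", dictionary, 2, 4)); // prints adidas男鞋
    }
}
```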

5. Summary

  1. Autocompletion needs its own search thesaurus (corpus). Keep it in a separate index from the business data so that the corpus is easy to maintain and upgrade.
  2. Based on word segmentation and other search conditions, query a small number of records from the corpus and return them (for comparison, JD.com shows 13 suggestions, Taobao (Tmall) 10, and Baidu 4).
  3. To improve accuracy, suggestions are usually produced by prefix search.


Source: blog.csdn.net/A_art_xiang/article/details/132259599