Elasticsearch: Chinese + pinyin search

Requirement

A product named 雪花啤酒 must be found by any of these queries: 雪花, 啤酒, 雪花啤酒, xh, pj, xh啤酒, 雪花pj

Installing ik

See https://www.cnblogs.com/LQBlog/p/10443862.html. The step there that modifies the source code can be skipped.

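Installing the pinyin analyzer

Same as for ik: download it, build the package, move it into the ES plugins directory, and rename the directory to pinyin.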

Test

GET request: http://127.0.0.1:9200/_analyze

body:

{
    "analyzer": "pinyin",
    "text": "雪花啤酒"
}

Response:

{
    "tokens": [
        {
            "token": "xue",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "xhpj",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "hua",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 1
        },
        {
            "token": "pi",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 2
        },
        {
            "token": "jiu",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 3
        }
    ]
}

This shows the pinyin plugin was installed successfully.
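The same check can be scripted. Below is a minimal sketch, assuming Python with only the standard library and the node address used in this post. `build_analyze_request` and `analyze` are hypothetical helper names, and the sketch sends the body with POST (which the _analyze endpoint also accepts) rather than GET.

```python
import json
import urllib.request

ES_URL = "http://127.0.0.1:9200"  # node address used in this post


def build_analyze_request(text, analyzer="pinyin", base_url=ES_URL):
    """Build (without sending) a request to the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, analyzer="pinyin", base_url=ES_URL):
    """Send the request and return only the token strings of the response."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]

# Usage (needs a running node with the pinyin plugin installed):
#   analyze("雪花啤酒")
```

If the plugin is missing, the call should fail with an HTTP error instead of returning tokens, which makes it a quick smoke test after installation.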

Testing Chinese plus pinyin search

Custom mapping and custom analyzer

PUT request: http://127.0.0.1:9200/opcm3

body:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {// a custom analyzer named ik_pinyin_analyzer
                    "type": "custom",// marks this as a custom analyzer
                    "tokenizer": "ik_smart",// tokenize with ik: ik_smart is coarse-grained, ik_max_word is the finest-grained
                    "filter": ["my_pinyin"]// tokens from the tokenizer are passed on to this filter
                }
            },
            "filter": {
                "my_pinyin": {// a token filter backed by pinyin
                    "type": "pinyin"
                }
            }
        }
    },
    "mappings": {// custom mapping
        "topic": {// mapping type
            "properties": {
                "productName": {// field
                    "type": "text",
                    "analyzer": "ik_pinyin_analyzer"// use the custom analyzer
                }
            }
        }
    }
}
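The body above carries `//` comments for explanation, and some clients or tools only accept strict JSON. Below is a minimal sketch, assuming Python as the client language, that builds the same settings and mappings as a dict and serializes them without comments. `build_index_body` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_index_body(analyzer_name="ik_pinyin_analyzer", type_name="topic"):
    """Return the settings and mappings from this post as a plain dict."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",          # custom analyzer
                        "tokenizer": "ik_smart",   # coarse-grained ik tokenizer
                        "filter": ["my_pinyin"],   # tokens then go through this filter
                    }
                },
                "filter": {"my_pinyin": {"type": "pinyin"}},
            }
        },
        "mappings": {
            type_name: {
                "properties": {
                    "productName": {"type": "text", "analyzer": analyzer_name}
                }
            }
        },
    }


# Comment-free JSON, ready to be sent as the PUT body for the index.
body_json = json.dumps(build_index_body(), ensure_ascii=False, indent=4)
```

Passing the analyzer name once keeps the name defined under settings and the name referenced by productName from drifting apart.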

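My understanding of the filter

As I understand it, ik tokenizes the text first, and each resulting token is then handed to the pinyin filter. For 雪花啤酒, ik produces 雪花 and 啤酒. The pinyin filter then turns 雪花 into xue, hua, xh, and 啤酒 into pi, jiu, pj.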
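The flow described above can be imitated with a toy model. This is only a sketch of the idea, assuming hard-coded lookup tables in place of the real plugins: `IK_SMART_SPLIT`, `PINYIN`, `pinyin_filter`, and `analyze` are all made up for illustration and say nothing about how ik or the pinyin plugin are implemented.

```python
# Toy lookup tables: stand-ins for the real plugins, for illustration only.
IK_SMART_SPLIT = {"雪花啤酒": ["雪花", "啤酒"]}  # the split described in this post
PINYIN = {"雪": "xue", "花": "hua", "啤": "pi", "酒": "jiu"}


def pinyin_filter(token):
    """Emit the pinyin of each character, then the first-letter abbreviation."""
    syllables = [PINYIN[ch] for ch in token]
    abbreviation = "".join(syllable[0] for syllable in syllables)
    return syllables + [abbreviation]


def analyze(text):
    """Tokenize first, then run every token through the filter."""
    terms = []
    for token in IK_SMART_SPLIT[text]:
        terms.extend(pinyin_filter(token))
    return terms


print(analyze("雪花啤酒"))  # ['xue', 'hua', 'xh', 'pi', 'jiu', 'pj']
```

The point of the model is the order of operations: the filter never sees the whole string 雪花啤酒, only the tokens the tokenizer already produced, which is why the abbreviations come out as xh and pj rather than xhpj.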

Test

PUT request: http://127.0.0.1:9200/opcm3/topic/1

body:

{
    "productName":"雪花啤酒"
}

Inspect the terms indexed for this document

GET request: http://127.0.0.1:9200/opcm3/topic/1/_termvectors?fields=productName

Result:

{
    "_index": "opcm3",
    "_type": "topic",
    "_id": "1",
    "_version": 1,
    "found": true,
    "took": 40,
    "term_vectors": {
        "productName": {
            "field_statistics": {
                "sum_doc_freq": 6,
                "doc_count": 1,
                "sum_ttf": 6
            },
            "terms": {
                "hua": {
                    "term_freq": 1,
                    "tokens": [
                        {
                            "position": 1,
                            "start_offset": 0,
                            "end_offset": 2
                        }
                    ]
                },
                "jiu": {
                    "term_freq": 1,
                    "tokens": [
                        {
                            "position": 3,
                            "start_offset": 2,
                            "end_offset": 4
                        }
                    ]
                },
                "pi": {
                    "term_freq": 1,
                    "tokens": [
                        {
                            "position": 2,
                            "start_offset": 2,
                            "end_offset": 4
                        }
                    ]
                },
                "pj": {
                    "term_freq": 1,
                    "tokens": [
                        {
                            "position": 3,
                            "start_offset": 2,
                            "end_offset": 4
                        }
                    ]
                },
                "xh": {
                    "term_freq": 1,
                    "tokens": [
                        {
                            "position": 0,
                            "start_offset": 0,
                            "end_offset": 2
                        }
                    ]
                },
                "xue": {
                    "term_freq": 1,
                    "tokens": [
                        {
                            "position": 0,
                            "start_offset": 0,
                            "end_offset": 2
                        }
                    ]
                }
            }
        }
    }
}
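The response is long, and usually only the terms and their positions matter. Below is a minimal sketch, assuming the response has already been parsed into a Python dict. `indexed_terms` is a hypothetical helper, and `sample` is an abbreviated copy of the response above.

```python
def indexed_terms(termvectors_response, field):
    """Return {term: [positions]} for one field, with terms in sorted order."""
    terms = termvectors_response["term_vectors"][field]["terms"]
    return {
        term: [token["position"] for token in info["tokens"]]
        for term, info in sorted(terms.items())
    }


# Abbreviated copy of the response shown above (three of the six terms).
sample = {
    "term_vectors": {
        "productName": {
            "terms": {
                "xue": {"term_freq": 1, "tokens": [
                    {"position": 0, "start_offset": 0, "end_offset": 2}]},
                "hua": {"term_freq": 1, "tokens": [
                    {"position": 1, "start_offset": 0, "end_offset": 2}]},
                "xh": {"term_freq": 1, "tokens": [
                    {"position": 0, "start_offset": 0, "end_offset": 2}]},
            }
        }
    }
}

print(indexed_terms(sample, "productName"))  # {'hua': [1], 'xh': [0], 'xue': [0]}
```

Run against the full response, this lists the six terms hua, jiu, pi, pj, xh, and xue, which is the output expected from ik_smart followed by the pinyin filter.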

GET request: http://127.0.0.1:9200/opcm3/topic/_search

{
    "query":{
        "match":{
            "productName":"雪花啤"
        }
    }
}

At this point, searching for xh啤酒, 雪花pj, xh, and so on all return the document.
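To check every query from the requirement instead of trying them by hand, the request bodies can be generated. Below is a minimal sketch, assuming Python. `REQUIRED_QUERIES` and `build_match_query` are hypothetical names, and sending each body to the _search URL above is left to whichever HTTP client is in use.

```python
import json

# The seven queries listed in the requirement at the top of this post.
REQUIRED_QUERIES = ["雪花", "啤酒", "雪花啤酒", "xh", "pj", "xh啤酒", "雪花pj"]


def build_match_query(text, field="productName"):
    """Build a match query body for one search string."""
    return {"query": {"match": {field: text}}}


# One JSON body per required query, in the same order as REQUIRED_QUERIES.
bodies = [
    json.dumps(build_match_query(query), ensure_ascii=False)
    for query in REQUIRED_QUERIES
]

for body in bodies:
    print(body)
```

Each body has the same shape as the search body shown above, with only the query string changed.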

Reposted from www.cnblogs.com/LQBlog/p/10449637.html