Elasticsearch Analyzers: Checking Tokenization Results

Article link: https://blog.csdn.net/yexiaomodemo/article/details/97933870

0. Introduction

For Chinese text in Elasticsearch we generally use the IK analyzer. If no analyzer is specified, the standard analyzer is used by default.

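The IK analyzer splits Chinese text into multi-character words.

The standard analyzer splits Chinese text into single characters, so every character becomes its own token.

Other analyzers: ansj_index ......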

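Pros and cons of IK: IK builds an effective token index from its dictionary, so searches are efficient and accurate. The drawback is that a small number of words are missing from the dictionary and are never emitted as tokens. A document may contain such a word, but because the analyzer does not produce it, a query for that word returns no results.

Pros and cons of standard: standard tokenizes Chinese character by character, so the problem described for IK does not occur. Its drawbacks are more obvious, though. First, search engine corpora typically hold tens of millions to hundreds of millions of documents, and indexing every single character makes the index very large. Second, a match query returns a large amount of noise, and relevance ranking is poor. In particular, exact matches and partial matches are ordered badly (query tuning may help).

Others: each has its own strengths and weaknesses, which this article does not cover in full.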
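The trade-off described above can be illustrated with a toy sketch. This is not how IK or the standard analyzer is implemented; `dict_tokens`, `char_tokens`, and the two-word `vocab` are hypothetical stand-ins, and the "lookup" is a plain exact-term membership test rather than a real Elasticsearch query.

```python
def char_tokens(text):
    """Standard-like behaviour for Chinese: one token per character."""
    return [ch for ch in text if not ch.isspace()]


def dict_tokens(text, vocab):
    """Toy dictionary segmentation: emit every substring found in vocab,
    plus any single character that no dictionary word covers."""
    matches = []
    covered = [False] * len(text)
    for start in range(len(text)):
        for end in range(start + 2, len(text) + 1):
            if text[start:end] in vocab:
                matches.append((start, end))
                for i in range(start, end):
                    covered[i] = True
    # Fall back to single characters for positions no word covered.
    for i, ch in enumerate(text):
        if not covered[i] and not ch.isspace():
            matches.append((i, i + 1))
    matches.sort()
    return [text[s:e] for s, e in matches]


vocab = {"测试", "试用"}   # hypothetical dictionary; "用例" is deliberately missing
text = "测试用例"

word_index = set(dict_tokens(text, vocab))   # dictionary-based tokens
char_index = set(char_tokens(text))          # per-character tokens

# The text contains "用例", but the dictionary-based index has no such term.
print("用例" in word_index)
# Every character of "用例" is present in the per-character index.
print(all(ch in char_index for ch in "用例"))
```

The per-character index finds the word at the cost of one entry per character, which is the index-size and noise problem described above.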

1. Using the analyzer (versions below 6.0.0)

If you use a different analyzer, replace ansj_index with standard, ik_max_word, or whichever analyzer you use.

A GET request is enough:

http://localhost:9200/category/_analyze?analyzer=ansj_index&text=测试用例

The result:

{
    "tokens": [
        {
            "token": "测试用例",
            "start_offset": 0,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "测试",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 2
        },
        {
            "token": "试用",
            "start_offset": 1,
            "end_offset": 3,
            "type": "word",
            "position": 3
        }
    ]
}
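When the same GET request is issued from code rather than a browser, the Chinese text must be percent-encoded. A minimal sketch using only the Python standard library follows; the helper name `legacy_analyze_url` is an assumption made for illustration, and the host, index, and analyzer are the ones from the URL above.

```python
from urllib.parse import urlencode, quote


def legacy_analyze_url(base_url, index, analyzer, text):
    """Build the pre-6.0 style URL: /<index>/_analyze?analyzer=...&text=..."""
    # quote_via=quote percent-encodes the UTF-8 bytes of non-ASCII text.
    query = urlencode({"analyzer": analyzer, "text": text}, quote_via=quote)
    return f"{base_url}/{index}/_analyze?{query}"


url = legacy_analyze_url("http://localhost:9200", "category", "ansj_index", "测试用例")
print(url)
```

The resulting URL can then be fetched with any HTTP client against a node older than 6.0.0.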

2. Querying the analyzer (6.0.0 and above)

Send a POST request with the JSON body shown below:

http://localhost:9200/_analyze/?pretty

{ "analyzer": "ik_max_word", "text": "测试用例" }

{
    "tokens": [
        {
            "token": "测试",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "试用",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "例",
            "start_offset": 3,
            "end_offset": 4,
            "type": "CN_CHAR",
            "position": 2
        }
    ]
}

This is the normal way to use it.
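The POST request above could be built from Python roughly as follows. This is a sketch using the standard library only: `analyze_request` and `token_texts` are hypothetical helper names, and the code builds the request without sending it, since sending requires a running node with the IK plugin installed. The Content-Type header is set explicitly because Elasticsearch 6.0 and later reject request bodies without one.

```python
import json
from urllib.request import Request


def analyze_request(base_url, analyzer, text):
    """Build a POST /_analyze request carrying the analyzer and text as JSON."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return Request(
        f"{base_url}/_analyze?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response_json):
    """Extract just the token strings from an _analyze response body."""
    return [t["token"] for t in json.loads(response_json)["tokens"]]


req = analyze_request("http://localhost:9200", "ik_max_word", "测试用例")
# To actually send it:
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(token_texts(resp.read().decode("utf-8")))

# A trimmed copy of the response shown above, used here as sample input.
sample = '{"tokens": [{"token": "测试"}, {"token": "试用"}, {"token": "例"}]}'
print(token_texts(sample))
```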

3. Using the old query style on 6.0.0 and above

http://www.zcsjw.com/es2/category/_analyze?analyzer=ansj_index&text=测试用例

This returns an error:
{
	"error": {
		"root_cause": [{
			"type": "parse_exception",
			"reason": "request body or source parameter is required"
		}],
		"type": "parse_exception",
		"reason": "request body or source parameter is required"
	},
	"status": 400
}

The fix is to use the 6.0.0+ approach from section 2: send the analyzer and text in a JSON request body.
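A client that may talk to either old or new nodes could recognize this specific error and retry with a request body. The check below is only a sketch: `needs_request_body` is a hypothetical helper, and it matches on the fields seen in the error above rather than on any documented contract.

```python
import json


def needs_request_body(response_json):
    """Return True if the response is the 'request body or source
    parameter is required' error shown above."""
    payload = json.loads(response_json)
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    return (
        payload.get("status") == 400
        and error.get("type") == "parse_exception"
        and "request body" in error.get("reason", "")
    )


error_response = """
{
  "error": {
    "root_cause": [{"type": "parse_exception",
                    "reason": "request body or source parameter is required"}],
    "type": "parse_exception",
    "reason": "request body or source parameter is required"
  },
  "status": 400
}
"""
print(needs_request_body(error_response))    # True
print(needs_request_body('{"tokens": []}'))  # False
```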

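Some of the information here is drawn from:

Viewing analyzer results in Elasticsearch 6

Elasticsearch default analyzer and Chinese analyzers

Installing a Chinese analyzer on ElasticSearch 6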