Lesson 61: Index management: hands-on practice modifying an analyzer and building your own custom analyzer

Category: Other | 2019-03-06 10:10:56

Course outline

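1. The default analyzer

The default analyzer is standard. It is made up of the following components:

standard tokenizer: splits text on word boundaries

standard token filter: does nothing

lowercase token filter: converts all letters to lowercase

stop token filter (disabled by default): removes stop words such as a, the, and it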
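
To make the order of these components concrete, here is a minimal pure-Python sketch of the chain. It is only an approximation for illustration, not Elasticsearch's implementation: the function names are hypothetical, and the regular expression `\w+` is a rough stand-in for the real word-boundary rules of the standard tokenizer.

```python
import re


def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: split on word boundaries.
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    # Convert every token to lowercase.
    return [token.lower() for token in tokens]


def stop_filter(tokens, stopwords):
    # Drop any token that appears in the stop-word set.
    return [token for token in tokens if token not in stopwords]


def standard_analyze(text, stopwords=()):
    # stopwords is empty by default, mirroring the disabled stop filter.
    tokens = standard_tokenize(text)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens, set(stopwords))


print(standard_analyze("A Dog is in THE House"))
# ['a', 'dog', 'is', 'in', 'the', 'house']
```

With no stop words supplied, every word survives and only the case changes, which matches the behavior described for the default configuration.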

2. Changing analyzer settings

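Enable the English stop-word token filter by defining an analyzer, es_std, that is based on standard with stopwords set to _english_: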

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Analyze the same text with the built-in standard analyzer and with es_std to compare the results:

GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
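
The difference between the two requests can be previewed without a cluster. The sketch below is a pure-Python approximation, not the Elasticsearch code path: `analyze_with_positions` is a hypothetical helper, and `ENGLISH_STOPWORDS` is an assumed copy of Lucene's default English stop-word set, which is what `_english_` is understood to refer to. It also records each token's position, since removing stop words leaves gaps in the positions rather than renumbering the remaining tokens.

```python
import re

# Assumption: a copy of Lucene's default English stop-word set (_english_).
ENGLISH_STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def analyze_with_positions(text, stopwords=frozenset()):
    # Tokenize, lowercase, and keep each surviving token with its position.
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        token = match.group().lower()
        if token not in stopwords:
            tokens.append((token, position))
    return tokens


text = "a dog is in the house"
print(analyze_with_positions(text))
# [('a', 0), ('dog', 1), ('is', 2), ('in', 3), ('the', 4), ('house', 5)]
print(analyze_with_positions(text, ENGLISH_STOPWORDS))
# [('dog', 1), ('house', 5)]
```

Under these assumptions, the standard analyzer keeps all six words while es_std keeps only dog and house.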

3. Building your own custom analyzer

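If my_index still exists from step 2, delete it first (DELETE /my_index); creating an index that already exists fails. Then create it again with a custom character filter, a custom stop-word filter, and an analyzer that combines them:

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}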

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
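
The order in which my_analyzer applies its parts (character filters first, then the tokenizer, then the token filters) can be traced with a small pure-Python sketch. This is an approximation for illustration only: the helper names are hypothetical, the tag-stripping regex is far simpler than the real html_strip character filter, and `\w+` only roughly imitates the standard tokenizer.

```python
import re


def html_strip(text):
    # Rough stand-in for the html_strip char filter: drop anything tag-like.
    return re.sub(r"<[^>]+>", "", text)


def mapping_char_filter(text, mappings):
    # Apply "key => value" rules, trimming whitespace around both sides.
    for rule in mappings:
        key, value = rule.split("=>")
        text = text.replace(key.strip(), value.strip())
    return text


def my_analyzer(text):
    # 1. Character filters run on the raw string.
    text = html_strip(text)
    text = mapping_char_filter(text, ["&=> and"])
    # 2. The tokenizer splits the filtered string into tokens.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters run in order: lowercase, then the custom stop words.
    tokens = [token.lower() for token in tokens]
    return [token for token in tokens if token not in {"the", "a"}]


print(my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!"))
# ['tomandjerry', 'are', 'friend', 'in', 'house', 'haha']
```

Because the character filters run before tokenization, the & is replaced while tom&jerry is still one string, so the result is the single token tomandjerry rather than three separate tokens.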

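Finally, assign the custom analyzer to a field in the mapping. The my_type path segment applies to Elasticsearch versions that still have mapping types; from 7.x onward the typeless form PUT /my_index/_mapping is used instead.

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}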

Reposted from blog.csdn.net/qq_35524586/article/details/88170042
