【elasticsearch】Term vector notes


2018-05-08 10:57:10

Term vector: statistics about each term in a given field of a document

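index-time: enabled in the mapping; the term vector is generated and stored when the document is indexed

query-time: if no term vector was stored, the statistics are computed on the fly when you request the term vector, then returned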

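term information: the term frequency in the field (always returned), plus term positions, start and end offsets, and term payloads (controlled by the positions, offsets and payloads parameters)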

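term statistics: set term_statistics=true

total term frequency (ttf): how many times a term occurs across all documents

document frequency (doc_freq): how many documents contain the term

field statistics (returned by default): document count (doc_count), how many documents contain this field

sum of document frequencies (sum_doc_freq): the sum of the doc_freq of every term in the field

sum of total term frequencies (sum_ttf): the sum of the ttf of every term in the field

Term and field statistics are gathered only from the shard that holds the document and still count deleted documents, so they can be approximate.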
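To make the definitions above concrete, here is a small Python sketch that computes ttf, doc_freq, doc_count, sum_doc_freq and sum_ttf for one field over a toy corpus. It is only an illustration under stated assumptions: analyze() is a hypothetical stand-in for the fulltext_analyzer used below (whitespace split plus lowercase), the corpus is made up, and it ignores the per-shard and deleted-document caveats of the real statistics.

```python
from collections import Counter


def analyze(text):
    # Hypothetical stand-in for fulltext_analyzer: whitespace tokenizer + lowercase filter.
    return [token.lower() for token in text.split()]


def field_statistics(field_values):
    """Compute term and field statistics for one field across several documents."""
    ttf = Counter()       # total term frequency: occurrences across all documents
    doc_freq = Counter()  # document frequency: number of documents containing the term
    doc_count = 0         # documents with at least one term in this field
    for text in field_values:
        tf = Counter(analyze(text))  # term frequency inside this one document
        if tf:
            doc_count += 1
        ttf.update(tf)               # add this document's occurrence counts
        doc_freq.update(tf.keys())   # add 1 for every term present in this document
    return {
        "ttf": dict(ttf),
        "doc_freq": dict(doc_freq),
        "doc_count": doc_count,
        "sum_doc_freq": sum(doc_freq.values()),
        "sum_ttf": sum(ttf.values()),
    }


corpus = ["hello test test test", "other hello test"]  # hypothetical field values
stats = field_statistics(corpus)
print(stats["ttf"]["test"], stats["doc_freq"]["test"])               # 4 2
print(stats["doc_count"], stats["sum_doc_freq"], stats["sum_ttf"])  # 2 5 7
```

Note that sum_ttf is simply the total number of tokens in the field (4 + 3 = 7 here), while sum_doc_freq counts each distinct term once per document.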

index-time example: store term vectors for the text field (mapping-type syntax of Elasticsearch 5.x/6.x)

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "analyzer": "fulltext_analyzer"
        },
        "fullname": {
          "type": "text",
          "analyzer": "fulltext_analyzer"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "analyzer": {
        "fulltext_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "type_as_payload"
          ]
        }
      }
    }
  }
}

Fetch the term vector of document 1 (the document must already be indexed):

GET /my_index/my_type/1/_termvectors
{
  "fields": ["text"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true
}

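Each occurrence of a term is one token. For every token, position is its index in the token stream, and start_offset and end_offset are the character positions where that occurrence begins and ends in the original text.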
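The per-term "tokens" list can be illustrated with a short Python sketch that mimics the fulltext_analyzer on one field value and groups the tokens by term. Assumptions: term_vector() is a hypothetical helper, whitespace splitting plus lowercase approximates the analyzer, and every payload is the base64-encoded token type "word", which is what type_as_payload is assumed to store for whitespace-tokenized text.

```python
import base64
import re


def term_vector(text):
    """Group tokens by term, recording position, offsets and payload for each occurrence."""
    terms = {}
    # Whitespace tokenizer approximation: every run of non-space characters is a token.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        term = match.group().lower()  # lowercase filter; offsets still refer to the original text
        entry = terms.setdefault(term, {"term_freq": 0, "tokens": []})
        entry["term_freq"] += 1
        entry["tokens"].append({
            "position": position,
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index one past the last character
            # Assumed type_as_payload behaviour: token type "word", base64-encoded.
            "payload": base64.b64encode(b"word").decode("ascii"),
        })
    return terms


tv = term_vector("hello test test test")
for term, info in sorted(tv.items()):
    print(term, info["term_freq"], [(t["start_offset"], t["end_offset"]) for t in info["tokens"]])
# hello 1 [(0, 5)]
# test 3 [(6, 10), (11, 15), (16, 20)]
```

The three occurrences of "test" share one term entry (term_freq 3) but each keeps its own position and offsets.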

Term vectors for a manually supplied (artificial) document

Request 4:
GET /my_index/my_type/_termvectors
{
  "doc": {
    "fullname": "Leo Li",
    "text": "hello test test test"
  },
  "fields": ["text"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true
}

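Here the document is supplied in the request itself (the "text" value above) rather than fetched from the index.

Its text is analyzed into terms, and for each term Elasticsearch computes the statistics against the documents already in the index.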
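The idea can be sketched in Python: term_freq comes from the supplied text, while doc_freq and ttf are counted only over documents that are already indexed. This is an illustration under assumptions: artificial_doc_term_vector() is a hypothetical helper, analyze() approximates the field's analyzer, and the exact way Elasticsearch reports terms that never occur in the index may differ from the zeros shown here.

```python
from collections import Counter


def analyze(text):
    # Hypothetical approximation of the field's analyzer: whitespace split + lowercase.
    return [token.lower() for token in text.split()]


def artificial_doc_term_vector(doc_text, indexed_texts):
    """term_freq from the supplied document; doc_freq/ttf from the indexed documents only."""
    indexed_tfs = [Counter(analyze(text)) for text in indexed_texts]
    result = {}
    for term, freq in Counter(analyze(doc_text)).items():
        result[term] = {
            "term_freq": freq,
            "doc_freq": sum(1 for tf in indexed_tfs if term in tf),
            "ttf": sum(tf[term] for tf in indexed_tfs),  # Counter returns 0 for missing terms
        }
    return result


indexed = ["hello test test test"]  # hypothetical content of the existing document 1
tv = artificial_doc_term_vector("other hello test", indexed)
print(tv["other"])  # {'term_freq': 1, 'doc_freq': 0, 'ttf': 0}
print(tv["test"])   # {'term_freq': 1, 'doc_freq': 1, 'ttf': 3}
```

The artificial document itself is never added to the statistics, which is why "other" gets a document frequency of 0.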

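To choose the analyzer for a field, add per_field_analyzer to the body of request 4, just before its final closing brace (with a comma after the preceding entry):

"per_field_analyzer": {
  "text": "standard"
}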
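Why the analyzer override changes the result can be seen with a rough Python comparison. Assumptions: both functions are hypothetical, and approx_standard_analyzer() is only a regex approximation; the real standard tokenizer follows Unicode text segmentation rules and handles many more cases.

```python
import re


def fulltext_analyzer(text):
    # whitespace tokenizer + lowercase: punctuation stays attached to the tokens
    return [token.lower() for token in text.split()]


def approx_standard_analyzer(text):
    # Rough approximation of the standard analyzer: split on non-word characters, lowercase.
    return [token.lower() for token in re.findall(r"\w+", text)]


text = "Hello, test-test TEST"  # hypothetical field value
print(fulltext_analyzer(text))         # ['hello,', 'test-test', 'test']
print(approx_standard_analyzer(text))  # ['hello', 'test', 'test', 'test']
```

With the whitespace-based analyzer "test" has a term frequency of 1; with the standard-style split it has 3, so the returned term vector changes even though the text does not.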

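GET /my_index/my_type/_termvectors
{
  ……
  "filter": {
    "max_num_terms": 3,
    "min_term_freq": 1,
    "min_doc_freq": 1
  }
}

max_num_terms: the maximum number of terms returned per field

min_term_freq: skip terms that occur fewer times than this in the document

min_doc_freq: skip terms that appear in fewer documents than this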

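This is request 4 with a terms filter added. Based on the term statistics, it keeps only the term vector entries you want to see, for example dropping terms that occur too rarely.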
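The filter can be sketched in Python as two steps: drop terms below min_term_freq or min_doc_freq, then keep the max_num_terms highest-scoring terms by tf-idf. Assumptions: filter_terms() is hypothetical, and the idf formula ln(N / (df + 1)) + 1 is an illustrative classic-style choice; Elasticsearch's exact scoring may differ.

```python
import math
from collections import Counter


def filter_terms(doc_tf, doc_freq, num_docs, max_num_terms=3, min_term_freq=1, min_doc_freq=1):
    """Apply the min_* thresholds, then return the top max_num_terms terms by tf-idf."""
    scored = []
    for term, tf in doc_tf.items():
        df = doc_freq.get(term, 0)
        if tf < min_term_freq or df < min_doc_freq:
            continue  # filtered out before scoring
        idf = math.log(num_docs / (df + 1)) + 1  # assumed idf formula
        scored.append((tf * idf, term))
    scored.sort(reverse=True)  # highest score first
    return [term for _, term in scored[:max_num_terms]]


doc_tf = Counter("hello test test world world world".split())  # hypothetical document
doc_freq = {"hello": 1, "test": 3, "world": 8}                   # hypothetical index statistics
print(filter_terms(doc_tf, doc_freq, 10, max_num_terms=3, min_term_freq=2))  # ['test', 'world']
print(filter_terms(doc_tf, doc_freq, 10, max_num_terms=1, min_term_freq=2))  # ['test']
```

"hello" is removed by min_term_freq, and "test" outranks the more frequent "world" because it appears in far fewer documents.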

multi term vectors: fetch the term vectors of several documents in one _mtermvectors request

GET _mtermvectors
{
  "docs": [
    {
      "_index": "my_index",
      "_type": "my_type",
      "_id": "2",
      "term_statistics": true
    },
    {
      "_index": "my_index",
      "_type": "my_type",
      "_id": "1",
      "fields": [
        "text"
      ]
    }
  ]
}

With the index in the URL, the entries can omit _index:

GET /my_index/_mtermvectors
{
  "docs": [
    {
      "_type": "my_type",
      "_id": "2",
      "fields": [
        "text"
      ],
      "term_statistics": true
    },
    {
      "_type": "my_type",
      "_id": "1"
    }
  ]
}

With both index and type in the URL, each entry only needs its _id:

GET /my_index/my_type/_mtermvectors
{
  "docs": [
    {
      "_id": "2",
      "fields": [
        "text"
      ],
      "term_statistics": true
    },
    {
      "_id": "1"
    }
  ]
}
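The relationship between the short and fully qualified _mtermvectors forms can be sketched in Python: the index and type from the URL act as defaults for entries that omit them. Assumptions: expand_docs() is a hypothetical helper for comparing request bodies, not an Elasticsearch API, and it assumes a value given in an entry takes precedence over the URL default.

```python
import json


def expand_docs(docs, index=None, doc_type=None):
    """Fill in _index/_type from URL defaults without modifying the input entries."""
    expanded = []
    for doc in docs:
        full = dict(doc)  # shallow copy so the caller's entries stay unchanged
        if index is not None:
            full.setdefault("_index", index)    # assumed: per-entry value wins
        if doc_type is not None:
            full.setdefault("_type", doc_type)
        expanded.append(full)
    return expanded


# Body of GET /my_index/my_type/_mtermvectors
short_docs = [{"_id": "2", "fields": ["text"], "term_statistics": True}, {"_id": "1"}]
full_docs = expand_docs(short_docs, "my_index", "my_type")
print(json.dumps({"docs": full_docs}, indent=2))
```

Each expanded entry now names my_index and my_type explicitly, matching the style of the GET _mtermvectors request.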

Artificial documents can be passed with doc here as well:

GET /_mtermvectors
{
  "docs": [
    {
      "_index": "my_index",
      "_type": "my_type",
      "doc": {
        "fullname": "Leo Li",
        "text": "hello test test test"
      }
    },
    {
      "_index": "my_index",
      "_type": "my_type",
      "doc": {
        "fullname": "Leo Li",
        "text": "other hello test ..."
      }
    }
  ]
}


Reposted from my.oschina.net/u/3655192/blog/1785964
