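All of the commands below are run in the Kibana console provided with Alibaba Cloud Elasticsearch.
Preface:
On our own servers I used to query ES directly with curl. After the company moved to Alibaba Cloud Elasticsearch, running the same command on the server Alibaba provided returned an error: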
[root@Alihui ~]# curl -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
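Solution: Alibaba Cloud puts access control in front of the cluster, so every request must carry a username and password (HTTP Basic authentication). For details see https://help.aliyun.com/document_detail/57877.html?spm=a2c4g.11186623.6.548.AAW08d
The right way to connect: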
[root@Ali98 ~]# curl -u hui:hui -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{
"name" : "huihui",
"cluster_name" : "es-cn-huiiiiiiiiiiiii",
"cluster_uuid" : "huiiiiiiiiiiiii_iiiii",
"version" : {
"number" : "5.5.3",
"build_hash" : "930huihui",
"build_date" : "2017-09-07T15:56:59.599Z",
"build_snapshot" : false,
"lucene_version" : "6.6.0"
},
"tagline" : "You Know, for Search"
}
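The 401 above is the cluster asking for HTTP Basic authentication (note the WWW-Authenticate: Basic header). All that curl -u hui:hui adds is a single Authorization header. Here is a minimal sketch of building that header with the JDK alone, assuming a plain Java client instead of curl. BasicAuthSketch is a hypothetical name, and hui/hui are the placeholder credentials from the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: the header that `curl -u user:password` sends (HTTP Basic auth).
class BasicAuthSketch {
    // Authorization header value: "Basic " + base64("user:password")
    static String header(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A client would attach this header to every request sent to the cluster endpoint.
        System.out.println("Authorization: " + header("hui", "hui"));
    }
}
```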
List all indices on the cluster:
GET _cat/indices?v
Get the mapping of an index:
GET /xiao-2018-6-12/Socials/_mapping
Add:
1. Add a field name with the value xiaoqiang:
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script" : "ctx._source.name = \"xiaoqiang\""
}
Delete:
1. Remove a field from a document:
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script" : "ctx._source.remove(\"name_of_new_field\")"
}
2. Delete a document:
DELETE mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news
3. Bulk-delete the documents that match several conditions:
POST mei_toutiao/News/_delete_by_query?routing=news
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "mediaNameZh" : "5time悦读" } },
{ "term" : { "codeName" : "美发" } }
]
}
}
}
}
}
Update:
1. Partial update:
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"doc" : {
"userName": "hao" //modified if the field exists, added if it does not
}
}
2. Update a string array (the whole array is replaced by the new one):
POST mei_toutiao/News/AWPN8pLjs4TGXdjfL8_b/_update?routing=news
{
"doc" : {
"littleUrls": [
"http://shishanghui.oss-cn-beijing.aliyuncs.com/700d2d2936f40fabe5a70b1449f07f9df080.jpg?x-oss-process=image/format,jpg/interlace,1",
"http://shishanghui.oss-cn-beijing.aliyuncs.com/ed7ad5d1e23441880c59abf0cfd7a89df080.jpg?x-oss-process=image/format,jpg/interlace,1"
]
}
}
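3. Full update:
(Whatever fields the document had before, afterwards it contains only the fields below. The whole document is replaced, so use this with care!)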
PUT mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news
{
"counter" : 1,
"tags" : ["red"]
}
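The in-memory sketch below makes the difference between a partial update (the doc body of _update) and a full update (PUT) concrete. It is a simplified model, not Elasticsearch code. UpdateSemanticsSketch and its methods are hypothetical. It assumes the documented merge behaviour: nested objects are merged recursively, while scalars and arrays such as littleUrls are replaced wholesale.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of how _update with "doc" differs from PUT (full replacement).
class UpdateSemanticsSketch {
    // Partial update: fields in `doc` overwrite or are added to `source`;
    // nested objects are merged recursively; all other fields are kept.
    @SuppressWarnings("unchecked")
    static Map<String, Object> partialUpdate(Map<String, Object> source, Map<String, Object> doc) {
        Map<String, Object> result = new LinkedHashMap<>(source);
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object current = result.get(e.getKey());
            Object incoming = e.getValue();
            if (current instanceof Map && incoming instanceof Map) {
                result.put(e.getKey(), partialUpdate((Map<String, Object>) current, (Map<String, Object>) incoming));
            } else {
                result.put(e.getKey(), incoming); // scalars and arrays are replaced
            }
        }
        return result;
    }

    // Full update (PUT): the stored document becomes exactly the new body; old fields are gone.
    static Map<String, Object> fullReplace(Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "hello");
        source.put("userName", "old");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("userName", "hao");
        System.out.println(partialUpdate(source, doc)); // title kept, userName changed
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("counter", 1);
        System.out.println(fullReplace(body));          // only counter remains
    }
}
```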
4. Bulk-reset the comment count to 0 for every article whose comment count is greater than 0:
POST mei_toutiao/News/_update_by_query?routing=news
{
"query": {
"bool": {
"must": [
{
"range": {
"atdCnt": {
"gt": 0
}
}
}
]
}
},
"script": {
"inline":"ctx._source.atdCnt = 0"
}
}
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
5. Bulk-add a field and set its value on every document that matches the query:
POST hui/News/_update_by_query
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"hui": "hehe"
}
}
}
}
}
},
"script": {
"inline":"ctx._source.name = \"xiaoqiang\""
}
}
6. Update with a script:
If the document exists, set its counter field to 3. If it does not exist, insert a new document whose counter field is 2.
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script":{
"inline":"ctx._source.counter = 3"
},
"upsert":{"counter":2}
}
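The upsert rule above fits in a few lines. In this hypothetical in-memory sketch, UpsertSketch and its HashMap "index" are stand-ins, not the client API. The script runs only when the document already exists; otherwise the upsert body is stored as-is and the script is skipped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of _update with "script" plus "upsert".
class UpsertSketch {
    // Stand-in for the index: document id -> _source
    final Map<String, Map<String, Object>> index = new HashMap<>();

    void updateWithUpsert(String id, Map<String, Object> upsertDoc) {
        Map<String, Object> source = index.get(id);
        if (source != null) {
            source.put("counter", 3);                // ctx._source.counter = 3
        } else {
            index.put(id, new HashMap<>(upsertDoc)); // insert; the script is not run
        }
    }

    public static void main(String[] args) {
        UpsertSketch es = new UpsertSketch();
        Map<String, Object> upsert = new HashMap<>();
        upsert.put("counter", 2);
        es.updateWithUpsert("AWM6zjWeB-kQcwLD8Zjp", upsert);
        System.out.println(es.index.get("AWM6zjWeB-kQcwLD8Zjp")); // created from the upsert body
        es.updateWithUpsert("AWM6zjWeB-kQcwLD8Zjp", upsert);
        System.out.println(es.index.get("AWM6zjWeB-kQcwLD8Zjp")); // script applied
    }
}
```

If the script should also run when the document is first created, the update API accepts "scripted_upsert": true.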
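Add 4 to the counter field:
Reference (the current official docs, version 6.4 at the time of writing, use "source"; my Alibaba Cloud ES is 5.5.3, where only "inline" worked): https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html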
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
"script" : {
"inline": "ctx._source.counter += 4"
}
}
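Or, passing the amount as a script parameter:
POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news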
{
"script" : {
"inline": "ctx._source.counter += params.count",
"lang": "painless",
"params" : {
"count" : 4
}
}
}
Search:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"_id": "AWNcz4IrB-kQcwLDJ93q"
}
}
}
}
}
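Notes:
1. For what "constant_score" does, see https://blog.csdn.net/dm_vincent/article/details/42157577
2. For the difference between match and term, see https://www.cnblogs.com/yjf512/p/4897294.html
3. The term clause can also use any field of the data (e.g. "newType" : 1). A query on a regular field may return many documents, but a query on _id returns at most one.
1. Get a single document by id:
GET mei_toutiao/hui/AWNcz4IrB-kQcwLDJ93q?routing=hui
2. Search all documents: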
GET mei_toutiao/_search
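Note: this matches every document, but only 10 hits are returned by default (set size to change this).
3. Search all documents whose newType is 1: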
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"newType": "1"
}
}
}
}
}
}
}
Search all documents whose newType is not 1:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must_not" : {
"term" : {
"newType": "1"
}
}
}
}
}
}
}
Watch out:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"match_phrase" : {
"userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
}
}
}
}
}
}
}
The query above finds the matching documents, but the one below does not:
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
}
}
}
}
}
}
}
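The likely reason: userId is mapped as a text field. The standard analyzer lowercases the UUID and splits it at the hyphens, so a term query for the original string finds no matching token. match_phrase analyzes its input the same way, so it matches. If the default dynamic mapping was used, a term query on userId.keyword would also work.
4. Documents in which the field exists: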
GET mei_toutiao/_search
{
"query":{
"exists": {
"field": "newType"
}
}
}
Documents in which the field does not exist:
GET mei_toutiao/_search
{
"query":{
"bool": {
"must_not": {
"exists": {
"field": "newType"
}
}
}
}
}
5. Query on several fields at once (with "size": 0, only the hit count is returned and no documents):
GET mei_toutiao/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "sourceType" : "FORUM" } },
{ "term" : { "flwCnt" : 0 } }
]
}
}
}
}
}
6. Sort by pubTime in descending order (use asc for ascending):
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"newType": "1"
}
}
}
}
}
},
"sort": [
{
"pubTime": "desc"
}
]
}
7. In the video category (codeName 视频), exclude 抖音 (Douyin):
GET mei_toutiao/_search
{
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"codeName": "视频"
}
},
"must_not" : {
"term" : {
"mediaNameZh": "抖音"
}
}
}
}
}
},
"sort": [
{
"pubTime": "desc"
}
]
}
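The corresponding Java API (transport client; request is the SearchRequestBuilder being built):
BoolQueryBuilder query = QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery("codeName", "视频"))
.mustNot(QueryBuilders.termQuery("mediaNameZh", "抖音"));
request.setQuery(query).addSort("pubTime", SortOrder.DESC);
Pagination plus sorting (10 hits per page; pageNo starts at 1):
request.setQuery(query).setFrom((message.getInt("pageNo") - 1) * 10).setSize(10).addSort("pubTime", SortOrder.DESC);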
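The paging arithmetic above can be wrapped in a small helper. This is a sketch and PagingSketch is a hypothetical name. The 10,000 limit is Elasticsearch's default index.max_result_window: a request whose from + size exceeds it is rejected, so deep paging needs scroll or search_after instead.

```java
// Hypothetical helper mirroring setFrom((pageNo - 1) * 10).setSize(10).
class PagingSketch {
    // Default index.max_result_window: from + size above this is rejected by Elasticsearch.
    static final int MAX_RESULT_WINDOW = 10_000;

    // Returns {from, size} for a 1-based page number.
    static int[] page(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        long from = (long) (pageNo - 1) * pageSize; // long avoids int overflow
        if (from + pageSize > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds index.max_result_window");
        }
        return new int[] {(int) from, pageSize};
    }

    public static void main(String[] args) {
        int[] p = page(3, 10);
        System.out.println("from=" + p[0] + ", size=" + p[1]);
    }
}
```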
8. Search by time range:
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
GET mei_toutiao/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"pubDay": {
"gte": "2018-05-11",
"lte": "2018-05-12"
}
}
}
]
}
}
}
Yesterday only (from the start of yesterday up to, but not including, the start of today):
GET mei_toutiao/_search
{
"query": {
"range" : {
"pubDay" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
}
}
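The date math in the query above resolves to a fixed window. Here is a sketch with java.time; DateMathSketch is a hypothetical name. It assumes UTC, which Elasticsearch uses for rounding unless a time_zone is given. now-1d/d rounds down to the start of yesterday and now/d rounds down to the start of today, so the gte/lt pair selects exactly yesterday.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the window selected by "gte": "now-1d/d", "lt": "now/d" (UTC assumed).
class DateMathSketch {
    // Returns {start of yesterday (inclusive), start of today (exclusive)}.
    static Instant[] yesterdayWindow(Instant now) {
        ZonedDateTime utcNow = now.atZone(ZoneOffset.UTC);
        // now/d: round down to midnight of the current day
        Instant startOfToday = utcNow.truncatedTo(ChronoUnit.DAYS).toInstant();
        // now-1d/d: subtract one day, then round down to midnight
        Instant startOfYesterday = utcNow.minusDays(1).truncatedTo(ChronoUnit.DAYS).toInstant();
        return new Instant[] {startOfYesterday, startOfToday};
    }

    public static void main(String[] args) {
        Instant[] w = yesterdayWindow(Instant.now());
        System.out.println("gte " + w[0] + ", lt " + w[1]);
    }
}
```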
Query using a specific date format:
GET mei_toutiao/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"pubDay": {
"gte": "2018-05-29 00:00:00",
"lte": "2018-05-30 00:00:00",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
]
}
}
}
Or, with several formats separated by ||:
GET mei_toutiao/_search
{
"query": {
"range" : {
"pubDay" : {
"gte": "30/05/2018",
"lte": "2019",
"format": "dd/MM/yyyy||yyyy"
}
}
}
}
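The corresponding Java API (filterQuery(message) is a project-specific helper that adds further filters; it is not part of the ES API):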
QueryBuilder fqb = QueryBuilders.boolQuery().filter(new RangeQueryBuilder("pubDay").gte("2018-05-29 12:00:00").lte("2018-05-30 00:00:00").format("yyyy-MM-dd HH:mm:ss")).filter(filterQuery(message));
Aggregations:
GET xiao-2018-4-1/Socials/_search
{
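"size" : 0, //number of hits to return; 0 returns only the aggregation results
"query" : { //optionally run a query first to narrow down the documents to aggregate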
"term" : {
"website" : "微信"
}
},
"aggs" : {
"single_sum": { //any name you like
"sum" : { "field" : "flwCnt" } //must be a numeric field; flwCnt is the follower count
}
}
}
Note: running the command above failed with an illegal_argument_exception:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [website] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "xiao-2018-4-1",
"node": "Vux5eT5mTg2iiiiiiiiiii",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [website] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
]
},
"status": 400
}
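Fix: append .keyword to the field name, i.e. use website.keyword.
Cause: website is a text field. Text fields have fielddata disabled by default, so they cannot be used for aggregations or sorting, whereas the website.keyword sub-field (type keyword) can. See https://www.cnblogs.com/duanxuan/p/6566744.html and https://segmentfault.com/a/1190000008897731
1. Count documents per category: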
GET mei_toutiao/_search
{
"size" : 0,
"aggs" : {
"per_count" : {
"terms" : {
"size" : 22, //without this, only the top 10 buckets are returned
"field" : "codeName"
}
}
}
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"hits": {
"total": 52766,
"max_score": 0,
"hits": []
},
"aggregations": {
"per_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "视频",
"doc_count": 17258
},
{
"key": "旅游",
"doc_count": 10132
},
{
"key": "娱乐",
"doc_count": 8867
},
{
"key": "健康",
"doc_count": 4247
},
{
"key": "情感",
"doc_count": 2932
},
{
"key": "星座",
"doc_count": 2281
},
{
"key": "整形",
"doc_count": 2150
},
{
"key": "美容",
"doc_count": 2012
},
{
"key": "亲子",
"doc_count": 861
},
{
"key": "国学",
"doc_count": 444
},
{
"key": "艺术",
"doc_count": 442
},
{
"key": "搭配",
"doc_count": 393
}
]
}
}
}
Note: see the official guide at https://www.elastic.co/guide/cn/elasticsearch/guide/current/cardinality.html
2. For documents whose sourceType is FORUM, aggregate by media name:
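A terms aggregation boils down to three steps: count documents per key, sort by doc_count, and keep the top size buckets. That is why the size parameter matters. The sketch below models this on a single list of values. TermsAggSketch is hypothetical, and the ASCII values are stand-ins for the codeName categories. On a real multi-shard index each shard returns its own top terms, so counts can be approximate (compare doc_count_error_upper_bound in the result above).

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical single-shard model of a terms aggregation.
class TermsAggSketch {
    // Top `size` (key, doc_count) buckets: highest count first, ties broken by key.
    static List<Map.Entry<String, Long>> terms(List<String> fieldValues, int size) {
        Map<String, Long> counts = fieldValues.stream()
                .collect(Collectors.groupingBy(v -> v, LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> codeNames = List.of("video", "travel", "video", "ent", "video", "travel");
        // With size 2 the "ent" bucket is cut off, just as the default size 10 cuts off an 11th category.
        System.out.println(terms(codeNames, 2));
    }
}
```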
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"per_count" : {
"terms" : {
"size" : 10000,
"field" : "website.keyword"
}
}
}
}
3. Aggregate by the name field and get the maximum view count in each bucket:
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"aggs" : {
"per_count" : {
"terms" : {
"size" : 10000,
"field" : "name"
},
"aggs" : {
"max_count" : {
"max" : {
"field" : "view"
}
}
}
}
}
}
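4. For print media (平媒), get the daily volume over recent days plus the number of distinct sources per day (a deduplicating aggregation, sorted by that count):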
GET xiao-2018-4-1/News/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{
"term" : {
"mediaTname": "平媒"
}
},
{
"range": {
"pubDay": {
"gt": "2018-08-31",
"lt": "2018-09-09"
}
}
}
]
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"field" : "pubDay",
"order" : { "distinct_mediaNameZh" : "desc" }
},
"aggs" : {
"distinct_mediaNameZh" : {
"cardinality" : {
"field" : "mediaNameZh"
}
}
}
}
}
}
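Notes:
1. Sort by the number of matching documents:
"order" : { "_count" : "desc" }
2. Sort by the value of the aggregated field itself (here pubDay, whose values look like "2018-08-24"; _term is the 5.x name, newer versions use _key):
"order" : { "_term" : "desc" }
3. Sort by the result of a sub-aggregation:
"order" : { "distinct_mediaNameZh" : "desc" }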
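For intuition about the pubDay/cardinality query, here is an exact in-memory version. It groups by day, collects the distinct media names in a set, and orders the buckets by set size. DistinctPerDaySketch and its sample data are hypothetical. The real cardinality aggregation is approximate (it uses HyperLogLog++), while this sketch counts exactly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical exact model of: terms(pubDay) + cardinality(mediaNameZh), ordered by the cardinality.
class DistinctPerDaySketch {
    static final class Doc {
        final String pubDay;
        final String mediaNameZh;

        Doc(String pubDay, String mediaNameZh) {
            this.pubDay = pubDay;
            this.mediaNameZh = mediaNameZh;
        }
    }

    // pubDay -> number of distinct mediaNameZh values, ordered by that number, descending
    static LinkedHashMap<String, Integer> distinctSourcesPerDay(List<Doc> docs) {
        Map<String, Set<String>> perDay = new HashMap<>();
        for (Doc d : docs) {
            perDay.computeIfAbsent(d.pubDay, k -> new HashSet<>()).add(d.mediaNameZh);
        }
        List<Map.Entry<String, Set<String>>> buckets = new ArrayList<>(perDay.entrySet());
        // like "order": { "distinct_mediaNameZh": "desc" }
        buckets.sort((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()));
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> b : buckets) {
            result.put(b.getKey(), b.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("2018-09-01", "mediaA"), new Doc("2018-09-01", "mediaA"),
                new Doc("2018-09-01", "mediaA"), new Doc("2018-09-01", "mediaB"),
                new Doc("2018-09-02", "mediaA"), new Doc("2018-09-02", "mediaB"),
                new Doc("2018-09-02", "mediaC"));
        System.out.println(distinctSourcesPerDay(docs));
    }
}
```

In the sample data, 2018-09-01 has more documents but fewer distinct sources. Ordering by _count and ordering by distinct_mediaNameZh would therefore put the two days in opposite orders.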
5. For documents whose sourceType is FORUM, aggregate by media name
(and also fetch one article url for each media name):
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"size" : 10000, //the syntax is fine, but the nested aggregation multiplies the amount of data processed, so at this volume the request kept timing out
"field" : "website.keyword"
},
"aggs" : {
"per_count" : { //any name you like
"terms" : {
"size" : 1,
"field" : "url"
}
}
}
}
}
}
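Working around the performance problem (a different approach: a top_hits sub-aggregation returns one document per bucket instead of building a second level of terms buckets):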
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"size" : 10000,
"field" : "website.keyword"
},
"aggs": {
"top_age": {
"top_hits": {
"_source": {
"includes": [
"url"
]
},
"size": 1
}
}
}
}
}
}
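Global bucket (the global aggregation ignores the query and runs over all documents in the index, so the top-level per_count below is limited to the FORUM query, while all.per_count covers everything):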
GET xiao-2018-4-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : {
"term" : {
"sourceType" : "FORUM"
}
}
}
}
}
},
"aggs" : {
"per_count": {
"terms" : { "field" : "website.keyword" }
},
"all": {
"global" : {},
"aggs" : {
"per_count": {
"terms" : { "field" : "website.keyword" }
}
}
}
}
}
See: https://www.elastic.co/guide/cn/elasticsearch/guide/current/_scoping_aggregations.html
Combining query clauses:
{
"bool": {
"must": { "match": { "email": "business opportunity" }},
"should": [
{ "match": { "starred": true }},
{ "bool": {
"must": { "match": { "folder": "inbox" }},
"must_not": { "match": { "spam": true }}
}}
],
"minimum_should_match": 1
}
}
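Note: the logic of this query takes some thought. It finds emails whose body contains "business opportunity" and that are either starred, or in the inbox and not marked as spam. The example comes from the official guide: https://www.elastic.co/guide/cn/elasticsearch/guide/current/query-dsl-intro.html
Returning only specific fields:
1. stored_fields: return the stored codeName and view values of documents that have a newType field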
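The same logic, written as a plain boolean function, may be easier to follow. BoolQuerySketch is hypothetical, and each parameter stands for whether one leaf clause matches a given email.

```java
// Hypothetical boolean model of the bool query above.
class BoolQuerySketch {
    static boolean matches(boolean emailHasPhrase, boolean starred, boolean inInbox, boolean isSpam) {
        boolean must = emailHasPhrase;        // must: match email "business opportunity"
        boolean should1 = starred;            // should[0]: match starred true
        boolean should2 = inInbox && !isSpam; // should[1]: bool { must folder inbox, must_not spam }
        int shouldMatched = (should1 ? 1 : 0) + (should2 ? 1 : 0);
        return must && shouldMatched >= 1;    // "minimum_should_match": 1
    }

    public static void main(String[] args) {
        System.out.println(matches(true, false, true, false)); // in the inbox and not spam
        System.out.println(matches(true, false, true, true));  // in the inbox but spam, not starred
    }
}
```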
GET mei_toutiao/_search
{
"stored_fields" : ["codeName", "view"],
"query":{
"exists": {
"field": "newType"
}
}
}
The Java API equivalent:
SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
.setRouting(routing).storedFields(new String[] {"titleZh", "uuid"});
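See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-stored-fields.html
Prerequisite: the corresponding fields must have store set to true in the mapping.
(See https://blog.csdn.net/napoay/article/details/73100110?locationNum=9&fps=1#323-store) By default, field values are indexed, so they can be searched, but they are not stored. That is usually fine, because the _source field keeps a copy of the original document. store does make sense in some cases: for example, if a document has title, date and a very large content field and you only want title and date back, you can do this: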
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"title": {
"type": "text",
"store": true
},
"date": {
"type": "date",
"store": true
},
"content": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"title": "Some short title",
"date": "2015-01-01",
"content": "A very long content field..."
}
GET my_index/_search
{
"stored_fields": [ "title", "date" ]
}
Query result:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 1,
"fields": {
"date": [
"2015-01-01T00:00:00.000Z"
],
"title": [
"Some short title"
]
}
}
]
}
}
Stored fields are always returned as arrays. If you want the original value, read it from _source instead.
Note: in Java, read the field as a list (its values); reading value returns only the first element of the array.
// hit is a SearchHit; json is the JSONObject (json-lib) being built for the response
JSONObject hitJson = JSONObject.fromObject(hit.getFields());
String[] fields = {"keywordsZh", "littleUrls"};
for (String field : fields) {
if (hit.getFields().containsKey(field)) {
if (field.equals("keywordsZh")) {
@SuppressWarnings("unchecked")
List<String> keywordsZh = (List<String>) hitJson.getJSONObject(field).get("values");
json.put(field, keywordsZh);
// json.put(field, hitJson.getJSONObject(field).get("value")); // returns only the first value of the array
}
}
}
2. Return a single field:
GET mei_toutiao/_search
{
"_source": "newType",
"query":{
"term": {
"uuid": "b6a0d42731c94db1a75383c192b5544a"
}
}
}
Or:
GET mei_toutiao/_search
{
"_source": {
"includes": "newType"
},
"query":{
"term": {
"uuid": "b6a0d42731c94db1a75383c192b5544a"
}
}
}
3. Return only the newType and keywordsZh fields:
GET mei_toutiao/_search
{
"_source": [ "newType", "keywordsZh" ]
}
Or:
GET mei_toutiao/_search
{
"_source": {
"includes": [ "newType", "keywordsZh" ]
}
}
4. Return the fields whose names start with t:
GET mei_toutiao/_search
{
"_source": "t*"
}
5. Return every field except newType and keywordsZh:
GET mei_toutiao/_search
{
"_source": {
"excludes": [ "newType", "keywordsZh" ]
}
}
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html
The Java API equivalent (setFetchSource takes an includes array and an excludes array):
SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
.setRouting(routing).setFetchSource(new String[] {"titleZh", "uuid"} , null);
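The simplified sketch below shows how includes, excludes and the * wildcard interact. It filters top-level fields only; Elasticsearch also handles dotted paths into objects, which this sketch skips. SourceFilterSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of _source includes/excludes on top-level fields.
class SourceFilterSketch {
    // True if the field name matches any pattern; '*' matches any run of characters.
    static boolean matchesAny(String field, List<String> patterns) {
        for (String p : patterns) {
            String regex = Arrays.stream(p.split("\\*", -1))
                    .map(Pattern::quote)
                    .collect(Collectors.joining(".*"));
            if (field.matches(regex)) {
                return true;
            }
        }
        return false;
    }

    // An empty includes list keeps everything; excludes are applied after includes.
    static Map<String, Object> filter(Map<String, Object> source, List<String> includes, List<String> excludes) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            boolean included = includes.isEmpty() || matchesAny(e.getKey(), includes);
            if (included && !matchesAny(e.getKey(), excludes)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("titleZh", "title");
        source.put("newType", 1);
        source.put("keywordsZh", List.of("a", "b"));
        source.put("uuid", "b6a0d42731c94db1a75383c192b5544a");
        System.out.println(filter(source, List.of("t*"), List.of()).keySet());
        System.out.println(filter(source, List.of(), List.of("newType", "keywordsZh")).keySet());
    }
}
```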
Unusual requests:
1.
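For each forum name, total the forum comments per month from January to May; the field is cmtCnt.
For each forum name, total the likes on forum posts per month from January to May; the field is adtCnt (the query below sums cmtCnt; use adtCnt in the sum for likes).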
GET xiao-2018-4-1,xiao-2018-6-12,xiao-2018-3-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{
"term" : {
"sourceType" : "FORUM"
}
},
{
"range": {
"timeDay": {
"gte": "2018-01-01",
"lte": "2018-05-31"
}
}
}
]
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"size" : 100000,
"field" : "website.keyword"
},
"aggs": {
"month_num": {
"date_histogram": {
"field": "timeDay",
"interval": "month",
"format": "yyyy-MM"
},
"aggs": {
"single_sum": {
"sum" : { "field" : "cmtCnt" }
}
}
}
}
}
}
}
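The nesting terms → date_histogram → sum amounts to two levels of grouping followed by a sum. The in-memory sketch below mirrors that; MonthlySumSketch, its Post class and the forum names are hypothetical. There is one difference from a real date_histogram: by default the real one also returns empty months between the first and last bucket, while this sketch only produces months that have data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of: terms(website.keyword) -> date_histogram(timeDay, month, "yyyy-MM") -> sum(cmtCnt).
class MonthlySumSketch {
    static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyy-MM");

    static final class Post {
        final String website;
        final LocalDate timeDay;
        final long cmtCnt;

        Post(String website, LocalDate timeDay, long cmtCnt) {
            this.website = website;
            this.timeDay = timeDay;
            this.cmtCnt = cmtCnt;
        }
    }

    // website -> (month "yyyy-MM" -> sum of cmtCnt); TreeMap keeps months in ascending order
    static Map<String, TreeMap<String, Long>> monthlySums(List<Post> posts) {
        Map<String, TreeMap<String, Long>> out = new TreeMap<>();
        for (Post p : posts) {
            out.computeIfAbsent(p.website, k -> new TreeMap<>())
                    .merge(p.timeDay.format(MONTH), p.cmtCnt, Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Post> posts = List.of(
                new Post("forumA", LocalDate.of(2018, 1, 5), 3),
                new Post("forumA", LocalDate.of(2018, 1, 20), 4),
                new Post("forumA", LocalDate.of(2018, 3, 1), 1),
                new Post("forumB", LocalDate.of(2018, 2, 10), 7));
        System.out.println(monthlySums(posts));
    }
}
```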
2.
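For each forum name, total the positive-sentiment posts per month from January to May, i.e. documents whose sentimentOrient is neither -1 nor 0; the field is sentimentOrient.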
GET xiao-2018-4-1,xiao-2018-6-12,xiao-2018-3-1/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{
"term" : {
"sourceType" : "FORUM"
}
},
{
"range": {
"timeDay": {
"gte": "2018-01-01",
"lte": "2018-05-31"
}
}
}
],
"must_not" : [
{ "term" : { "sentimentOrient" : -1} },
{ "term" : { "sentimentOrient" : 0 } }
]
}
}
}
},
"aggs" : {
"all_interests" : {
"terms" : {
"size" : 100000,
"field" : "website.keyword"
},
"aggs": {
"month_num": {
"date_histogram": {
"field": "timeDay",
"interval": "month",
"format": "yyyy-MM"
}
}
}
}
}
}
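Question: at first I tried to limit the time range with extended_bounds, as in the guide (https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/_extended_example.html), but no matter what I did it would not work, so in the end I restricted the time range in the query instead. This is expected: extended_bounds only forces empty buckets to be created across the given range. It does not filter documents, so the range has to be applied in the query.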
3.
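Keywords to monitor: 零跑, 零跑汽车, 零跑S01
Keywords to exclude: 零跑腿, 专家门诊
Goal: the total number of social and news documents from June 2 to July 2, deduplicated by the url field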
GET xiao-2018-6-12,xiao-2018-6-19,xiao-2018-6-26,xiao-2018-6-5/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{
"range": {
"timeDay": {
"gte": "2018-06-02",
"lte": "2018-07-02"
}
}
},
{
"query_string":{
"default_field":"textZh",
"query":"零跑 OR 零跑汽车 OR 零跑S01 NOT 零跑腿 NOT 专家门诊"
}
}
]
}
}
}
},
"aggs" : {
"distinct_colors" : {
"cardinality" : {
"field" : "url"
}
}
}
}
Alternatively, the query string was written as:
"query" : "( ( \"\"零跑\"\" ) OR ( \"\"零跑汽车\"\" ) OR ( \"\"零跑S01\"\" ) NOT ( \"\"零跑腿\"\" ) NOT ( \"\"专家门诊\"\" ) )"
Note: the results of the query above are wrong; something is still off.
Cause: in this index's mapping, the textZh field is defined as follows:
"textZh": {
"type": "text",
"store": true,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "ik_smart"
}
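As a result, the ik_smart analyzer splits an input such as "零跑" into "零" and "跑", so the search does not return what you want.
Solution (match_phrase requires the analyzed tokens to appear together and in order, instead of matching any single token):
Social (documents with _type "Socials" are social posts; documents with _type "News" are news articles):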
GET xiao-2018-6-12,xiao-2018-6-19,xiao-2018-6-26,xiao-2018-6-5/Socials/_search
{
"size" : 0,
"query" : {
"constant_score" : {
"filter" : {
"bool": {
"must": {
"range": {
"timeStr": {
"gte": "2018-06-02 00:00:00",
"lte": "2018-07-03 00:00:00"
}
}
},
"should": [
{
"match_phrase": {
"textZh" : {
"query" : "零跑"
}
}
},
{
"match_phrase": {
"textZh" : {
"query" : "零跑汽车"
}
}
},
{
"match_phrase": {
"textZh" : {
"query" : "零跑S01"
}
}
}
],
"minimum_should_match": 1,
"must_not": {
"bool": {
"should": [
{
"match_phrase": {
"textZh" : "零跑腿"
}
},
{
"match_phrase": {
"textZh" : {
"query" : "专家门诊"
}
}
}
]
}
}
}
}
}
},
"aggs" : {
"distinct_colors" : {
"cardinality" : {
"field" : "url"
}
}
}
}
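The difference between the failing query_string and the working match_phrase shows up clearly on token lists. The sketch assumes, as described above, that "零跑" is analyzed into the two tokens 零 and 跑. To keep the source ASCII, it uses romanized stand-ins: ling for 零, pao for 跑 and qiche for 汽车; the other tokens are arbitrary fillers. PhraseMatchSketch and the sample token lists are hypothetical, not real ik_smart output.

```java
import java.util.List;

// Hypothetical sketch: match_phrase (slop 0) versus an OR of single terms,
// evaluated on token lists that an analyzer has already produced.
class PhraseMatchSketch {
    // match_phrase: all query tokens must appear next to each other, in order.
    static boolean phraseMatch(List<String> docTokens, List<String> queryTokens) {
        if (queryTokens.isEmpty()) {
            return false;
        }
        outer:
        for (int start = 0; start + queryTokens.size() <= docTokens.size(); start++) {
            for (int j = 0; j < queryTokens.size(); j++) {
                if (!docTokens.get(start + j).equals(queryTokens.get(j))) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    // Unquoted query_string terms joined by OR: any single token is enough.
    static boolean anyTermMatch(List<String> docTokens, List<String> queryTokens) {
        for (String t : queryTokens) {
            if (docTokens.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> query = List.of("ling", "pao");
        List<String> relevant = List.of("ling", "pao", "qiche", "fabu");
        List<String> unrelated = List.of("ling", "shi", "pao", "bu"); // both tokens, not adjacent
        System.out.println(phraseMatch(relevant, query) + " " + phraseMatch(unrelated, query));
        System.out.println(anyTermMatch(unrelated, query));
    }
}
```

A document that merely contains 零 and 跑 somewhere, not next to each other, passes the any-term check but fails the phrase check. That is the kind of false positive the match_phrase rewrite removes.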
Built-in Elasticsearch analyzers:
standard analyzer
simple analyzer
whitespace analyzer
language analyzer (analyzers for specific languages)
Example sentence: Set the shape to semi-transparent by calling set_trans(5)
Output of each analyzer:
standard analyzer: set, the, shape, to, semi, transparent, by, calling, set_trans, 5 (standard is the default)
simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans
whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
language analyzer (a language-specific analyzer, e.g. english): set, shape, semi, transpar, call, set_tran, 5
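The simple and whitespace results above follow from two short rules, which the sketch below approximates. AnalyzerSketch is hypothetical and is not Elasticsearch's implementation. whitespace splits on whitespace and keeps case. simple splits on every non-letter character and lowercases, which is why the 5 and the parentheses disappear.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximations of two built-in analyzers.
class AnalyzerSketch {
    // whitespace: split on runs of whitespace, keep case and punctuation
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // simple: every non-letter ends a token; tokens are lowercased
    static List<String> simple(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Set the shape to semi-transparent by calling set_trans(5)";
        System.out.println(whitespace(s));
        System.out.println(simple(s));
    }
}
```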
Testing an analyzer:
GET /_analyze
{
"analyzer": "standard",
"text":"I love you"
}
Result:
{
"tokens": [
{
"token": "i",
"start_offset": 0,
"end_offset": 1,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "love",
"start_offset": 2,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "you",
"start_offset": 7,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 2
}
]
}
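Modifying the mapping:
1. Create an index:
PUT hui
2. Delete an index:
DELETE hui
3. Add a field to the mapping
(Once an Elasticsearch mapping has been created, you can only add new fields; fields that are already mapped cannot be changed.)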
POST hui/News/_mapping
{
"News": {
"properties": {
"hui":{
"type": "text",
"store": true
}
}
}
}
4. Change an existing field's type:
POST hui/News/_mapping
{
"News": {
"properties": {
"hui":{
"type": "integer"
}
}
}
}
Error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [hui] of different type, current_type [text], merged_type [integer]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [hui] of different type, current_type [text], merged_type [integer]"
},
"status": 400
}
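Cause:
Changing a field's type would require reindexing all of that field's data. Elasticsearch is built on the Lucene library, and a field's type determines how it is analyzed, indexed and searched, so refusing type changes is consistent with how Lucene works.
Reindexing:
1. Create the index hui, insert a document and point the alias xiao at it: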
PUT hui
POST hui/News/_mapping
{
"News": {
"properties": {
"hui":{
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
POST hui/News/1
{
"hui" : "hehe"
}
POST hui/_alias/xiao
2. Create the index qiang with the new mapping (its data is copied over by the reindex in the next step):
PUT qiang
POST qiang/News/_mapping
{
"News": {
"properties": {
"hui":{
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"store": true
}
}
}
}
3. Run the reindex:
POST _reindex
{
"source": {
"index": "hui"
},
"dest": {
"index": "qiang",
"version_type": "internal"
}
}
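Note: with a large amount of data, Kibana reports a timeout like the one below, but the reindex itself keeps running and completes. Reindexing about 190,000 documents took me a little over ten minutes. (Sending POST _reindex?wait_for_completion=false returns a task id right away instead of waiting.)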
{
"statusCode": 504,
"error": "Gateway Timeout",
"message": "Client request timeout"
}
4. Check progress with the Task API (the sample output below comes from reindexing mei_toutiao into mei_toutiao_v2):
GET _tasks?detailed=true&actions=*reindex
{
"nodes": {
"yFpET0TETpuWGCxxyodXmg": {
"name": "yFpET0T",
"transport_address": "192.168.0.100:9300",
"host": "192.168.0.100",
"ip": "192.168.0.100:9300",
"roles": [
"master",
"data",
"ingest"
],
"attributes": {
"ml.max_open_jobs": "10",
"ml.enabled": "true"
},
"tasks": {
"yFpET0TETpuWGCxxyodXmg:6319552": {
"node": "yFpET0TETpuWGCxxyodXmg",
"id": 6319552,
"type": "transport",
"action": "indices:data/write/reindex",
"status": {
"total": 194111,
"updated": 0,
"created": 50000,
"deleted": 0,
"batches": 51,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0
},
"description": "reindex from [mei_toutiao] to [mei_toutiao_v2]",
"start_time_in_millis": 1532338516013,
"running_time_in_nanos": 176981696219,
"cancellable": true
}
}
}
}
}
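The status block can be turned into a rough percentage. This sketch rests on an assumption: it treats created + updated + deleted + noops + version_conflicts as the number of documents processed so far. ReindexProgressSketch is a hypothetical name. With the numbers above (50,000 created out of 194,111), it gives roughly 26%.

```java
// Hypothetical helper: rough progress of a reindex task from the "status" object of
// GET _tasks?detailed=true&actions=*reindex.
// Assumption: created + updated + deleted + noops + version_conflicts = documents processed so far.
class ReindexProgressSketch {
    static double percentDone(long total, long created, long updated,
                              long deleted, long noops, long versionConflicts) {
        if (total <= 0) {
            return 0.0; // avoid dividing by zero when total is not known yet
        }
        long processed = created + updated + deleted + noops + versionConflicts;
        return 100.0 * processed / total;
    }

    public static void main(String[] args) {
        // Values from the status block above
        System.out.printf("%.1f%%%n", percentDone(194111, 50000, 0, 0, 0, 0));
    }
}
```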
5. Once the copy has finished, the response is:
{
"nodes": {}
}
6. Switch the alias (both actions in one _aliases request are applied atomically):
POST /_aliases
{
"actions": [
{ "remove": {
"alias": "xiao",
"index": "hui"
}},
{ "add": {
"alias": "xiao",
"index": "qiang"
}}
]
}