A Collection of Elasticsearch DSL Commands

All commands below were run in the Kibana console provided by Alibaba Cloud Elasticsearch.

Preface:

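I used to query ES directly with curl on the server. After the company switched to Alibaba Cloud Elasticsearch, running the same command on the server provided by Alibaba Cloud returned an error: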

[root@Alihui ~]# curl -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}

Solution: Alibaba Cloud enforces access control, so every request must carry a username and password. For details see https://help.aliyun.com/document_detail/57877.html?spm=a2c4g.11186623.6.548.AAW08d
The correct way to connect:

[root@Ali98 ~]# curl -u hui:hui -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{
  "name" : "huihui",
  "cluster_name" : "es-cn-huiiiiiiiiiiiii",
  "cluster_uuid" : "huiiiiiiiiiiiii_iiiii",
  "version" : {
    "number" : "5.5.3",
    "build_hash" : "930huihui",
    "build_date" : "2017-09-07T15:56:59.599Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}

List all indices on the cluster:

GET _cat/indices?v

Get the mapping of an index:

GET /xiao-2018-6-12/Socials/_mapping

Add:

1. Add the field name with the value xiaoqiang:

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
    "script" : "ctx._source.name = \"xiaoqiang\""
}

Delete:

1. Remove a field from a document:

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
    "script" : "ctx._source.remove(\"name_of_new_field\")"
}

2. Delete a single document:

DELETE mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news

3. Bulk delete by multiple conditions:

POST mei_toutiao/News/_delete_by_query?routing=news
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        { "term" : { "mediaNameZh" : "5time悦读" } }, 
                        { "term" : { "codeName" : "美发" } }
                    ]
                }
            }
        }
    }
}

Update:

1. Partial update:

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
   "doc" : {
      "userName": "hao"  // updated if the field exists, added if it does not
   }
}

2. Update a string array:

POST mei_toutiao/News/AWPN8pLjs4TGXdjfL8_b/_update?routing=news
{
   "doc" : {
      "littleUrls": [
          "http://shishanghui.oss-cn-beijing.aliyuncs.com/700d2d2936f40fabe5a70b1449f07f9df080.jpg?x-oss-process=image/format,jpg/interlace,1",
          "http://shishanghui.oss-cn-beijing.aliyuncs.com/ed7ad5d1e23441880c59abf0cfd7a89df080.jpg?x-oss-process=image/format,jpg/interlace,1"
      ]
   }
}

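3. Full update:
(Whatever fields the document had before, afterwards it contains only the content below: the whole document is replaced, so use this with caution!)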

PUT mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news
{
    "counter" : 1,
    "tags" : ["red"]
}
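
The difference between the partial update (item 1) and the full replacement above can be sketched in plain Python, with dictionaries standing in for documents. This only illustrates the semantics and is not Elasticsearch code: `partial_update` and `full_replace` are hypothetical helper names, the sample document is made up, and the merge is shown for top-level fields only.

```python
def partial_update(source: dict, doc: dict) -> dict:
    """Mimic POST .../_update with "doc": fields are merged into the existing source."""
    merged = dict(source)
    merged.update(doc)  # existing fields are overwritten, missing ones are added
    return merged


def full_replace(source: dict, body: dict) -> dict:
    """Mimic PUT .../<id>: the stored source becomes exactly the request body."""
    return dict(body)


existing = {"titleZh": "hello", "userName": "old"}  # made-up sample document
print(partial_update(existing, {"userName": "hao"}))
print(full_replace(existing, {"counter": 1, "tags": ["red"]}))
```

After the full replacement, titleZh and userName are gone, which is why the PUT form should be used with care.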

4. Bulk reset the comment count to 0 for all articles whose comment count is greater than 0:

POST mei_toutiao/News/_update_by_query?routing=news
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "atdCnt": {
              "gt": 0
            }
          }
        }
      ]
    }
  },
  "script": {
    "inline":"ctx._source.atdCnt = 0"
  }
}

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html

5. Bulk add a field and assign it a value:

POST hui/News/_update_by_query
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "hui": "hehe"
                        }
                    }
                }
            }
        }
    },
    "script": {
        "inline":"ctx._source.name = \"xiaoqiang\""
    }
}

6. Update with a script:
If the document exists, set its counter field to 3; if it does not exist, insert a new document whose counter field is 2.

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{  
   "script":{  
      "inline":"ctx._source.counter = 3"
   },
   "upsert":{"counter":2}
}
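
The behaviour of "script" combined with "upsert" can be sketched in plain Python, with a dictionary standing in for the index. This is an illustration under that assumption, not Elasticsearch code: `update_with_upsert` and `set_counter_to_3` are hypothetical names.

```python
def update_with_upsert(index: dict, doc_id: str, script, upsert: dict) -> dict:
    """Run `script` on the document if it exists; otherwise store `upsert` as-is."""
    if doc_id in index:
        script(index[doc_id])          # the script mutates the existing source
    else:
        index[doc_id] = dict(upsert)   # the script is not run on the first insert
    return index[doc_id]


def set_counter_to_3(source: dict) -> None:
    """Stand-in for the script ctx._source.counter = 3."""
    source["counter"] = 3


index = {}
print(update_with_upsert(index, "doc-1", set_counter_to_3, {"counter": 2}))  # document is created
print(update_with_upsert(index, "doc-1", set_counter_to_3, {"counter": 2}))  # script is applied
```

The first call creates the document from the upsert body; only the second call runs the script.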

Add 4 to the counter field:
Reference (the official documentation is for version 6.4 and uses "source"; my Alibaba Cloud ES is 5.5.3, where only "inline" works): https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
    "script" : {
        "inline": "ctx._source.counter += 4"
    }
}

Or, with the same request line and a parameterized script:

{
    "script" : {
        "inline": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    }
}

Search:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {             
                "term" : {
                    "_id": "AWNcz4IrB-kQcwLDJ93q"
                }
            }
        }
    }
}

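Notes:
1. For what "constant_score" does, see https://blog.csdn.net/dm_vincent/article/details/42157577
2. For the difference between match and term, see https://www.cnblogs.com/yjf512/p/4897294.html
3. The term clause can also target a regular document field (for example "newType" : 1). Querying by a field may return many documents, whereas querying by _id returns at most one.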

1. Fetch a single document:

GET mei_toutiao/hui/AWNcz4IrB-kQcwLDJ93q?routing=hui

2. Search all documents:

GET mei_toutiao/_search

Note: every document matches, but only 10 hits are returned by default.

3. Search all documents whose newType field is 1:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "newType": "1"
                        }
                    }
                }
            }
        }
    }
}

Search all documents whose newType field is not 1:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must_not" : {
                        "term" : {
                            "newType": "1"
                        }
                    }
                }
            }
        }
    }
}

Note:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "match_phrase" : {
                            "userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
                        }
                    }
                }
            }
        }
    }
}

The query above finds the matching document, but the one below does not:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
                        }
                    }
                }
            }
        }
    }
}
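
A likely explanation, assuming userId is mapped as an analyzed text field (the mapping is not shown here): at index time the analyzer lowercases the value and splits it at the hyphens, so the index holds several short tokens rather than the full ID. A term query is not analyzed and looks for one exact token, while match_phrase analyzes its input first. The Python sketch below imitates this; the regular expression is only a rough stand-in for the standard analyzer, and all function names are hypothetical.

```python
import re


def approx_standard_analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


user_id = "1C210E82-21B7-4220-B267-ED3DA6635F6F"
indexed_tokens = approx_standard_analyze(user_id)


def term_matches(value: str) -> bool:
    """term query: the value is looked up as-is, without analysis."""
    return value in indexed_tokens


def match_phrase_matches(value: str) -> bool:
    """match_phrase: the value is analyzed, then its tokens must appear consecutively, in order."""
    wanted = approx_standard_analyze(value)
    n = len(wanted)
    return any(indexed_tokens[i:i + n] == wanted
               for i in range(len(indexed_tokens) - n + 1))


print(indexed_tokens)
print(term_matches(user_id), match_phrase_matches(user_id))
```

If the mapping also defines a keyword sub-field (for example userId.keyword, which is an assumption here), a term query against that sub-field would match the full ID exactly.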

4. Documents in which the field exists:

GET mei_toutiao/_search
{
    "query":{
          "exists": {
                "field": "newType"
           }
    }
}

Documents in which the field does not exist:

GET mei_toutiao/_search
{
    "query":{
        "bool": {
            "must_not": {
                "exists": {
                    "field": "newType"
                }
            }
        }
    }
}

5. Query on multiple fields:

GET mei_toutiao/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        { "term" : { "sourceType" : "FORUM" } }, 
                        { "term" : { "flwCnt" : 0 } } 
                    ]
                }
            }
        }
    }
}

6. Sort by the pubTime field in descending order (use asc for ascending):

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "newType": "1"
                        }
                    }
                }
            }
        }
    }
    , "sort": [
      {
          "pubTime": "desc"
      }
    ]
}

7. Exclude 抖音 (Douyin) from the 视频 (video) category:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "codeName": "视频"
                        }
                    },
                    "must_not" : {
                        "term" : {
                            "mediaNameZh": "抖音"
                        }
                    }
                }
            }
        }
    }
    , "sort": [
      {
          "pubTime": "desc"
      }
    ]
}

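The corresponding Java API:

// client is a SearchRequestBuilder created elsewhere
BoolQueryBuilder fqb = QueryBuilders.boolQuery()
        .must(QueryBuilders.termsQuery("codeName", "视频"))
        .mustNot(QueryBuilders.matchQuery("mediaNameZh", "抖音"));
client.setQuery(fqb).addSort("pubTime", SortOrder.DESC);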

Pagination with sorting:

client.setQuery(fqb).setFrom((message.getInt("pageNo")-1)*10).setSize(10).addSort("pubTime", SortOrder.DESC);
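
The paging arithmetic in the Java line above, (pageNo - 1) * 10, can be written out as a small Python sketch that builds the equivalent request body. `page_to_from` and `paged_body` are hypothetical helper names, and match_all is used only as a placeholder query.

```python
def page_to_from(page_no: int, page_size: int = 10) -> int:
    """Translate a 1-based page number into the 0-based `from` offset."""
    if page_no < 1:
        raise ValueError("page_no starts at 1")
    return (page_no - 1) * page_size


def paged_body(query: dict, page_no: int, page_size: int = 10) -> dict:
    """Build a search body with from/size paging and a descending pubTime sort."""
    return {
        "query": query,
        "from": page_to_from(page_no, page_size),
        "size": page_size,
        "sort": [{"pubTime": "desc"}],
    }


print(paged_body({"match_all": {}}, page_no=3))
```

Keep in mind that from + size is capped by the index.max_result_window setting (10000 by default), so very deep pages need a different mechanism such as scroll.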

8. Search by time range:
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

GET mei_toutiao/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "pubDay": {
              "gte": "2018-05-11",
              "lte": "2018-05-12"
            }
          }
        }
      ]
    }
  }
}

All of yesterday (from the start of yesterday up to, but not including, the start of today):

GET mei_toutiao/_search
{
    "query": {
        "range" : {
            "pubDay" : {
                "gte" : "now-1d/d",
                "lt" :  "now/d"
            }
        }
    }
}
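
To make the date math concrete, the sketch below computes the same two bounds in Python for a fixed "now". It is only an illustration: `start_of_day` and `yesterday_window` are hypothetical names, and time zones are ignored (Elasticsearch evaluates date math in UTC unless told otherwise).

```python
from datetime import datetime, timedelta


def start_of_day(moment: datetime) -> datetime:
    """What `/d` does for a `gte` or `lt` bound: round down to midnight."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def yesterday_window(now: datetime):
    """Return (gte, lt) for {"gte": "now-1d/d", "lt": "now/d"}."""
    gte = start_of_day(now - timedelta(days=1))
    lt = start_of_day(now)
    return gte, lt


gte, lt = yesterday_window(datetime(2018, 5, 12, 15, 30))
print(gte.isoformat(), lt.isoformat())
```

Because the upper bound is exclusive and rounded down to the start of today, documents from today are not included.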

Query with an explicit date format:

GET mei_toutiao/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "pubDay": {
              "gte": "2018-05-29 00:00:00",
              "lte": "2018-05-30 00:00:00",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}

Or:

GET mei_toutiao/_search
{
    "query": {
        "range" : {
            "pubDay" : {
                "gte": "30/05/2018",
                "lte": "2019",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }
}

The corresponding Java API:

QueryBuilder fqb = QueryBuilders.boolQuery().filter(new RangeQueryBuilder("pubDay").gte("2018-05-29 12:00:00").lte("2018-05-30 00:00:00").format("yyyy-MM-dd HH:mm:ss")).filter(filterQuery(message));

Aggregations:

GET xiao-2018-4-1/Socials/_search
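{
    "size" : 0,    // number of hits to return (0 returns only the aggregation results)
    "query" : {    // a query can first narrow down the data set to aggregate
        "term" : {
            "website" : "微信"
        }
    },
    "aggs" : {
        "single_sum": {    // any name you like
            "sum" : { "field" : "flwCnt" }    // must be a numeric field; flwCnt is the follower count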
        }
    }
}

Note: running the command above produced an illegal_argument_exception with the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [website] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "xiao-2018-4-1",
        "node": "Vux5eT5mTg2iiiiiiiiiii",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [website] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ]
  },
  "status": 400
}

Fix: append .keyword to the website field, i.e. use website.keyword.
Cause: the website field is of type text. See https://www.cnblogs.com/duanxuan/p/6566744.html and https://segmentfault.com/a/1190000008897731

1. Aggregate by category:

GET mei_toutiao/_search
{
    "size" : 0,
    "aggs" : {
        "per_count" : {
           "terms" : {
              "size" : 22,    // without this, only 10 buckets are returned by default
              "field" : "codeName"
           }
        }
    }
}

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 52766,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "per_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "视频",
          "doc_count": 17258
        },
        {
          "key": "旅游",
          "doc_count": 10132
        },
        {
          "key": "娱乐",
          "doc_count": 8867
        },
        {
          "key": "健康",
          "doc_count": 4247
        },
        {
          "key": "情感",
          "doc_count": 2932
        },
        {
          "key": "星座",
          "doc_count": 2281
        },
        {
          "key": "整形",
          "doc_count": 2150
        },
        {
          "key": "美容",
          "doc_count": 2012
        },
        {
          "key": "亲子",
          "doc_count": 861
        },
        {
          "key": "国学",
          "doc_count": 444
        },
        {
          "key": "艺术",
          "doc_count": 442
        },
        {
          "key": "搭配",
          "doc_count": 393
        }
      ]
    }
  }
}

Note: see the official guide at https://www.elastic.co/guide/cn/elasticsearch/guide/current/cardinality.html

2. Aggregate media names for documents whose sourceType is FORUM:

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "per_count" : {
           "terms" : {
              "size" : 10000,
              "field" : "website.keyword"
           }
        }
    }
}

3. Aggregate by the name field and get the maximum view count within each bucket:

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "aggs" : {
        "per_count" : {
           "terms" : {
              "size" : 10000,
              "field" : "name"
           },
           "aggs" : {
                "max_count" : {
                    "max" : {
                       "field" : "view"
                    }
                }
            }
        }
    }
}

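4. Daily document volume for 平媒 (print media) over recent days, plus the number of distinct sources per day (buckets ordered by that distinct count):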

GET xiao-2018-4-1/News/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "term" : {
                               "mediaTname": "平媒"
                            }
                        },
                        {
                            "range": {
                                "pubDay": {
                                    "gt": "2018-08-31",
                                    "lt": "2018-09-09"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs" : {
        "all_interests" : {
            "terms" : {
                "field" : "pubDay",
                "order" : { "distinct_mediaNameZh" : "desc" }
            },
            "aggs" : {
                "distinct_mediaNameZh" : {
                    "cardinality" : {
                       "field" : "mediaNameZh"
                    }
                }
            }
        }
    }
}
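
The request above orders the day buckets by a sub-aggregation. As a rough illustration of how the possible order targets (_count, _term, or the name of a sub-aggregation) change the bucket order, here is a small Python sketch over made-up documents. It is not Elasticsearch code: `buckets_by_day` and `ordered` are hypothetical helpers, and the distinct count here is exact, whereas the real cardinality aggregation is approximate.

```python
from collections import defaultdict

docs = [  # made-up sample documents
    {"pubDay": "2018-09-01", "mediaNameZh": "A"},
    {"pubDay": "2018-09-01", "mediaNameZh": "A"},
    {"pubDay": "2018-09-01", "mediaNameZh": "A"},
    {"pubDay": "2018-09-01", "mediaNameZh": "B"},
    {"pubDay": "2018-09-02", "mediaNameZh": "A"},
    {"pubDay": "2018-09-02", "mediaNameZh": "B"},
    {"pubDay": "2018-09-02", "mediaNameZh": "C"},
    {"pubDay": "2018-09-03", "mediaNameZh": "A"},
]


def buckets_by_day(documents):
    """Group by pubDay, keeping the doc count and the number of distinct media names."""
    groups = defaultdict(list)
    for d in documents:
        groups[d["pubDay"]].append(d["mediaNameZh"])
    return [
        {"key": day, "doc_count": len(names), "distinct_mediaNameZh": len(set(names))}
        for day, names in groups.items()
    ]


def ordered(buckets, field):
    """Sort descending by "_count", "_term", or the name of a sub-aggregation."""
    sort_key = {"_count": "doc_count", "_term": "key"}.get(field, field)
    return sorted(buckets, key=lambda b: b[sort_key], reverse=True)


buckets = buckets_by_day(docs)
for field in ("_count", "_term", "distinct_mediaNameZh"):
    print(field, [b["key"] for b in ordered(buckets, field)])
```

With this sample data each of the three order targets produces a different bucket sequence.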

Notes:
1. Order by the number of matching documents

"order" : {  "_count" : "desc" }

2. Order by the aggregated field itself (here the buckets are sorted by the pubDay value, which looks like "2018-08-24")

"order" : {  "_term" : "desc" }

3. Order by the result of a sub-aggregation

"order" : { "distinct_mediaNameZh" : "desc" }

5. Aggregate media names for documents whose sourceType is FORUM:
(and fetch the URL of one article for each media name)

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
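    },
    "aggs" : {
        "all_interests" : { 
            "terms" : {
                "size" : 10000,   // the request itself is valid, but the volume is too much to handle (the nested terms aggregation multiplies the number of buckets to process), so it keeps timing out
                "field" : "website.keyword" 
            }, 
            "aggs" : {
                "per_count" : {    // any name you like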
                    "terms" : {
                       "size" : 1,
                       "field" : "url"
                    }
                }
            }
        }
    }
}

Working around the performance problem above (a different approach, using top_hits):

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "all_interests" : { 
            "terms" : {
                "size" : 10000,
                "field" : "website.keyword" 
            }, 
            "aggs": {
                "top_age": {
                    "top_hits": {
                        "_source": {
                            "includes": [
                                "url"
                             ]
                        },
                        "size": 1
                    }
                }
            }
        }
    }
}

Global bucket:

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "per_count": {
            "terms" : { "field" : "website.keyword" } 
        },
        "all": {
            "global" : {}, 
            "aggs" : {
                "per_count": {
                    "terms" : { "field" : "website.keyword" } 
                }
            }
        }
    }
}

Reference: https://www.elastic.co/guide/cn/elasticsearch/guide/current/_scoping_aggregations.html

Combining queries:

{
    "bool": {
        "must": { "match":   { "email": "business opportunity" }},
        "should": [
            { "match":       { "starred": true }},
            { "bool": {
                "must":      { "match": { "folder": "inbox" }},
                "must_not":  { "match": { "spam": true }}
            }}
        ],
        "minimum_should_match": 1
    }
}

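Note: the logic of the query above is fairly involved and worth thinking through. It finds starred emails whose body matches business opportunity, or non-spam emails in the inbox whose body matches business opportunity. The example comes from the official guide: https://www.elastic.co/guide/cn/elasticsearch/guide/current/query-dsl-intro.html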
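
The boolean logic can be checked by hand with a small Python predicate over made-up emails. This is a simplified sketch, not how Elasticsearch evaluates queries: `matches` is a hypothetical function, tokenization is a plain whitespace split, and the match clause is modelled with its default OR operator (either word is enough).

```python
def matches(email: dict) -> bool:
    """Evaluate the bool query above against one email represented as a dict."""
    words = set(email["email"].lower().split())
    must = bool(words & {"business", "opportunity"})  # match with the default OR operator
    should = [
        email["starred"],
        email["folder"] == "inbox" and not email["spam"],
    ]
    minimum_should_match = 1
    return must and sum(bool(s) for s in should) >= minimum_should_match


emails = [  # made-up sample data
    {"email": "new business opportunity", "starred": True, "folder": "archive", "spam": False},
    {"email": "business opportunity here", "starred": False, "folder": "inbox", "spam": False},
    {"email": "business opportunity here", "starred": False, "folder": "inbox", "spam": True},
    {"email": "hello there", "starred": True, "folder": "inbox", "spam": False},
]
print([matches(e) for e in emails])
```

The third email fails because neither should clause holds, and the fourth fails because the must clause does not match.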

Returning specific fields:

1. store: for documents that have the newType field, return the stored values of codeName and view

GET mei_toutiao/_search
{
    "stored_fields" : ["codeName", "view"],
    "query":{
          "exists": {
                "field": "newType"
           }
    }
}

SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
                .setRouting(routing).storedFields(new String[] {"titleZh", "uuid"});

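Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-stored-fields.html
Prerequisite: the corresponding fields must have the store parameter set to true in the mapping.
(See https://blog.csdn.net/napoay/article/details/73100110?locationNum=9&fps=1#323-store) By default, field values are indexed and therefore searchable, but they are not stored. That is usually fine, because the _source field keeps a copy of the original document. In some cases the store parameter does make sense: for example, if a document has title, date, and a very large content field and you only want to retrieve title and date, you can do this: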

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "store": true 
        },
        "date": {
          "type": "date",
          "store": true 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

GET my_index/_search
{
  "stored_fields": [ "title", "date" ] 
}

Query result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "fields": {
          "date": [
            "2015-01-01T00:00:00.000Z"
          ],
          "title": [
            "Some short title"
          ]
        }
      }
    ]
  }
}

Stored fields are always returned as arrays; to get the original value, read it from _source instead.
Note: in Java code, read the field's "values" as a list; reading "value" returns only the first element of the array.

JSONObject hitJson = JSONObject.fromObject(hit.getFields());
String[] fields = { "keywordsZh", "littleUrls" };
for (String field : fields) {
    if (hit.getFields().containsKey(field)) {
        if (field.equals("keywordsZh")) {
            @SuppressWarnings("unchecked")
            List<String> keywordsZh = (List<String>) hitJson.getJSONObject(field).get("values");
            json.put(field, keywordsZh);
            // json.put(field, hitJson.getJSONObject(field).get("value")); // returns only the first value of the array
        }
    }
}

2. Return a single specified field:

GET mei_toutiao/_search
{
    "_source": "newType",
    "query":{
          "term": {
                "uuid": "b6a0d42731c94db1a75383c192b5544a"
           }
    }
}

Or:

GET mei_toutiao/_search
{
    "_source": {
        "includes": "newType"
    },
    "query":{
        "term": {
            "uuid": "b6a0d42731c94db1a75383c192b5544a"
        }
    }
}

3. Return only the newType and keywordsZh fields:

GET mei_toutiao/_search
{
    "_source": [ "newType", "keywordsZh" ]
}

Or:

GET mei_toutiao/_search
{
    "_source": {
        "includes": [ "newType", "keywordsZh" ]
    }
}

4. Return fields whose names start with t:

GET mei_toutiao/_search
{
    "_source": "t*"
}

5. Return everything except the newType and keywordsZh fields:

GET mei_toutiao/_search
{
    "_source": {
        "excludes": [ "newType", "keywordsZh" ]
    }
}

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html

SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
                .setRouting(routing).setFetchSource(new String[] {"titleZh", "uuid"} , null);

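Unusual requirements:

1.
Per forum name, the total comment count aggregated by month for January through May; the field is cmtCnt
Per forum name, the total like count on posts aggregated by month for January through May; the field is adtCnt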

GET xiao-2018-4-1,xiao-2018-6-12,xiao-2018-3-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "term" : {
                               "sourceType" : "FORUM"
                            }
                        },
                        {
                            "range": {
                                "timeDay": {
                                    "gte": "2018-01-01",
                                    "lte": "2018-05-31"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs" : {
        "all_interests" : {
            "terms" : {
                "size" : 100000,
                "field" : "website.keyword"
            },
            "aggs": {
                "month_num": {
                    "date_histogram": {
                        "field": "timeDay",
                        "interval": "month",
                        "format": "yyyy-MM"
                    },
                    "aggs": {
                        "single_sum": {
                            "sum" : { "field" : "cmtCnt" }
                        }
                    }
                }
            }
        }
    }
}

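2.
Per forum name, the number of positive-sentiment documents (the query excludes sentimentOrient values of -1 and 0), aggregated by month for January through May; the field is sentimentOrient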

GET xiao-2018-4-1,xiao-2018-6-12,xiao-2018-3-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "term" : {
                               "sourceType" : "FORUM"
                            }
                        },
                        {
                            "range": {
                                "timeDay": {
                                    "gte": "2018-01-01",
                                    "lte": "2018-05-31"
                                }
                            }
                        }
                    ],
                    "must_not" : [
                        { "term" : { "sentimentOrient" : -1} },
                        { "term" : { "sentimentOrient" : 0 } }
                    ]
                }
            }
        }
    },
    "aggs" : {
        "all_interests" : {
            "terms" : {
                "size" : 100000,
                "field" : "website.keyword"
            },
            "aggs": {
                "month_num": {
                    "date_histogram": {
                        "field": "timeDay",
                        "interval": "month",
                        "format": "yyyy-MM"
                    }
                }
            }
        }
    }
}

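Open question: I first tried to restrict the time range with extended_bounds, as described in https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/_extended_example.html, but could not get it to work, so I ended up restricting the range in the query instead. This is probably expected: extended_bounds only forces the histogram to emit empty buckets across the given range and does not filter documents.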

3.
Keywords to monitor: 零跑, 零跑汽车, 零跑S01
Keywords to exclude: 零跑腿, 专家门诊
Total count for social and news data from June 2 to July 2, deduplicated by the url field

GET xiao-2018-6-12,xiao-2018-6-19,xiao-2018-6-26,xiao-2018-6-5/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "range": {
                                "timeDay": {
                                    "gte": "2018-06-02",
                                    "lte": "2018-07-02"
                                }
                            }
                        },
                        {
                            "query_string":{
                                "default_field":"textZh",
                                "query":"零跑 OR 零跑汽车 OR 零跑S01 NOT 零跑腿 NOT 专家门诊"
                                // or, as an alternative value for "query":
                                // "query" : "( ( \"\"零跑\"\" ) OR ( \"\"零跑汽车\"\" ) OR ( \"\"零跑S01\"\" ) NOT ( \"\"零跑腿\"\" ) NOT ( \"\"专家门诊\"\" ) )"
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs" : {
        "distinct_colors" : {
            "cardinality" : {
                "field" : "url"
            }
        }
    }
}

Note: the query above returns incorrect results.
Cause: the textZh field in this index's mapping is configured as follows:

          "textZh": {
            "type": "text",
            "store": true,
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            },
            "analyzer": "ik_smart"
          }

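As a result, an input such as "零跑" is tokenized into "零" and "跑", so the search does not return what you want.
Fix:
Social data (documents whose _type is "Socials" are social data; those whose _type is "News" are news):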

GET xiao-2018-6-12,xiao-2018-6-19,xiao-2018-6-26,xiao-2018-6-5/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool": {
                    "must": {
                        "range": {
                            "timeStr": {
                                "gte": "2018-06-02 00:00:00",
                                "lte": "2018-07-03 00:00:00"
                            }
                        }
                    },
                    "should": [
                        {
                            "match_phrase": {
                                "textZh" : {
                                    "query" : "零跑"
                                }
                            }
                        },
                        {
                            "match_phrase": {
                                "textZh" : {
                                    "query" : "零跑汽车"
                                }
                            }
                        },
                        {
                            "match_phrase": {
                                "textZh" : {
                                    "query" : "零跑S01"
                                }
                            }
                        }
                    ],
                    "must_not": {
                        "bool": {
                            "should": [
                                {
                                    "match_phrase": {
                                        "textZh" : "零跑腿"
                                    }
                                },
                                {
                                    "match_phrase": {
                                        "textZh" : {
                                            "query" : "专家门诊"
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "distinct_colors" : {
            "cardinality" : {
                "field" : "url"
            }
        }
    }
}

Built-in ES analyzers:

standard analyzer
simple analyzer
whitespace analyzer
language analyzers (analyzers for specific languages)

Example sentence: Set the shape to semi-transparent by calling set_trans(5)
Tokens produced by each analyzer:
standard analyzer: set, the, shape, to, semi, transparent, by, calling, set_trans, 5 (standard is the default)
simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans
whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
language analyzer (an analyzer for a specific language, for example english): set, shape, semi, transpar, call, set_tran, 5
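
The first three token lists can be reproduced approximately in Python, which makes the differences between the analyzers easier to see. These regular expressions are rough, ASCII-only stand-ins written for this example sentence, not the real analyzer implementations; the function names are hypothetical.

```python
import re

SENTENCE = "Set the shape to semi-transparent by calling set_trans(5)"


def approx_standard(text: str) -> list:
    """Lowercase; split on anything that is not a letter, digit, or underscore."""
    return re.findall(r"\w+", text.lower())


def approx_simple(text: str) -> list:
    """Lowercase; keep runs of letters only, so digits and underscores act as separators."""
    return re.findall(r"[a-z]+", text.lower())


def approx_whitespace(text: str) -> list:
    """Split on whitespace only; case and punctuation are preserved."""
    return text.split()


for analyze in (approx_standard, approx_simple, approx_whitespace):
    print(analyze.__name__, analyze(SENTENCE))
```

The language analyzer is left out because stemming and stop-word removal cannot be imitated with a one-line pattern.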

Testing an analyzer:

GET /_analyze
{
  "analyzer": "standard",
  "text":"I love you"
}

Result:

{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "love",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "you",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

Modifying the mapping:

1. Create an index:

PUT hui

2. Delete an index:

DELETE hui

3. Add a field to the mapping
(Once an Elasticsearch mapping has been created, you can only add fields; fields that are already mapped cannot be changed)

POST hui/News/_mapping
{
    "News": {
        "properties": {
            "hui":{
                "type": "text",
                "store": true
            }
        }
    }
}

4. Change the type of an existing field:

POST hui/News/_mapping
{
    "News": {
        "properties": {
            "hui":{
                "type": "integer"
            }
        }
    }
}

This fails with:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [hui] of different type, current_type [text], merged_type [integer]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [hui] of different type, current_type [text], merged_type [integer]"
  },
  "status": 400
}

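Cause:
If the type of a field changes, all data for that field has to be reindexed. Elasticsearch is built on the Lucene library, and the field type determines how values are analyzed, indexed, and searched, so refusing in-place type changes is consistent with how Lucene works.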

Reindexing:

1. Recreate the index hui, insert data, and set an alias:

PUT hui
POST hui/News/_mapping
{
    "News": {
        "properties": {
            "hui":{
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }
        }
    }
}
POST hui/News/1
{
   "hui" : "hehe"
}
POST hui/_alias/xiao

2. Create the index qiang with the new mapping (the data is copied in by the next step):

PUT qiang
POST qiang/News/_mapping
{
    "News": {
        "properties": {
            "hui":{
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                },
                "store": true
            }
        }
    }
}

3. Run the reindex command:

POST _reindex
{
  "source": {
    "index": "hui"
  },
  "dest": {
    "index": "qiang",
    "version_type": "internal"
  }
}

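Note: with a large amount of data, Kibana reports the request timeout shown below, but the reindex itself is not affected and keeps running. Reindexing about 190,000 documents took me a little over ten minutes.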

{
  "statusCode": 504,
  "error": "Gateway Timeout",
  "message": "Client request timeout"
}

4. Check progress with the Task API:

GET _tasks?detailed=true&actions=*reindex
{
  "nodes": {
    "yFpET0TETpuWGCxxyodXmg": {
      "name": "yFpET0T",
      "transport_address": "192.168.0.100:9300",
      "host": "192.168.0.100",
      "ip": "192.168.0.100:9300",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "attributes": {
        "ml.max_open_jobs": "10",
        "ml.enabled": "true"
      },
      "tasks": {
        "yFpET0TETpuWGCxxyodXmg:6319552": {
          "node": "yFpET0TETpuWGCxxyodXmg",
          "id": 6319552,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 194111,
            "updated": 0,
            "created": 50000,
            "deleted": 0,
            "batches": 51,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 0,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [mei_toutiao] to [mei_toutiao_v2]",
          "start_time_in_millis": 1532338516013,
          "running_time_in_nanos": 176981696219,
          "cancellable": true
        }
      }
    }
  }
}
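
The "status" object in this response is enough to estimate progress: the sum of created, updated, and deleted approaches total as the task runs. A minimal Python sketch, using the numbers from the response above; `reindex_progress` is a hypothetical helper name.

```python
def reindex_progress(status: dict) -> float:
    """Fraction of documents processed so far, from a task's "status" object."""
    done = status["created"] + status["updated"] + status["deleted"]
    total = status["total"]
    return done / total if total else 0.0


status = {"total": 194111, "updated": 0, "created": 50000, "deleted": 0}
print(f"{reindex_progress(status):.1%} done")
```

Polling the Task API periodically and feeding each status into such a function gives a simple progress indicator while Kibana itself shows only the timeout.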

5. Once the copy has finished, the same request returns:

{
  "nodes": {}
}

6. Switch the alias to the new index:

POST /_aliases
{
    "actions": [
        { "remove": {
            "alias": "xiao",
            "index": "hui"
        }},
        { "add": {
            "alias": "xiao",
            "index": "qiang"
        }}
    ]
}