(转)通过HTTP RESTful API 操作elasticsearch搜索数据

样例数据集

这是编造的JSON格式银行客户账号信息文档,文档schema如下: 

“account_number”: 0, 
“balance”: 16623, 
“firstname”: “Bradshaw”, 
“lastname”: “Mckenzie”, 
“age”: 29, 
“gender”: “F”, 
“address”: “244 Columbus Place”, 
“employer”: “Euron”, 
“email”: “[email protected]”, 
“city”: “Hobucken”, 
“state”: “CO” 

这些数据可以通过www.json-generator.com网站生成


加载样例数据集

下载样例数据集链接 
解压数据到指定目录,然后加载到elasticsearch集群

绝对路径:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@/home/cluster/apps/elasticsearch/elasticsearch-1.7.2/test/accounts.json" 相对路径: curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@test/accounts.json"
  • 1
  • 2
  • 3
  • 4
  • 5

curl 'localhost:9200/_cat/indices?v'
结果:
health status index              pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   bank                 5   1 1000 0 417.1kb 417.1kb 
  • 1
  • 2
  • 3
  • 4

上面结果,说明我们成功bulk 1000个文档到bank索引中了


搜索数据API

有两种方式:一种方式是通过 REST 请求 URI ,发送搜索参数;另一种是通过REST 请求体,发送搜索参数。而请求体允许你包含更容易表达和可阅读的JSON格式。

  • 通过 REST 请求 URI
curl 'localhost:9200/bank/_search?q=*&pretty'

结果:
{
  "took" : 63,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1000, "max_score" : 1.0, "hits" : [ { "_index" : "bank", "_type" : "account", "_id" : "1", "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"} }, { "_index" : "bank", "_type" : "account", "_id" : "6", "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"} }, { "_index" : "bank", "_type" : "account",
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27

q=*,参数告诉elasticsearch,在bank索引中匹配所有的文档 
pretty,参数告诉elasticsearch,返回形式打印JSON结果

  • 通过REST 请求体:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} }
}'

结果:
{
  "took" : 26, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1000, "max_score" : 1.0, "hits" : [ { "_index" : "bank", "_type" : "account", "_id" : "1", "_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"[email protected]","city":"Brogan","state":"IL"} }, { "_index" : "bank", "_type" : "account", "_id" : "6", "_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"[email protected]","city":"Dante","state":"TN"} }, { "_index" : "bank", "_type" : "account", "_id" : "13",
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31

与第一种方式不同是在URI中替代传递q=*,使用POST方式提交,请求体包含JSON格式搜索


介绍查询语言

elasticsearch提供JSON格式领域特定语言执行查询。可参考Query DSL

{
  "query": { "match_all": {} }
}
  • 1
  • 2
  • 3

query:告诉我们定义查询 
match_all:运行简单类型查询指定索引中的所有文档

除了指定查询参数,还可以指定其他参数来影响最终的结果。

  • match_all & 只返回第一个文档:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "size": 1 }' 结果: { "took" : 2, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1000, "max_score" : 1.0, "hits" : [ { "_index" : "bank", "_type" : "account", "_id" : "4", "_score" : 1.0, "_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"[email protected]","city":"Eastvale","state":"HI"} } ] } }
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26

如果不指定size,默认是返回10条文档信息


  • match_all & 返回11到20个文档信息
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "from": 10, "size": 10 }'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6

from:指定文档索引从哪里开始,默认从0开始 
size:从from开始,返回多个文档 
这feature在实现分页查询很有用


  • match_all and 根据account balance 降序排序 & 返回10个文档(默认10个)
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } } }'
  • 1
  • 2
  • 3
  • 4
  • 5

执行搜索

默认的,我们搜索返回完整的JSON文档。而source(_source字段搜索点击量)。如果我们不想返回完整的JSON文档,我们可以使用source返回指定字段。

  • 返回 account_number and balance:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"] }'
  • 1
  • 2
  • 3
  • 4
  • 5

这样操作有点类似于SQL SELECT FROM field lis


match 查询,可作为基本字段搜索查询 
- 返回 account_number=20:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "account_number": 20 } } }'
  • 1
  • 2
  • 3
  • 4

  • 返回 address=mill:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill" } } }'
  • 1
  • 2
  • 3
  • 4

  • 返回 address=mill or address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill lane" } } }'
  • 1
  • 2
  • 3
  • 4

  • 返回 短语匹配 address=mill lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_phrase": { "address": "mill lane" } } }'
  • 1
  • 2
  • 3
  • 4

布尔值(bool)查询

  • 返回 匹配address=mill & address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11

must:要求所有条件都要满足(类似于&&)


  • 返回 匹配address=mill or address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "should": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11

should:任何一个满足就可以(类似于||)


  • 返回 不匹配address=mill & address=lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must_not": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11

must_not:所有条件都不能满足(类似于! (&&))


  • 返回 age=40 & state!=ID
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [ { "match": { "age": "40" } } ], "must_not": [ { "match": { "state": "ID" } } ] } } }'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13

执行过滤器

文档中score(_score字段是搜索结果)。score是一个数字型的,是一种相对方法匹配查询文档结果。分数越高,搜索关键字与该文档相关性越高;越低,搜索关键字与该文档相关性越低。

在elasticsearch中所有的搜索都会触发相关性分数计算。如果我们不使用相关性分数计算,那要使用另一种查询能力,构建过滤器

过滤器是类似于查询的概念,除了得以优化,更快的执行速度的两个主要原因: 
1. 过滤器不计算得分,所以他们比执行查询的速度 
2. 过滤器可缓存在内存中,允许重复搜索

为了便于理解过滤器,先介绍过滤器搜索(like match_all, match, bool, etc.),可以与其他的普通查询搜索组合一个过滤器。 
range filter,允许我们通过一个范围值来过滤文档,一般用于数字或日期过滤

使用过滤器搜索返回 balances[ 20000,30000]。换句话说,balance>=20000 && balance<=30000

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } }' 结果: { "took" : 3, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 217, "max_score" : 1.0, "hits" : [ { "_index" : "bank", "_type" : "account", "_id" : "4", "_score" : 1.0, "_source":{"account_number":4,"balance":27658,"firstname":"Rodriquez","lastname":"Flores","age":31,"gender":"F","address":"986 Wyckoff Avenue","employer":"Tourmania","email":"[email protected]","city":"Eastvale","state":"HI"} }, { "_index" : "bank", "_type" : "account", "_id" : "9", "_score" : 1.0, "_source":{"account_number":9,"balance":24776,"firstname":"Opal","lastname":"Meadows","age":39,"gender":"M","address":"963 Neptune Avenue","employer":"Cedward","email":"[email protected]","city":"Olney","state":"OH"} }, { "_index" : "bank", "_type" : "account", "_id" : "11", "_score" : 1.0, "_source":{"account_number":11,"balance":20203,"firstname":"Jenkins","lastname":"Haney","age":20,"gender":"M","address":"740 Ferry Place","employer":"Qimonk","email":"[email protected]","city":"Steinhatchee","state":"GA"} }, { "_index" : "bank", "_type" : "account", "_id" : "42", "_score" : 1.0, "_source":{"account_number":42,"balance":21137,"firstname":"Harding","lastname":"Hobbs","age":26,"gender":"F","address":"474 Ridgewood Place","employer":"Xth","email":"[email protected]","city":"Heil","state":"ND"} }, { "_index" : "bank", "_type" : "account", "_id" : "54", "_score" : 1.0, "_source":{"account_number":54,"balance":23406,"firstname":"Angel","lastname":"Mann","age":22,"gender":"F","address":"229 Ferris Street","employer":"Amtas","email":"[email protected]","city":"Calverton","state":"WA"} }, { "_index" : "bank", "_type" : "account", "_id" : "66", "_score" : 1.0, "_source":{"account_number":66,"balance":25939,"firstname":"Franks","lastname":"Salinas","age":28,"gender":"M","address":"437 Hamilton Walk","employer":"Cowtown","email":"[email protected]","city":"Chase","state":"VT"} }, { "_index" : "bank", "_type" : "account", "_id" : "92", "_score" : 1.0, "_source":{"account_number":92,"balance":26753,"firstname":"Gay","lastname":"Brewer","age":34,"gender":"M","address":"369 Ditmars Street","employer":"Savvy","email":"[email protected]","city":"Moquino","state":"HI"} }, { "_index" : "bank", "_type" : "account", "_id" : "100", "_score" : 1.0, "_source":{"account_number":100,"balance":29869,"firstname":"Madden","lastname":"Woods","age":32,"gender":"F","address":"696 Ryder Avenue","employer":"Slumberia","email":"[email protected]","city":"Deercroft","state":"ME"} }, { "_index" : "bank", "_type" : "account", "_id" : "105", "_score" : 1.0, 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83

过滤查询包含match_all查询(查询部分)和一系列过滤(过滤部分)。可以代替任何其他查询到查询部分以及其他过滤器过滤部分。在上述情况下,过滤器范围智能,因为文档落入range所有匹配“平等”,即。比另一个更相关,没有文档。

一般情况,最明智的方式决定是否使用filter or query,就看你是否关心相关性分数。如果相关性不重要,那就使用filter,否则就使用query。 
queries and filters很类似于关系型数据库中的 “SELECT WHERE clause”


执行聚合

聚合提供从你的数据中分组和提取统计能力。 
类似于关系型数据中的SQL GROUP BY和SQL 聚合函数。

在Elasticsearch,你有能力执行搜索返回命中结果,同时拆分命中结果,然后统一返回结果。当你使用简单的API运行搜索和多个聚合,然后返回所有结果避免网络带宽过大的情况是高效的。

  • 根据state分组,降序统计top 10 state
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": { "group_by_state": { "terms": { "field": "state" } } } }' 结果: "hits" : { "total" : 1000, "max_score" : 0.0, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "buckets" : [ { "key" : "al", "doc_count" : 21 }, { "key" : "tx", "doc_count" : 17 }, { "key" : "id", "doc_count" : 15 }, { "key" : "ma", "doc_count" : 15 }, { "key" : "md", "doc_count" : 15 }, { "key" : "pa", "doc_count" : 15 }, { "key" : "dc", "doc_count" : 14 }, { "key" : "me", "doc_count" : 14 }, { "key" : "mo", "doc_count" : 14 }, { "key" : "nd", "doc_count" : 14 } ] } } }
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54

类似于关系型数据库

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
  • 1

size=0 不是展示搜索结果命中数,因为我只是想要看聚合结果


  • 根据state计算账户平均balance,降序统计top 10 state
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": { "group_by_state": { "terms": { "field": "state" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } }'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18

注意嵌套average_balance聚合group_by_state内聚合。这是一个常见的模式,所有的聚合。您可以嵌套内聚合聚合任意提取旋转汇总时,你需要从你的数据。


  • 降序排序平均 balance:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": { "group_by_state": { "terms": { "field": "state", "order": { "average_balance": "desc" } }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } }'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21

  • 聚合年龄分区间(ages 20-29, 30-39, and 40-49),聚合性别,最后平均balance 展示最终结果
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": { "group_by_age": { "range": { "field": "age", "ranges": [ { "from": 20, "to": 30 }, { "from": 30, "to": 40 }, { "from": 40, "to": 50 } ] }, "aggs": { "group_by_gender": { "terms": { "field": "gender" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } } }'

猜你喜欢

转载自www.cnblogs.com/youngerger/p/9030183.html