ElasticSearch Installation and Operations

I. Introduction
Elasticsearch is an open-source search engine built on top of Apache Lucene(TM). Across both open-source and proprietary offerings, Lucene is widely regarded as the most advanced, best-performing, and most feature-complete search engine library available.

Lucene, however, is only a library. To use it, you must work in Java and embed it directly in your application. Worse still, Lucene is very complex: you need a deep understanding of information retrieval to grasp how it works.
Elasticsearch is also written in Java and uses Lucene at its core for all indexing and search functionality, but its goal is to hide Lucene's complexity behind a simple RESTful API, making full-text search easy.

Elasticsearch is more than Lucene plus full-text search, though. It can also be described as:
a distributed real-time document store, in which every field is indexed and searchable
a distributed real-time analytics search engine
capable of scaling to hundreds of servers and petabytes of structured or unstructured data
All of these capabilities are packaged into a single service that your application can talk to through a simple RESTful API, a client for your favorite language, or even the command line.

Getting started with Elasticsearch is easy. It ships with sensible defaults and hides complex search-engine theory from beginners. It works out of the box (install it and it is ready to use), and with very little learning you can run it in production.

Elasticsearch is released under the Apache 2 license and can be downloaded, used, and modified free of charge.
As your understanding of Elasticsearch deepens, you can tailor its advanced features to your particular problem domain; everything is configurable, and the configuration is extremely flexible.
(The above is excerpted from the Chinese edition of Elasticsearch: The Definitive Guide.)
II. Installation
Environment preparation: the only requirement for ES is Java 8.
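You can confirm the JDK quickly; the output should report a 1.8.x version:

java -version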
1. Download ES from the official site or a domestic mirror

[hadoop@master ~]$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.2.zip

2. Unpack it under /app/elasticsearch/
It is best to follow a directory layout like this, so everything stays clearly organized.

[hadoop@master app]$ ls
elasticsearch  hadoop  hbase  hive  java  kafka  scala  spark  tgz  zookeeper
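A possible command sequence for this step, assuming the downloaded archive sits in the /app/tgz directory suggested by the listing above:

# unpack the downloaded archive and move it into the managed layout
cd /app/tgz
unzip elasticsearch-5.5.2.zip
mv elasticsearch-5.5.2 /app/elasticsearch/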

For a cluster, elasticsearch.yml can be configured as follows:

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: myes
#

# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: worker1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 192.168.163.146

#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------


discovery.zen.ping.unicast.hosts: ["192.168.163.145", "192.168.163.146", "192.168.163.147"]
discovery.zen.ping_timeout: 120s
client.transport.ping_timeout: 60s

(Note: YAML requires a space after each colon. The discovery.zen.ping.multicast.enabled setting has been dropped here because multicast discovery was removed in Elasticsearch 5.x; a unicast host list is the supported mechanism.)

After distributing the package, adjust the per-host settings (node.name and network.host) on every machine.
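For instance, across the three hosts configured above, the per-node values might look like this (the node names here are illustrative):

# on 192.168.163.145
node.name: worker1
network.host: 192.168.163.145

# on 192.168.163.146
node.name: worker2
network.host: 192.168.163.146

# on 192.168.163.147
node.name: worker3
network.host: 192.168.163.147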
3. Start ES
Change into the ES bin directory:

[hadoop@master bin]$ ./elasticsearch

Or pass -d to start it in the background:

[hadoop@master bin]$ ./elasticsearch -d
[2018-05-10T18:54:46,758][INFO ][o.e.n.Node               ] [] initializing ...
[2018-05-10T18:54:47,203][INFO ][o.e.e.NodeEnvironment    ] [mwj6R-0] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [9.3gb], net total_space [17.4gb], spins? [possibly], types [ext4]
[2018-05-10T18:54:47,204][INFO ][o.e.e.NodeEnvironment    ] [mwj6R-0] heap size [1.9gb], compressed ordinary object pointers [true]
[2018-05-10T18:54:47,208][INFO ][o.e.n.Node               ] node name [mwj6R-0] derived from node ID [mwj6R-0TSmCuUySpFZ_9oQ]; set [node.name] to override
[2018-05-10T18:54:47,209][INFO ][o.e.n.Node               ] version[5.5.2], pid[24903], build[b2f0c09/2017-08-14T12:33:14.154Z], OS[Linux/2.6.32-431.el6.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_141/25.141-b15]
[2018-05-10T18:54:47,215][INFO ][o.e.n.Node               ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/app/elasticsearch/elasticsearch-5.5.2]
[2018-05-10T18:54:51,756][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [aggs-matrix-stats]
[2018-05-10T18:54:51,759][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [ingest-common]
[2018-05-10T18:54:51,762][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [lang-expression]
[2018-05-10T18:54:51,762][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [lang-groovy]
[2018-05-10T18:54:51,763][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [lang-mustache]
[2018-05-10T18:54:51,766][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [lang-painless]
[2018-05-10T18:54:51,766][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [parent-join]
[2018-05-10T18:54:51,766][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [percolator]
[2018-05-10T18:54:51,770][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [reindex]
[2018-05-10T18:54:51,771][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [transport-netty3]
[2018-05-10T18:54:51,771][INFO ][o.e.p.PluginsService     ] [mwj6R-0] loaded module [transport-netty4]
[2018-05-10T18:54:51,771][INFO ][o.e.p.PluginsService     ] [mwj6R-0] no plugins loaded
[2018-05-10T18:54:59,336][INFO ][o.e.d.DiscoveryModule    ] [mwj6R-0] using discovery type [zen]
[2018-05-10T18:55:01,941][INFO ][o.e.n.Node               ] initialized
[2018-05-10T18:55:01,941][INFO ][o.e.n.Node               ] [mwj6R-0] starting ...
[2018-05-10T18:55:02,522][INFO ][o.e.t.TransportService   ] [mwj6R-0] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-05-10T18:55:02,562][WARN ][o.e.b.BootstrapChecks    ] [mwj6R-0] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2018-05-10T18:55:02,562][WARN ][o.e.b.BootstrapChecks    ] [mwj6R-0] max number of threads [1024] for user [hadoop] is too low, increase to at least [2048]
[2018-05-10T18:55:02,562][WARN ][o.e.b.BootstrapChecks    ] [mwj6R-0] max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-05-10T18:55:02,562][WARN ][o.e.b.BootstrapChecks    ] [mwj6R-0] system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2018-05-10T18:55:05,719][INFO ][o.e.c.s.ClusterService   ] [mwj6R-0] new_master {mwj6R-0}{mwj6R-0TSmCuUySpFZ_9oQ}{TceSDtFySwmi7QzNy3n23w}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2018-05-10T18:55:05,859][INFO ][o.e.h.n.Netty4HttpServerTransport] [mwj6R-0] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-05-10T18:55:05,860][INFO ][o.e.n.Node               ] [mwj6R-0] started
[2018-05-10T18:55:05,893][INFO ][o.e.g.GatewayService     ] [mwj6R-0] recovered [0] indices into cluster_state

After startup, if ES is reachable only from the local machine, try editing network.host in elasticsearch.yml
(mind the YAML format: active settings must not start with #, and each colon must be followed by a space), setting it to network.host: 0.0.0.0.
To run in the background as a daemon, add the -d flag.
4. If at this point you get the error "max virtual memory areas vm.max_map_count [65530] is too low", run the following command:

$ sudo sysctl -w vm.max_map_count=262144
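To make this kernel setting survive a reboot, and to clear the other two bootstrap-check warnings in the startup log above (file descriptors and thread count), one plausible set of changes on a CentOS-style system is the following sketch; the username hadoop matches the transcripts here, so adjust it to your own:

# persist vm.max_map_count across reboots
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf

# raise per-user limits to satisfy the bootstrap checks
sudo tee -a /etc/security/limits.conf <<'EOF'
hadoop soft nofile 65536
hadoop hard nofile 65536
hadoop soft nproc  2048
hadoop hard nproc  2048
EOF

Log out and back in before restarting ES so the new limits take effect.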

5. If all goes well, Elasticsearch will be listening on the default port 9200. Open another terminal window and request that port; you should get back a description of the node.
Trying the request with localhost and with the hostname:

[hadoop@master ~]$ curl localhost:9200
[hadoop@master ~]$ curl master:9200

Both requests failed with hostname resolution/connection errors:

[hadoop@master ~]$ curl localhost:9200
curl: (6) Couldn't resolve host 'localhost'
[hadoop@master ~]$ curl master:9200
curl: (7) couldn't connect to host

The fix is to add a mapping in /etc/hosts:

127.0.0.1   localhost

after which the request succeeds:

[hadoop@master ~]$ curl localhost:9200
{
  "name" : "mwj6R-0",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "jMNZSP78ToOklHidcYzaWw",
  "version" : {
    "number" : "5.5.2",
    "build_hash" : "b2f0c09",
    "build_date" : "2017-08-14T12:33:14.154Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}

6. When you request port 9200, Elasticsearch returns a JSON object containing information about the current node, cluster, version, and so on.
By default, Elasticsearch only allows access from the local machine. For remote access, edit config/elasticsearch.yml under the installation directory, uncomment network.host and change its value to 0.0.0.0, then restart Elasticsearch.

network.host: 0.0.0.0

Setting it to 0.0.0.0 lets anyone connect; in production, set it to the specific IP(s) you want to expose.
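A quick remote check from another machine on the network might look like this, using the IP configured earlier (yours will differ):

curl http://192.168.163.146:9200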
That completes the ES installation.
III. CRUD
1. Node and Cluster
Elasticsearch is essentially a distributed database: multiple servers can work together, and each server can run multiple Elasticsearch instances.

A single Elasticsearch instance is called a node. A group of nodes forms a cluster. It is a good idea to replace the default cluster.name with something distinctive, such as your own name, so that a freshly started node cannot accidentally join another cluster of the same name on the same network.

2. Index list: curl 'localhost:9200/_cat/indices?v'
Elasticsearch indexes every field and, after processing, writes them into an inverted index. At query time it consults this index directly.

Accordingly, the top-level unit of data management in Elasticsearch is called an Index. It is the analogue of a single database, and every index name must be lowercase.

The following command lists all the indices on the current node:

[hadoop@master bin]$ curl -X GET 'http://localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

3. Health check
status: green means everything is fine (the cluster is fully functional); yellow means all data is available but some replicas have not been allocated (the cluster is still fully functional); red means some data is unavailable for some reason.

[hadoop@master bin]$ curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1526008494 20:14:54  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%

4. Node query: curl 'localhost:9200/_cat/nodes?v'

[hadoop@master bin]$ curl 'localhost:9200/_cat/nodes?v'
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1            3          93   1    0.00    0.00     0.00 mdi       *      mwj6R-0

5. Create an index: curl -XPUT 'localhost:9200/customer?pretty'

[hadoop@master bin]$ curl -XPUT 'localhost:9200/customer?pretty'
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

6. Delete an index: curl -XDELETE 'localhost:9200/customer?pretty'

[hadoop@master bin]$ curl -XDELETE 'localhost:9200/customer?pretty'
{
  "acknowledged" : true
}

7. Create a document: curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '{"name": "mike"}'
Note: to index a document, we must tell Elasticsearch which type within the index it should go to.
Example: index a simple customer document into the customer index under the "external" type, with document ID 1.

[hadoop@master bin]$  curl -XPUT 'localhost:9200/customer/external/1?pretty' -d ' {"name": "mike"}'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

Note: when you don't specify an ID, use POST instead and Elasticsearch generates an ID automatically:
curl -XPOST 'localhost:9200/customer/external?pretty' -d '{"name": "mike"}'
8. Retrieve a document: curl -XGET 'localhost:9200/customer/external/1?pretty'

[hadoop@master bin]$ curl -XGET 'localhost:9200/customer/external/1?pretty'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "mike"
  }
}

9. Update a document: curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '{ "name": "Jerry" }'

[hadoop@master bin]$ curl -XPUT 'localhost:9200/customer/external/1?pretty' -d ' { "name": "Jerry" }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : false
}

[hadoop@master bin]$ curl 'localhost:9200/_cat/indices?v'
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer P9T7BuUERJ-64Jybb4gYag   5   1          1            0      7.1kb          7.1kb

Note: _version has become 2, while docs.count is unchanged.
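A PUT like the one above replaces the whole document. For partial updates, Elasticsearch also provides the _update endpoint; a minimal sketch (the new name here is arbitrary):

curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{ "doc": { "name": "Jane" } }'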
10. Delete a document (single): curl -XDELETE 'localhost:9200/customer/external/2?pretty'
Note: the ID to delete is given in the URL.

{
  "found" : true,
  "_index" : "customer",
  "_type" : "external",
  "_id" : "2",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  }
}

Delete documents (multiple):
In Elasticsearch 5.x this is done with the _delete_by_query endpoint (the older DELETE .../_query form from 1.x-era tutorials was removed):
curl -XPOST 'localhost:9200/customer/_delete_by_query?pretty' -d '
{
  "query": { "match": { "name": "John Doe" } }
}'
Note: this first finds all documents whose name matches John Doe, then deletes them in one operation.
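As a quick sanity check after a delete-by-query, the _count endpoint reports how many documents remain (not part of the original post):

curl -XGET 'localhost:9200/customer/_count?pretty'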

Bulk document processing (create): create two documents, with IDs 21 and 22
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"index":{"_id":"21"}}
{"name": "peter" }
{"index":{"_id":"22"}}
{"name": "peter" } '

[hadoop@master bin]$ curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
> {"index":{"_id":"21"}}
> {"name": "peter" }
> {"index":{"_id":"22"}}
> {"name": "peter" } '
{
  "took" : 15,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "21",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "created" : true,
        "status" : 201
      }
    }
  ]
}

Bulk document processing: one update and one delete
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"21"}}
{"doc": { "name": "Bob" } }
{"delete":{"_id":"22"}}
'

[hadoop@master bin]$ curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
> {"update":{"_id":"21"}}
> {"doc": { "name": "Bob" } }
> {"delete":{"_id":"22"}}
> '
{
  "took" : 27,
  "errors" : false,
  "items" : [
    {
      "update" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "21",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    },
    {
      "delete" : {
        "found" : false,
        "_index" : "customer",
        "_type" : "external",
        "_id" : "22",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 404
      }
    }
  ]
}

*Note: the bulk API executes these actions in order. If one action fails for some reason, it keeps processing the actions that follow it. When the bulk API returns, it reports a status for every action, in the same order, so you can check whether each one succeeded.
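For large bulk requests, the fastest failure check is the top-level errors flag in the response; for example, with a hypothetical file bulk.json holding the action lines:

curl -s -XPOST 'localhost:9200/customer/external/_bulk?pretty' --data-binary @bulk.json | grep '"errors"'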

Example

Download the sample data from https://github.com/bly2k/files/blob/master/accounts.zip?raw=true, unzip it, upload it to the server, and import it into ES:

curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @/usr/share/elasticsearch/accounts.json

[hadoop@master bin]$ curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @/app/elasticsearch/elasticsearch-5.5.2/datasource/accounts.json
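Listing the indices again confirms the import; the bank index should show 1000 documents (matching hits.total in the searches below):

curl 'localhost:9200/_cat/indices?v'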

Search API: _search
Using the GET method, request /Index/Type/_search directly to get back all records.
Note: there are two basic ways to run searches. One sends the search parameters in the URI of the REST request; the other sends them in the request body. The request-body approach is more expressive and lets you define searches in a more readable JSON format.

① URI-parameter approach: curl 'localhost:9200/bank/_search?q=*&pretty' returns all documents in the bank index

Note: _search searches within the bank index; the q=* parameter tells Elasticsearch to match every document in the index.

[hadoop@master ~]$ curl 'localhost:9200/bank/_search?q=*&pretty'
{
  "took" : 267,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "25",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 25,
          "balance" : 40540,
          "firstname" : "Virginia",
          "lastname" : "Ayala",
          "age" : 39,
          "gender" : "F",
          "address" : "171 Putnam Avenue",
          "employer" : "Filodyne",
          "email" : "virginiaayala@filodyne.com",
          "city" : "Nicholson",
          "state" : "PA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "44",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 44,
          "balance" : 34487,
          "firstname" : "Aurelia",
          "lastname" : "Harding",
          "age" : 37,
          "gender" : "M",
          "address" : "502 Baycliff Terrace",
          "employer" : "Orbalix",
          "email" : "aureliaharding@orbalix.com",
          "city" : "Yardville",
          "state" : "DE"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "99",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 99,
          "balance" : 47159,
          "firstname" : "Ratliff",
          "lastname" : "Heath",
          "age" : 39,
          "gender" : "F",
          "address" : "806 Rockwell Place",
          "employer" : "Zappix",
          "email" : "ratliffheath@zappix.com",
          "city" : "Shaft",
          "state" : "ND"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "119",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 119,
          "balance" : 49222,
          "firstname" : "Laverne",
          "lastname" : "Johnson",
          "age" : 28,
          "gender" : "F",
          "address" : "302 Howard Place",
          "employer" : "Senmei",
          "email" : "lavernejohnson@senmei.com",
          "city" : "Herlong",
          "state" : "DC"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "126",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 126,
          "balance" : 3607,
          "firstname" : "Effie",
          "lastname" : "Gates",
          "age" : 39,
          "gender" : "F",
          "address" : "620 National Drive",
          "employer" : "Digitalus",
          "email" : "effiegates@digitalus.com",
          "city" : "Blodgett",
          "state" : "MD"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "145",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 145,
          "balance" : 47406,
          "firstname" : "Rowena",
          "lastname" : "Wilkinson",
          "age" : 32,
          "gender" : "M",
          "address" : "891 Elton Street",
          "employer" : "Asimiline",
          "email" : "rowenawilkinson@asimiline.com",
          "city" : "Ripley",
          "state" : "NH"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "183",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 183,
          "balance" : 14223,
          "firstname" : "Hudson",
          "lastname" : "English",
          "age" : 26,
          "gender" : "F",
          "address" : "823 Herkimer Place",
          "employer" : "Xinware",
          "email" : "hudsonenglish@xinware.com",
          "city" : "Robbins",
          "state" : "ND"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "190",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 190,
          "balance" : 3150,
          "firstname" : "Blake",
          "lastname" : "Davidson",
          "age" : 30,
          "gender" : "F",
          "address" : "636 Diamond Street",
          "employer" : "Quantasis",
          "email" : "blakedavidson@quantasis.com",
          "city" : "Crumpler",
          "state" : "KY"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "208",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 208,
          "balance" : 40760,
          "firstname" : "Garcia",
          "lastname" : "Hess",
          "age" : 26,
          "gender" : "F",
          "address" : "810 Nostrand Avenue",
          "employer" : "Quiltigen",
          "email" : "garciahess@quiltigen.com",
          "city" : "Brooktrails",
          "state" : "GA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "222",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 222,
          "balance" : 14764,
          "firstname" : "Rachelle",
          "lastname" : "Rice",
          "age" : 36,
          "gender" : "M",
          "address" : "333 Narrows Avenue",
          "employer" : "Enaut",
          "email" : "rachellerice@enaut.com",
          "city" : "Wright",
          "state" : "AZ"
        }
      }
    ]
  }
}

Fields in the response:

  • took: how long the search took, in milliseconds
  • timed_out: whether the search timed out
  • _shards: how many shards were searched, and how many of them succeeded or failed
  • hits: the search results
  • hits.total: the total number of documents matching the query
  • hits.hits: the actual result documents (only the first 10 by default)
  • _score and max_score: ignore these fields for now

② Request-body approach: curl -XPOST 'localhost:9200/bank/_search?pretty' -d '{"query": { "match_all": {} } }'

Note: the query part defines the query, and match_all is the type of query we want to run. A match_all query simply matches every document in the given index.

Besides query, other parameters can be specified:
1. size: how many documents to return; defaults to 10 if unspecified.

[hadoop@master ~]$  curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match_all": {} },
          "size": 1
        }'

2. from: the offset to start from; defaults to 0 if unspecified. Combined with size it gives pagination; this example returns documents 11 through 20.

[hadoop@master ~]$  curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
          "query": { "match_all": {} },
          "from": 10,
          "size": 10
        }'

3. sort: sorting. This example sorts by account balance in descending order and returns the top 10.

[hadoop@master ~]$    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match_all": {} },
          "sort": { "balance": { "order": "desc" } }
        }'

4. _source: restricts which fields are returned; this example returns only account_number and balance.

[hadoop@master ~]$   curl -XGET 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": { "match_all": {} },
           "_source": ["account_number", "balance"]
         }'
{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "25",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 25,
          "balance" : 40540
        }
   }

5. match: query against a specific field; this example returns the document whose account_number is 20.

[hadoop@master ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
          "query": { "match": { "account_number": 20 } }
         }'

match: this example returns accounts whose address contains "mill" or "lane":

[hadoop@master ~]$ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match": { "address": "mill lane" } }
        }' 

6. match_phrase: matches the phrase "mill lane"; only documents whose address contains the exact phrase "mill lane" are returned.

[hadoop@master ~]$  curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match_phrase": { "address": "mill lane" } }
        }'
7. bool: boolean queries.
A bool must clause requires all of its queries to be true for a document to match.
This example returns all accounts whose address contains both "mill" and "lane":
[hadoop@master ~]$    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "must": [
                 { "match": { "address": "mill" } },
                 { "match": { "address": "lane" } }
               ]
             }
           }
        }'

A bool should clause means a document matches if at least one of the queries in the list matches.
This example returns all accounts whose address contains "mill" or "lane":

[hadoop@master ~]$   curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "should": [
                 { "match": { "address": "mill" } },
                 { "match": { "address": "lane" } }
               ]
             }
           }
         }'

A bool must_not clause means none of the queries in the list may be true for a document to match:

[hadoop@master ~]$  curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "must_not": [
                 { "match": { "address": "mill" } },
                 { "match": { "address": "lane" } }
               ]
             }
           }
         }'

must, should, and must_not can be combined in a single bool query.
This example returns the accounts of people who are 40 years old and do not live in ID (Idaho):

[hadoop@master ~]$   curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "must": [
                 { "match": { "age": "40" } }
               ],
               "must_not": [
                 { "match": { "state": "ID" } }
               ]
             }
           }
         }'

Filters
The _score field in the earlier search results is a relative measure of how well each document matches the search query: the higher the score, the more relevant the document; the lower the score, the less relevant.
Every query in Elasticsearch triggers this relevance-score computation. For scenarios where relevance scores are not needed, Elasticsearch offers another query capability in the form of filters.
Filters are conceptually similar to queries, but they execute much faster, mainly for two reasons:

filters do not compute relevance scores, so they are computationally cheaper
filters can be cached in memory, which makes repeated searches much faster than the equivalent queries

This example returns accounts with a balance between 20000 and 30000, inclusive:

[hadoop@master ~]$ curl -XGET "http://localhost:9200/bank/_search" -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'
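Filters compose with the other request-body parameters shown earlier; as a sketch, the same range filter combined with the sort and _source options from previous examples:

curl -XGET 'http://localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } }
    }
  },
  "sort": { "balance": { "order": "desc" } },
  "_source": ["account_number", "balance"]
}'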

(Reposted from blog.csdn.net/yangang1223/article/details/80306212)