Elasticsearch---Study Notes (2)

These are my own study notes only; for details, please refer to the official ES documentation.

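9. Notes------SQL plugin

Once the SQL plugin is installed, there are two ways to query data.

The first is still through the URL: call the _sql endpoint and pass the SQL query statement as the request body.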

curl -XPOST http://172.16.150.149:29200/_sql?pretty -d "SELECT * FROM facebook"
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "pretty",
      "_score" : 1.0,
      "_source" : {
        "title" : "website",
        "text" : "blog is making",
        "date" : "2018/1016"
      }
    }, {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "AWZ668ZcHFL4sAFl7IMI",
      "_score" : 1.0,
      "_source" : {
        "title" : "website",
        "text" : "blog is making",
        "date" : "2018/1016"
      }
    }, {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "AWZ67I_dHFL4sAFl7IMJ",
      "_score" : 1.0,
      "_source" : {
        "title" : "website",
        "text" : "blog is making",
        "date" : "2018/1016"
      }
    }, {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "123",
      "_score" : 1.0,
      "_source" : {
        "title" : "change version num",
        "text" : "changing...",
        "views" : 0,
        "tags" : [ "testing" ]
      }
    } ]
  }
}

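The second is the SQL plugin's visual (web) interface.

10. Notes------GET multiple documents

The mget API takes a docs array as its parameter. Each element holds the metadata of one document to retrieve: _index, _type and _id.

When all documents share the same _index and _type, put those in the URL and simply pass an ids array.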

curl -i -XGET http://172.16.150.149:29200/facebook/blog/_mget?pretty -d '{"ids":["123","888"]}'
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 504

{
  "docs" : [ {
    "_index" : "facebook",
    "_type" : "blog",
    "_id" : "123",
    "_version" : 121,
    "found" : true,
    "_source" : {
      "title" : "change version num",
      "text" : "changing...",
      "views" : 0,
      "tags" : [ "testing" ]
    }
  }, {
    "_index" : "facebook",
    "_type" : "blog",
    "_id" : "888",
    "_version" : 1,
    "found" : true,
    "_source" : {
      "title" : "website",
      "text" : "new test is made",
      "date" : "2018/10/17"
    }
  } ]
}
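
To make the two request shapes concrete, here is a minimal Python sketch that only builds the JSON bodies (it does not contact a cluster). The helper name build_mget_body is my own invention for illustration, not part of any Elasticsearch client.

```python
import json


def build_mget_body(doc_refs=None, ids=None):
    """Build the JSON body for an _mget request (hypothetical helper).

    ids:      short form, used when _index and _type are already in the URL.
    doc_refs: general form, a list of (index, type, id) tuples.
    """
    if ids is not None:
        return json.dumps({"ids": list(ids)})
    docs = [{"_index": i, "_type": t, "_id": d} for (i, t, d) in doc_refs]
    return json.dumps({"docs": docs})


# Short form: for a URL such as /facebook/blog/_mget
short_body = build_mget_body(ids=["123", "888"])

# General form: for a URL such as /_mget, each element carries its own metadata
full_body = build_mget_body(
    doc_refs=[("facebook", "blog", "123"), ("facebook", "blog", "888")]
)

print(short_body)
print(full_body)
```

Using json.dumps also sidesteps shell quoting problems, because the body is always valid JSON.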

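11. Notes------bulk batch operations

Why does the format require one action per line?
The reason is performance. Because every action sits on its own line, Elasticsearch can read each line directly as a self-contained operation instead of parsing the whole request into one large structure, which reduces memory use and garbage-collection pressure in the JVM.

The bulk API executes in the following order:

The client sends a bulk request to Node 1 (the master node in this example).

Node 1 builds a bulk request for each node that hosts an involved primary shard and forwards these requests to those nodes in parallel.

Each primary shard executes its operations one after another, in order. When an operation succeeds, the primary forwards the new document (or the deletion) to its replica shards in parallel and then moves on to the next operation. Once all replica shards have reported success for all operations, the node reports success to the coordinating node, which collects the responses and returns them to the client.

This also shows that a bulk request is not atomic: each operation succeeds or fails on its own.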
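
Because the request is not atomic, the response has to be checked item by item. The sketch below splits a bulk response into succeeded and failed operations. The sample_response is hand-written to mimic the general shape of a bulk response (an items list with one action key per entry); it was not captured from a real cluster, and split_bulk_results is a hypothetical helper name.

```python
def split_bulk_results(response):
    """Separate successful and failed items of a bulk response (hypothetical helper)."""
    succeeded, failed = [], []
    for item in response.get("items", []):
        # Each item is a dict with a single key naming the action (create, index, ...)
        action, result = next(iter(item.items()))
        entry = (action, result.get("_id"), result.get("status"))
        if "error" in result or result.get("status", 200) >= 300:
            failed.append(entry)
        else:
            succeeded.append(entry)
    return succeeded, failed


# Hand-written sample: the first create succeeds, the second conflicts
sample_response = {
    "took": 3,
    "errors": True,
    "items": [
        {"create": {"_index": "twitter", "_type": "newtype", "_id": "970", "status": 201}},
        {"create": {"_index": "user", "_type": "doc", "_id": "2", "status": 409,
                    "error": "document already exists"}},
    ],
}

ok, bad = split_bulk_results(sample_response)
print("succeeded:", ok)
print("failed:", bad)
```

One failed item does not roll back the others, so a caller would typically retry or log only the entries in the failed list.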

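The problem I ran into: how do I type a real line break in the request body, rather than a line continuation?

I found a solution on GitHub (tested by myself on Ubuntu): add -H 'Content-Type: application/json'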

 curl -H 'Content-Type: application/json' -i -XPOST http://172.16.150.149:29200/_bulk -d '
{"create":{"_index":"twitter","_type":"newtype","_id":970}}
{ "create": { "_index": "user", "_type": "doc", "_id": "2" }}
'

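After that you can break lines freely: everything inside the single quotes, line breaks included, is sent as the request body. Remember the closing ' at the end. If you forget it, pressing Enter does no harm; the shell simply starts another line and keeps waiting for the quote.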
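
The same newline-delimited body can also be generated programmatically. As far as I understand the bulk format, a create or index action line is followed by a line holding the document source, a delete action has no source line, and the body must end with a newline. The sketch below only builds the string; build_bulk_body and the sample document are assumptions for illustration.

```python
import json


def build_bulk_body(actions):
    """Build a newline-delimited bulk body (hypothetical helper).

    actions: list of (action_name, metadata_dict, source_dict_or_None).
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))  # action and metadata line
        if source is not None:                    # delete carries no source line
            lines.append(json.dumps(source))
    # Every line, including the last one, is terminated by a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("create", {"_index": "twitter", "_type": "newtype", "_id": 970},
     {"title": "hypothetical document"}),
    ("delete", {"_index": "user", "_type": "doc", "_id": "2"}, None),
])
print(body, end="")
```

json.dumps never emits a raw line break inside a document, so each action and each source stays on exactly one line.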

12. Overview------The role of routing

The documentation explains how ES stores documents; this is just a brief note for my own understanding.

shard = hash(routing) % number_of_primary_shards

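routing is a variable value. It defaults to the document's _id, but it can also be set to a custom value. The routing value is passed through a hash function to produce a number, which is then divided by number_of_primary_shards (the number of primary shards) to obtain the remainder. This remainder, which always lies between 0 and number_of_primary_shards-1, is the shard where the document lives.

This explains why the number of primary shards must be fixed when the index is created and never changed afterwards: if the number changed, every previously computed routing value would become invalid and the documents could no longer be found.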
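
The effect of changing the shard count can be simulated in a few lines. This is only a sketch of the formula above: zlib.crc32 is a stand-in hash that I picked because it is deterministic and in the standard library, not the hash function Elasticsearch actually uses, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with a stand-in hash."""
    h = zlib.crc32(routing.encode("utf-8"))  # stand-in, NOT Elasticsearch's hash
    return h % number_of_primary_shards


ids = ["123", "888", "pretty"]
with_3 = {i: shard_for(i, 3) for i in ids}
with_5 = {i: shard_for(i, 5) for i in ids}
print("3 primary shards:", with_3)
print("5 primary shards:", with_5)

# Count how many of 1000 ids would be looked up on a different shard
# if the number of primary shards changed from 3 to 5.
moved = sum(shard_for(str(i), 3) != shard_for(str(i), 5) for i in range(1000))
print("ids whose shard changed:", moved)
```

The same routing value always maps to the same shard for a fixed shard count, which is what makes lookups by _id possible without searching every shard.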

13. Notes-------Empty search

An empty search specifies no query at all:

GET /_search

  curl -XGET http://172.16.150.149:29200/facebook/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "pretty",
      "_score" : 1.0,
      "_source" : {
        "title" : "website",
        "text" : "blog is making",
        "date" : "2018/1016"
      }
    }, {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "888",
      "_score" : 1.0,
      "_source" : {
        "title" : "website",
        "text" : "new test is made",
        "date" : "2018/10/17"
      }
    }, {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "AWZ668ZcHFL4sAFl7IMI",
      "_score" : 1.0,
      "_source" : {
        "title" : "website",
        "text" : "blog is making",
        "date" : "2018/1016"
      }
    }, {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "AWZ67I_dHFL4sAFl7IMJ",
      "_score" : 1.0,
      "_source" : {
        "title" : "website",
        "text" : "blog is making",
        "date" : "2018/1016"
      }
    }, {
      "_index" : "facebook",
      "_type" : "blog",
      "_id" : "123",
      "_score" : 1.0,
      "_source" : {
        "title" : "change version num",
        "text" : "changing...",
        "views" : 0,
        "tags" : [ "testing" ]
      }
    } ]
  }
}

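Meaning of the main fields

took: the time the query took, in milliseconds.

timed_out: whether the query timed out. The timeout request parameter sets how long to wait for results from the nodes and shards; once that time is up, the results gathered so far are returned and the connection is closed.

hits: the total number of matching documents, plus the _index, _type, _id and so on of each returned document.

_shards: shard information (how many shards took part in the query, and how many succeeded or failed).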
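
To tie the field descriptions together, here is a small Python sketch that reads these fields from a parsed response. The sample dict is a trimmed copy of the output above (only two of the five hits are kept), summarize_search is a hypothetical helper, and it assumes hits.total is a plain integer as in this version of the output.

```python
def summarize_search(response):
    """Pull the main fields out of a parsed _search response (hypothetical helper)."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        # All shards answered and none failed
        "shards_ok": shards["successful"] == shards["total"] and shards["failed"] == 0,
        "total_hits": hits["total"],
        "ids": [h["_id"] for h in hits["hits"]],
    }


# Trimmed copy of the response shown above
sample = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "failed": 0},
    "hits": {
        "total": 5,
        "max_score": 1.0,
        "hits": [
            {"_index": "facebook", "_type": "blog", "_id": "pretty", "_score": 1.0},
            {"_index": "facebook", "_type": "blog", "_id": "888", "_score": 1.0},
        ],
    },
}

summary = summarize_search(sample)
print(summary)
```

Note that total_hits counts every matching document, while ids only lists the documents actually returned in this page of results.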