1 bulk syntax

With the bulk syntax, different CRUD operations can be combined into a single request.
First, check whether the data already exists:

Run the following query:

GET /test_index/test_type/1

The result is:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "test_field1": "test field1",
    "test_field2": "test field2"
  }
}

Run the following query:

GET /test_index/test_type/2

The result is:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "2",
  "_version": 1,
  "found": true,
  "_source": {
    "test_content": "my test"
  }
}

This shows that the document with id 2 exists, and so does the document with id 1.


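Next, simulate a sequence of operations: delete first, then create, and so on. A key property of bulk is that when one operation fails, the others are not affected: each operation succeeds or fails on its own.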

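Note:
1. The lines starting with # below are annotations only; remove them before actually running the request.
2. The bulk API is strict about JSON layout: each JSON object must sit on a single line with no line breaks inside it, and consecutive JSON objects must be separated by a newline.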

POST /_bulk
#the document with id 2 will be deleted
{"delete":{"_index":"test_index","_type":"test_type","_id":2}}
#the document with id 3 will be created
{"create":{"_index":"test_index","_type":"test_type","_id":3}}
#the content of the document with id 3
{"test_field":"test3"}
#create the document with id 2
{"create":{"_index":"test_index","_type":"test_type","_id":2}}
#the content of that document is test2
{"test_field":"test2"}
#index the document with id 4
{"index":{"_index":"test_index","_type":"test_type","_id":4}}
#its content
{"test_field":"test4"}
#a document with id 1 already exists, so the following replaces it
{"index":{"_index":"test_index","_type":"test_type","_id":1}}
#the replacement content
{"test_field":"replaced test1111"}
#the following performs an update on the document with id 1
{ "update": { "_index": "test_index", "_type": "test_type", "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }

In other words, the complete command is:

POST /_bulk
{"delete":{"_index":"test_index","_type":"test_type","_id":"2"}}
{"create":{"_index":"test_index","_type":"test_type","_id":"3"}}
{"test_field":"test3"}
{"create":{"_index":"test_index","_type":"test_type","_id":"2"}}
{"test_field":"test2"}
{"index":{"_index":"test_index","_type":"test_type","_id":"4"}}
{"test_field":"test4"}
{"index":{"_index":"test_index","_type":"test_type","_id":"1"}}
{"test_field":"replaced test1111","test_field2":"test_field2"}
{ "update": { "_index": "test_index", "_type": "test_type", "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }
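Writing the body by hand makes it easy to break the one-object-per-line rule. As a minimal sketch, the body above could be generated in Python instead. The helper names `build_bulk_body` and `meta` are made up for this illustration and are not part of any Elasticsearch client; `json.dumps` without an `indent` argument always emits a single line, and the body is terminated with a newline because the bulk API expects the last line to end with one.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    Hypothetical helper: each JSON object is written on exactly one line,
    and the body ends with a trailing newline.
    """
    lines = []
    for action, metadata, source in operations:
        # Action line, e.g. {"delete":{"_index":...}}
        lines.append(json.dumps({action: metadata}, separators=(",", ":")))
        if action == "delete":
            continue  # delete is the only action without a source line
        if source is None:
            raise ValueError(f"{action} needs a source line")
        lines.append(json.dumps(source, separators=(",", ":")))
    return "\n".join(lines) + "\n"


def meta(doc_id):
    # Metadata shared by every action in this article's example
    return {"_index": "test_index", "_type": "test_type", "_id": doc_id}


operations = [
    ("delete", meta("2"), None),
    ("create", meta("3"), {"test_field": "test3"}),
    ("create", meta("2"), {"test_field": "test2"}),
    ("index", meta("4"), {"test_field": "test4"}),
    ("index", meta("1"), {"test_field": "replaced test1111",
                          "test_field2": "test_field2"}),
    ("update", dict(meta("1"), _retry_on_conflict=3),
     {"doc": {"test_field2": "bulk test1"}}),
]

body = build_bulk_body(operations)
print(body, end="")
```

The printed text is what would be sent as the body of POST /_bulk; how it is sent (curl, Kibana, an HTTP library) is left open here.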

If the requirement in note 2 is not followed, the following error is returned:

{
  "error": {
    "root_cause": [
      {
        "type": "json_e_o_f_exception",
        "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@7c1cd2c1; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@7c1cd2c1; line: 1, column: 3]"
      }
    ],
    "type": "json_e_o_f_exception",
    "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@7c1cd2c1; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@7c1cd2c1; line: 1, column: 3]"
  },
  "status": 500
}


After the bulk operation above, run the following commands in turn:

GET /test_index/test_type/2

The result is:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "2",
  "_version": 5,
  "found": true,
  "_source": {
    "test_field": "test2"
  }
}

Run the command:

GET /test_index/test_type/3

The result is:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "3",
  "_version": 1,
  "found": true,
  "_source": {
    "test_field": "test3"
  }
}

Run the command:

GET /test_index/test_type/4

The result is:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "4",
  "_version": 4,
  "found": true,
  "_source": {
    "test_field": "test4"
  }
}

Run the command:

GET /test_index/test_type/1

The result is:

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "1",
  "_version": 6,
  "found": true,
  "_source": {
    "test_field": "replaced test1111",
    "test_field2": "bulk test1"
  }
}

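To summarize, which types of operations can be executed?

(1) delete: deletes a document; a single JSON line is enough
(2) create: equivalent to PUT /index/type/id/_create, a forced create; it takes two lines, with the second line holding the document data
(3) index: an ordinary PUT operation, which either creates the document or fully replaces an existing one
(4) update: performs a partial update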

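In a bulk request, a failure in any one operation does not affect the others; the response reports the error details for each operation that failed.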
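Because failures are reported per operation, the caller has to inspect the response to find them. The sketch below assumes the usual bulk response shape: a top-level "errors" flag and an "items" list with one entry per operation, keyed by the action name. The function name `failed_items` and the sample response are invented for illustration; the sample is not output captured from the cluster used in this article.

```python
def failed_items(bulk_response):
    """Return (position, action, doc id, status, error) for each failed operation."""
    if not bulk_response.get("errors"):
        return []  # nothing failed, no need to scan the items
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item has a single key naming the action: delete/create/index/update
        for action, result in item.items():
            if "error" in result:
                failures.append((position, action, result.get("_id"),
                                 result.get("status"), result["error"]))
    return failures


# Hypothetical response: the second operation (a create) hit an existing id.
sample_response = {
    "took": 30,
    "errors": True,
    "items": [
        {"delete": {"_index": "test_index", "_id": "2", "status": 200}},
        {"create": {"_index": "test_index", "_id": "3", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "document already exists"}}},
        {"index": {"_index": "test_index", "_id": "4", "status": 201}},
    ],
}

for position, action, doc_id, status, error in failed_items(sample_response):
    print(position, action, doc_id, status, error["type"])
```

The position in "items" matches the order of the operations in the request, so a failed entry can be traced back to the line that produced it.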

2 Optimal bulk size

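A bulk request is loaded into memory, so if it is too large, performance actually degrades. Finding the optimal bulk size therefore takes repeated experimentation.
A common starting point is 1000 to 5000 documents per request, increased gradually from there. Measured by size, 5 to 15 MB per request is a good range.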
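One way to experiment with these limits is to split a long stream of operations into several bulk bodies, capped by both document count and byte size. This is a minimal sketch, not a tuned implementation: `chunk_operations` is a hypothetical helper, each input string is assumed to be one complete operation (action line plus optional source line, already newline-terminated), and the defaults of 1000 documents and 5 MB are just the lower ends of the ranges above.

```python
def chunk_operations(operations, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield bulk bodies holding at most max_docs operations and about max_bytes bytes."""
    batch, batch_bytes = [], 0
    for op in operations:
        op_bytes = len(op.encode("utf-8"))
        # Close the current batch before adding an operation that would exceed a limit
        if batch and (len(batch) >= max_docs or batch_bytes + op_bytes > max_bytes):
            yield "".join(batch)
            batch, batch_bytes = [], 0
        # A single oversized operation still goes out alone in its own batch
        batch.append(op)
        batch_bytes += op_bytes
    if batch:
        yield "".join(batch)


# Ten small index operations, split with a deliberately tiny max_docs
ops = ['{"index":{"_id":"%d"}}\n{"n":%d}\n' % (i, i) for i in range(10)]
bodies = list(chunk_operations(ops, max_docs=4))
print(len(bodies))
```

Each yielded string is a complete body for one POST /_bulk call, so max_docs and max_bytes can be raised step by step while measuring throughput.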

Elasticsearch bulk syntax for batch create, update, and delete (from study materials plus my own notes, lesson 27)
