Using the bulk API in Elasticsearch

The last article introduced mget, the API for reading data from es in batches. In this article, let's look at bulk, the API for writing in batches.

The bulk API can perform multiple index, create, update, or delete operations in a single request, which can greatly improve indexing performance.

The syntax format of bulk is:

action and meta_data \n
optional source \n

action and meta_data \n
optional source \n

action and meta_data \n
optional source \n

As shown above, an operation usually consists of two lines. The first line gives the operation type (index, create, update, or delete) together with its metadata, and the second line is the optional data body. Every line, including the last one, must end with a newline character. When sending a bulk request, set its Content-Type to application/x-ndjson; application/json is also accepted.

For different operation types, the optional data bodies in the second line are different, as follows:

(1) index and create: the second line is the source data body
(2) delete: there is no second line
(3) update: the second line can be a partial doc, an upsert, or a script
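
The rules above can be sketched in Python. This is only an illustration built on the standard json module, not part of any Elasticsearch client, and the helper name build_bulk_body is made up for this example:

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, payload) tuples into a bulk request body.

    The payload is ignored for delete, which has no second line.
    """
    lines = []
    for action, metadata, payload in operations:
        if action not in ("index", "create", "update", "delete"):
            raise ValueError(f"unsupported action: {action}")
        # First line: the action type and its metadata.
        lines.append(json.dumps({action: metadata}))
        # Second line: the data body, for every action except delete.
        if action != "delete":
            if payload is None:
                raise ValueError(f"{action} requires a data body")
            lines.append(json.dumps(payload))
    # Every line, including the last one, ends with a newline.
    return "".join(line + "\n" for line in lines)

body = build_bulk_body([
    ("index", {"_index": "test", "_type": "_doc", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "_doc", "_id": "2"}, None),
    ("update", {"_index": "test", "_type": "_doc", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```

Because json.dumps writes each object on one line by default, the result already satisfies the one-object-per-line requirement.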

We can write the operations to a text file and send it to the server with the curl command.

The content of a file named requests is as follows:

{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "field1" : "value1" }

Send the command as follows:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo

The response result is as follows:

{"took":7, "errors": false, "items":[{"index":{"_index":"test","_type":"_doc","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
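
The same request could also be prepared from Python with only the standard library. The sketch below assumes the server address from the curl command; make_bulk_request is a hypothetical helper name, and the call that actually sends the request is left commented out because it needs a running cluster:

```python
import urllib.request

def make_bulk_request(body, url="http://localhost:9200/_bulk"):
    """Build (but do not send) a POST request carrying a bulk body."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),  # raw bytes, like curl --data-binary
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

body = (
    '{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }\n'
    '{ "field1" : "value1" }\n'
)
request = make_bulk_request(body)
print(request.get_method(), request.full_url)

# Sending requires a reachable cluster, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```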

Note that because the newline character separates one line from the next, each JSON object must be written on a single line; pretty-printed (multi-line) JSON cannot be used. Let's look at a correct bulk request body:

{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "_doc", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
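
A quick way to catch formatting mistakes before sending is to check that every line is a complete JSON object and that the body ends with a newline. The following is a rough sketch using only the standard library; check_bulk_body is a made-up name, and it validates the line format only, not the pairing of action lines with data bodies:

```python
import json

def check_bulk_body(body):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    if not body.endswith("\n"):
        problems.append("last line is missing its trailing newline")
    for number, line in enumerate(body.splitlines(), start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number} is not a complete JSON document")
            continue
        if not isinstance(parsed, dict):
            problems.append(f"line {number} is not a JSON object")
    return problems

action = {"index": {"_index": "test", "_type": "_doc", "_id": "1"}}
source = {"field1": "value1"}

# One object per line: accepted.
compact = json.dumps(action) + "\n" + json.dumps(source) + "\n"
# Pretty-printed JSON spreads one object over several lines: rejected.
pretty = json.dumps(action, indent=2) + "\n" + json.dumps(source, indent=2) + "\n"

print(check_bulk_body(compact))
print(len(check_bulk_body(pretty)) > 0)
```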

The response to a bulk request is batched as well: the items array holds one entry per action, in request order, and each entry tells you whether that action succeeded or failed:

{
   "took": 30,
   "errors": false,
   "items": [
      {
         "index": {
            "_index": "test",
            "_type": "_doc",
            "_id": "1",
            "_version": 1,
            "result": "created",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 201,
            "_seq_no" : 0,
            "_primary_term": 1
         }
      },
      {
         "delete": {
            "_index": "test",
            "_type": "_doc",
            "_id": "2",
            "_version": 1,
            "result": "not_found",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 404,
            "_seq_no" : 1,
            "_primary_term" : 2
         }
      },
      {
         "create": {
            "_index": "test",
            "_type": "_doc",
            "_id": "3",
            "_version": 1,
            "result": "created",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 201,
            "_seq_no" : 2,
            "_primary_term" : 3
         }
      },
      {
         "update": {
            "_index": "test",
            "_type": "_doc",
            "_id": "1",
            "_version": 2,
            "result": "updated",
            "_shards": {
                "total": 2,
                "successful": 1,
                "failed": 0
            },
            "status": 200,
            "_seq_no" : 3,
            "_primary_term" : 4
         }
      }
   ]
}
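
Client code usually checks the top-level errors flag first and only then walks the items. A minimal sketch of that pattern follows; failed_items is a hypothetical helper, and the sample response is invented for illustration (a create on an id that already exists). It treats an item as failed when it carries an error object, which is why the not_found delete above, with status 404 but errors set to false, does not count as a failure:

```python
def failed_items(response):
    """Return (position, action, id, status) for every action that failed."""
    if not response.get("errors"):
        return []  # nothing failed, no need to scan the items
    failures = []
    for position, entry in enumerate(response["items"]):
        # Each entry has a single key: the name of the action.
        for action, result in entry.items():
            if "error" in result:
                failures.append((position, action, result["_id"], result["status"]))
    return failures

# Invented sample: the second action fails with a conflict.
response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "test", "_type": "_doc", "_id": "1",
                   "result": "created", "status": 201}},
        {"create": {"_index": "test", "_type": "_doc", "_id": "3", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "document already exists"}}},
    ],
}
print(failed_items(response))
```

Since the items come back in request order, the position in each tuple can be used to find the original operation and retry it.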

As with the mget requests in the last article, a bulk request can be sent to three paths:

(1) /_bulk

(2) /{index}/_bulk

(3) /{index}/{type}/_bulk

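With these three paths, whatever is given in the URL becomes the default for every action: if both the index and the type are provided in the path, _index and _type can be omitted from the action metadata line. If the index is provided but the type is not, each action metadata line still needs to specify _type.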
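
How the path and the action metadata complement each other can be sketched as follows. Both helper names (bulk_path and action_metadata) are invented for this illustration:

```python
def bulk_path(index=None, doc_type=None):
    """Return the bulk endpoint path for the given defaults."""
    if doc_type is not None and index is None:
        raise ValueError("a type in the path also requires an index")
    parts = [part for part in (index, doc_type) if part is not None]
    return "/" + "/".join(parts + ["_bulk"])

def action_metadata(doc_id, index, doc_type, path_index=None, path_type=None):
    """Build action metadata, omitting whatever the path already supplies."""
    metadata = {"_id": doc_id}
    if index != path_index:
        metadata["_index"] = index
    if doc_type != path_type:
        metadata["_type"] = doc_type
    return metadata

print(bulk_path())                 # /_bulk
print(bulk_path("test"))           # /test/_bulk
print(bulk_path("test", "_doc"))   # /test/_doc/_bulk

# Index in the path, type still needed in the action line.
print(action_metadata("1", "test", "_doc", path_index="test"))
# Index and type in the path: only the id remains.
print(action_metadata("1", "test", "_doc", path_index="test", path_type="_doc"))
```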

In addition, several fields and parameters can be used to control how the operations are executed:

(1) The _version field can be set in the action metadata line to apply version control to that operation

(2) The _routing field can be set in the action metadata line to route that operation to a specific shard

(3) The wait_for_active_shards parameter sets how many shard copies must be active before the bulk operation is performed

(4) The refresh parameter controls when the changes made by the request become visible to search
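
The last two parameters are passed in the query string of the request URL. A small sketch, assuming the localhost address used earlier in this article; bulk_url is a made-up helper name:

```python
from urllib.parse import urlencode

def bulk_url(base="http://localhost:9200", refresh=None, wait_for_active_shards=None):
    """Build a bulk URL, adding only the query parameters that were given."""
    params = {}
    if refresh is not None:
        params["refresh"] = refresh
    if wait_for_active_shards is not None:
        params["wait_for_active_shards"] = wait_for_active_shards
    query = urlencode(params)
    return f"{base}/_bulk" + (f"?{query}" if query else "")

print(bulk_url())
print(bulk_url(refresh="wait_for", wait_for_active_shards=2))
```

Here refresh=wait_for asks the server to respond only after a refresh has made the changes searchable.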

Finally, let's focus on the update operation, which was also introduced in an earlier article. ES provides several ways to update data, expressed through the following fields:

(1) doc
(2) upsert
(3) doc_as_upsert
(4) script
(5) params, lang, source

The update action in bulk works much like it does in the Java API, whose detailed usage the earlier article covered. Now let's see how to use it in bulk:

POST _bulk
{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "_doc", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "_doc", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_type" : "_doc", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_type" : "_doc", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
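
The update bodies above can also be generated programmatically. The helpers below are a hypothetical sketch (none of these names come from an Elasticsearch client); they reproduce the doc, script with upsert, and doc_as_upsert variants and serialize each pair of lines with the standard json module:

```python
import json

def update_with_doc(doc, doc_as_upsert=False):
    """Partial-document update, optionally inserting the doc when it is missing."""
    body = {"doc": doc}
    if doc_as_upsert:
        body["doc_as_upsert"] = True
    return body

def update_with_script(source, params, upsert=None, lang="painless"):
    """Scripted update, with an optional document to insert when it is missing."""
    body = {"script": {"source": source, "lang": lang, "params": params}}
    if upsert is not None:
        body["upsert"] = upsert
    return body

def update_lines(doc_id, body, index="index1", doc_type="_doc", retry_on_conflict=3):
    """Return the two newline-terminated lines of one bulk update action."""
    metadata = {"_id": doc_id, "_type": doc_type, "_index": index,
                "retry_on_conflict": retry_on_conflict}
    return json.dumps({"update": metadata}) + "\n" + json.dumps(body) + "\n"

bulk_body = (
    update_lines("1", update_with_doc({"field": "value"}))
    + update_lines("0", update_with_script(
        "ctx._source.counter += params.param1", {"param1": 1}, upsert={"counter": 1}))
    + update_lines("2", update_with_doc({"field": "value"}, doc_as_upsert=True))
)
print(bulk_body, end="")
```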

In fact, each update body is simply the unformatted request content placed on a single line. The difference is that the earlier article described a single request, whereas bulk lets you submit multiple operations in one batch.

Summary:

This article introduced the usage of the bulk operation in es. With bulk we can write data in batches to improve write performance, but note that the data format differs from action to action. Each line of data, including the last one, must end with a newline; otherwise es will not recognize the format correctly.
