One article to teach you how to update documents in Elasticsearch (Update and Update by Query)

Preface

Search and update are among the most commonly used Elasticsearch operations. I covered the Elasticsearch search API in an earlier article, so this one covers the update operations.

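Update API

Elasticsearch updates a single document through the Update API, exposed as the _update endpoint in the URL. It supports two kinds of update: by script and by partial document. The examples in this article use the typed URL form, POST <index>/_doc/<id>/_update; Elasticsearch 7.0 and later use the typeless form POST <index>/_update/<id>.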

The update API lets you update a document with a script that you provide. The operation gets the document from the index (collocated with the shard), runs the script (with an optional script language and parameters), and indexes the result back (the script can also delete the document or skip the operation). It uses versioning to make sure that no other update happened between the "get" and the "reindex".

Note that this operation still fully reindexes the document; it only saves some network round trips and reduces the chance of a version conflict between the get and the index. The _source field must be enabled for this feature to work.
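The get, run script, reindex cycle can be pictured with a small in-memory model. This is only an illustrative Python sketch, not Elasticsearch code: the store dictionary, VersionConflict, and scripted_update are hypothetical names, and the version comparison stands in for Elasticsearch's internal versioning.

```python
import copy


class VersionConflict(Exception):
    """Raised when the document changed between the get and the reindex."""


def scripted_update(store, doc_id, script):
    """Model one update: get the document, run the script, reindex the result.

    `store` maps a document id to {"_version": int, "_source": dict}.
    `script` is a callable that mutates a copy of the source in place.
    """
    current = store[doc_id]                      # 1. get
    seen_version = current["_version"]
    source = copy.deepcopy(current["_source"])
    script(source)                               # 2. run the script
    if store[doc_id]["_version"] != seen_version:
        raise VersionConflict(doc_id)            # someone else wrote in between
    # 3. reindex the whole document under the next version number
    store[doc_id] = {"_version": seen_version + 1, "_source": source}
    return store[doc_id]


def increment(source):
    source["counter"] += 4


store = {"1": {"_version": 1, "_source": {"counter": 1, "tags": ["red"]}}}
print(scripted_update(store, "1", increment))
```

Even though the script touches a single field, the model writes the whole source back, which mirrors the point that an update is still a full reindex of the document.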

For example, let's index a simple document:

PUT test/_doc/1
{
    "counter" : 1,
    "tags" : ["red"]
}

The examples below use this document to show some common update operations.

Script update

Now, we can execute a script that increments the counter:

POST test/_doc/1/_update
{
    "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    }
}

We can also append a tag to the tags list. Because tags is a list, the tag is added even if it is already present:

POST test/_doc/1/_update
{
    "script" : {
        "source": "ctx._source.tags.add(params.tag)",
        "lang": "painless",
        "params" : {
            "tag" : "blue"
        }
    }
}

Note: ctx._source refers to the source of the current document. Append a field name to access that field, and modify it in the script, for example by incrementing counter or appending to tags.

Besides _source, the ctx map also exposes the following variables: _index, _type, _id, _version, _routing and _now (the current timestamp).
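Both scripts above pass their values through params instead of writing them into the source string, so the script text stays the same from one request to the next and only the parameters change. A minimal Python sketch of building such request bodies follows; script_update_body is a hypothetical helper name, and the sketch only builds the JSON body without sending it.

```python
import json


def script_update_body(source, **params):
    """Build the request body for a scripted update.

    Values go into `params` so the script source stays constant
    between requests.
    """
    return {"script": {"source": source, "lang": "painless", "params": params}}


increment = script_update_body("ctx._source.counter += params.count", count=4)
add_tag = script_update_body("ctx._source.tags.add(params.tag)", tag="blue")

print(json.dumps(increment, indent=2))
```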

Add a new field to the document:

POST test/_doc/1/_update
{
    "script" : "ctx._source.new_field = 'value_of_new_field'"
}

Here new_field is the name of the new field and value_of_new_field is its initial value.

Delete a field:

POST test/_doc/1/_update
{
    "script" : "ctx._source.remove('new_field')"
}

Beyond simple updates, scripts can also drive more complex operations. In the following example, the script deletes the document if its tags field contains green, and otherwise does nothing (noop):

POST test/_doc/1/_update
{
    "script" : {
        "source": "if (ctx._source.tags.contains(params.tag)) { ctx.op = 'delete' } else { ctx.op = 'none' }",
        "lang": "painless",
        "params" : {
            "tag" : "green"
        }
    }
}

ctx.op = 'delete' deletes the document.
ctx.op = 'none' performs no operation (noop).
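The effect of the two ctx.op values can be mimicked on an in-memory store. The Python below is a hypothetical model rather than Elasticsearch code: delete_if_tagged and the store dictionary are illustrative names, and the returned strings follow the result values seen in update responses.

```python
def delete_if_tagged(store, doc_id, tag):
    """Mimic the script above: delete the document if it has `tag`, else do nothing."""
    source = store[doc_id]
    op = "delete" if tag in source["tags"] else "none"
    if op == "delete":
        del store[doc_id]          # ctx.op = 'delete'
        return "deleted"
    return "noop"                  # ctx.op = 'none'


store = {"1": {"counter": 5, "tags": ["red", "blue"]}}
print(delete_if_tagged(store, "1", "green"))  # noop
print("1" in store)                           # True
```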

Update with a partial document

Besides scripts, the update API also accepts a partial document, which is merged into the existing document. To replace an existing document completely, use the index API instead. The following request updates the value of one field:

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    }
}

Note: If the document has no name field, the field is added and set to new_name.

Multiple field updates

POST test/_doc/1/_update
{
    "doc" : {
        "new_field" : "new",
        "name" : "m"
    }
}

The doc object is merged into the existing document: fields listed in doc are added or overwritten, and fields not listed in doc are left unchanged. This lets us update existing fields and add new ones in the same request.
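As this section says at the start, a partial document is merged into the existing one. The Python sketch below illustrates that merge under the assumption that nested objects are merged key by key while every other value, arrays included, is replaced. merge_partial is a hypothetical name, not part of any Elasticsearch client.

```python
def merge_partial(existing, partial):
    """Return a new dict with `partial` merged into `existing`.

    Nested objects are merged key by key; any other value
    (strings, numbers, arrays) replaces the old one.
    """
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value
    return merged


document = {"counter": 1, "tags": ["red"], "name": "new_name"}
updated = merge_partial(document, {"new_field": "new", "name": "m"})
print(updated)
# {'counter': 1, 'tags': ['red'], 'name': 'm', 'new_field': 'new'}
```

The counter and tags fields survive because the partial document never mentions them.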

Note: If both doc and script are specified, doc will be ignored.

By default, an update that would not change anything is detected as such and returns "result": "noop". For example:

POST test/_doc/1/_update
{
    "doc" : {
        "name" : "new_name"
    }
}

If name is new_name before sending the request, the entire update request is ignored. If the request is ignored, the result element in the response will return noop.

{
   "_shards": {
        "total": 0,
        "successful": 0,
        "failed": 0
   },
   "_index": "test",
   "_type": "_doc",
   "_id": "1",
   "_version": 6,
   "result": "noop"
}
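On the client side, the result field is enough to tell whether the request wrote anything. A minimal Python sketch, using the response shown above as input; document_changed is a hypothetical helper name.

```python
import json

response_text = """
{
  "_shards": {"total": 0, "successful": 0, "failed": 0},
  "_index": "test",
  "_type": "_doc",
  "_id": "1",
  "_version": 6,
  "result": "noop"
}
"""


def document_changed(response):
    """Return True if the update actually wrote a new version of the document."""
    return response["result"] != "noop"


response = json.loads(response_text)
print(document_changed(response))  # False
```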

Parameters

The update operation supports the following query string parameters:

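retry_on_conflict	Between the get and indexing phases of the update, another process may already have updated the same document. By default, the update fails with a version conflict exception. The retry_on_conflict parameter controls how many times the update is retried before the exception is finally thrown.

routing	Routing is used to send the update request to the right shard, and sets the routing for the upsert request if the document being updated does not exist. It cannot be used to change the routing of an existing document.

timeout	How long to wait for a shard to become available.

wait_for_active_shards	The number of shard copies that must be active before the update operation proceeds.

refresh	Controls when the changes made by this request become visible to search. See ?refresh.

_source	Controls whether and how the updated source is returned in the response. By default, the updated source is not returned. See source filtering for details.

version	The update API uses Elasticsearch's versioning support internally to make sure the document does not change during the update. You can use the version parameter to specify that the document should only be updated if its version matches the one specified.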
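The effect of retry_on_conflict can be sketched as a retry loop around a single update attempt. This Python model is illustrative only: VersionConflict, update_with_retries, and flaky_update are hypothetical names, and in Elasticsearch the retrying happens on the server rather than in client code.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's version conflict exception."""


def update_with_retries(attempt_update, retry_on_conflict=0):
    """Call `attempt_update`, retrying on conflict up to `retry_on_conflict` times."""
    retries_left = retry_on_conflict
    while True:
        try:
            return attempt_update()
        except VersionConflict:
            if retries_left == 0:
                raise              # retries exhausted: surface the conflict
            retries_left -= 1


# A fake update that conflicts twice before it succeeds.
calls = {"count": 0}


def flaky_update():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise VersionConflict()
    return "updated"


print(update_with_retries(flaky_update, retry_on_conflict=3))  # updated
```

With the default of 0 retries, the first conflict is raised straight away, which matches the default behavior described for the parameter.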

Update by Query API

The Update API updates a single document. Often we need to update many documents at once, and for that we use the Update by Query API, which updates every document that matches a query.

The corresponding endpoint is _update_by_query.

The simplest use of _update_by_query is to perform an update on every document in the index without changing the source. This is useful for picking up a new property or some other online mapping change.

Example:

POST twitter/_update_by_query?conflicts=proceed

The response looks similar to this:

{
  "took" : 147,
  "timed_out": false,
  "updated": 120,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "total": 120,
  "failures" : [ ]
}

_update_by_query takes a snapshot of the index when it starts and indexes what it finds using internal versioning. This means that a version conflict occurs if a document changes between the time the snapshot was taken and the time the index request is processed. When the versions match, the document is updated and its version number is incremented.

Note: Since internal version control does not support the value 0 as a valid version number, _update_by_query cannot be used to update documents with version equal to zero, and the request will fail.

Any query or update failure causes _update_by_query to abort, and the failures are returned in the response. Updates that have already been performed stay in place. In other words, the process is not rolled back, only aborted. When the first failure causes the abort, all failures from the failed bulk request are returned in the failures element, so there can be quite a few failed entities.

If you only want to count version conflicts without aborting _update_by_query, set conflicts=proceed in the URL or "conflicts": "proceed" in the request body. The example above uses this parameter.
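The difference between aborting and proceeding on a version conflict can be modeled in a few lines. The Python below is a simplified sketch: update_by_query, snapshot, and index are hypothetical names, documents are reduced to their version numbers, and the batching that Elasticsearch performs is ignored.

```python
def update_by_query(snapshot, index, conflicts="abort"):
    """Reindex every document in `snapshot`, checking versions against `index`.

    `snapshot` and `index` both map a document id to its version number.
    """
    summary = {"updated": 0, "version_conflicts": 0, "aborted": False}
    for doc_id, seen_version in snapshot.items():
        if index[doc_id] != seen_version:
            summary["version_conflicts"] += 1
            if conflicts != "proceed":
                summary["aborted"] = True
                break              # earlier updates are kept, not rolled back
            continue               # count the conflict and move on
        index[doc_id] = seen_version + 1
        summary["updated"] += 1
    return summary


snapshot = {"a": 1, "b": 1, "c": 1}

index = {"a": 1, "b": 2, "c": 1}       # "b" changed after the snapshot
print(update_by_query(snapshot, index))
# {'updated': 1, 'version_conflicts': 1, 'aborted': True}

index = {"a": 1, "b": 2, "c": 1}
print(update_by_query(snapshot, index, conflicts="proceed"))
# {'updated': 2, 'version_conflicts': 1, 'aborted': False}
```

In the aborted run, document a keeps its new version, which illustrates that an abort does not undo the updates already made.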

We can use the query DSL to limit which documents _update_by_query updates.

In the example below, _update_by_query updates only the documents whose user is kimchy, not every document:

POST twitter/_update_by_query?conflicts=proceed
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

_update_by_query also supports updating documents with a script.

The following example increments likes by 1 on every document whose user is kimchy:

POST twitter/_update_by_query
{
  "script": {
    "source": "ctx._source.likes++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

_update_by_query works like _update, but lets us define the scope of the update with a query DSL query.
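Put together, the query picks the documents and the script changes each one of them. A hypothetical in-memory Python model of the kimchy example follows; update_matching, add_like, and the tweets list are illustrative names and data, and the term query is reduced to an exact field comparison.

```python
def update_matching(docs, term, script):
    """Run `script` on every document whose fields match `term` exactly."""
    updated = 0
    for source in docs:
        if all(source.get(field) == value for field, value in term.items()):
            script(source)
            updated += 1
    return {"total": updated, "updated": updated}


def add_like(source):
    source["likes"] += 1   # same effect as ctx._source.likes++


tweets = [
    {"user": "kimchy", "likes": 3},
    {"user": "alice", "likes": 7},
    {"user": "kimchy", "likes": 0},
]
print(update_matching(tweets, {"user": "kimchy"}, add_like))
# {'total': 2, 'updated': 2}
print([t["likes"] for t in tweets])
# [4, 7, 1]
```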

Example:

Use a script to add a new field to every document in an index:

POST twitter/_update_by_query
{
  "script": {
    "source": "ctx._source['contact'] = \"139111111111\""
  }
}

With the request above, every document in the twitter index gets a new contact field with the same value.


Origin blog.csdn.net/qq_36551991/article/details/110096269