[Elasticsearch] How to update documents partially (the use of partial update)


Reprinted: ES 26-How Elasticsearch Partially Updates Documents (Use of Partial Update)

1 What is partial update

1.1 The principle of fully modifying the document

Syntax for fully replacing a document: PUT index/type/1. If the document with id=1 does not exist, it is created; if it exists, the original document is replaced.

Fully replacing a document is relatively expensive. To avoid the replacement round trip, partial update was introduced: only the specified fields are modified, and the complete document does not have to be resubmitted.

1.2 The idea of modifying the specified field

(1) Fetch the document to be modified according to the user's request;

(2) Build the new document in memory from the data submitted by the user, and send a PUT request to ES;

(3) Mark the old document as deleted;

(4) Finally, write the newly built document into the index.
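
The four steps above can be sketched with a toy in-memory model. This is only an illustration, not Elasticsearch code: the function name replace_field and the list-of-versions layout are assumptions made for this sketch.

```python
# Toy in-memory model of the four steps; not Elasticsearch code.
# Each stored version is a dict: {"source": ..., "version": ..., "deleted": ...}.

def replace_field(store, doc_id, changes):
    """Apply `changes` by writing a whole new version of the document."""
    versions = store[doc_id]
    old = versions[-1]                          # (1) fetch the current document
    new_source = {**old["source"], **changes}   # (2) build the new document in memory
    old["deleted"] = True                       # (3) mark the old version as deleted
    versions.append({                           # (4) store the new document
        "source": new_source,
        "version": old["version"] + 1,
        "deleted": False,
    })
    return versions[-1]


store = {"1": [{"source": {"name": "shou feng", "sex": "male", "age": 20},
                "version": 1, "deleted": False}]}
latest = replace_field(store, "1", {"age": 21})
print(latest["version"], latest["source"]["age"])
```

The old version is kept and only flagged as deleted, which mirrors step (3): nothing is edited in place.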

1.3 Advantages of partial update

(1) The query, modification and write-back all happen inside the same shard, which avoids the overhead of network transfers.

The client no longer has to: query the document from a specific shard -> return it to client memory -> modify it in memory -> send the modified document back to the original shard -> write it to the index. Skipping this round trip improves performance noticeably.

(2) The interval between the query and the modification is shorter, which effectively reduces concurrency conflicts.

1.4 Use of partial update

Usage: perform an incremental update through the _update endpoint:

// Add test data:
PUT employee/developer/1
{
    "name": "shou feng",
    "sex": "male",
    "age": 20
}

// Partial update: modify only the specified field:
POST employee/developer/1/_update
{
    "doc": {
        "age": 21
    }
}

// Response:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}

// View the document: age has changed from 20 to 21.
GET employee/developer/1

Without _update, the request body overwrites the stored document directly, so the original fields are lost:

// Without _update:
POST employee/developer/1
{
    "doc": {
        "age": 22
    }
}

// Check again: the document with id=1 now holds only the submitted body (the age value nested under doc); name and sex are gone:
GET employee/developer/1
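
The difference between the two requests can be modelled in a few lines of Python. This is a sketch, not the Elasticsearch implementation: merge_doc is a hypothetical helper that imitates how a partial doc is merged (nested objects merged recursively, other values replaced), while a plain index request simply stores its body as the new document.

```python
import copy


def merge_doc(source, partial):
    """Recursively merge `partial` into a copy of `source`.

    Nested objects are merged; any other value (including lists) is replaced.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_doc(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


original = {"name": "shou feng", "sex": "male", "age": 20}

# POST employee/developer/1/_update with {"doc": {"age": 21}}
updated = merge_doc(original, {"age": 21})

# POST employee/developer/1 with {"doc": {"age": 22}}:
# the request body itself becomes the stored document.
replaced = {"doc": {"age": 22}}

print(updated)
print(replaced)
```

In the first case name and sex survive; in the second the stored document is exactly the request body, which is why the other fields disappear.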

2 Partial update operation through script

ES supports scripting: complex operations can be implemented with external Groovy scripts (obsolete) or with built-in Painless scripts.

2.1 Built-in painless script to modify documents

Insert document:

PUT employee/developer/1
{
    "name": "shou feng",
    "age": 20,
    "salary": 10000
}

Execute the script. A short inline Painless script is used here, given directly as a string:

// Send a POST request to perform a partial update; the script increases salary by 500
POST employee/developer/1/_update
{
    "script": "ctx._source.salary+=500"
}

View the modified results:

GET employee/developer/1

// The result is as follows (salary has been increased by 500):
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 5,
    "found": true,
    "_source": {
        "name": "shou feng",
        "age": 20,
        "salary": 10500
    }
}
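
The inline script above hard-codes the increment. A parameterised variant keeps the script text constant and passes the value separately, so ES can reuse the compiled script when only the value changes. The sketch below only builds and prints the request body; painless_update_body and text_key are hypothetical names, and the key that holds the script text is assumed to be source (older 5.x releases call it inline).

```python
import json


def painless_update_body(script_text, params, text_key="source"):
    """Build the body of an _update request that runs a Painless script.

    text_key is the field holding the script text; "source" is an assumption
    of this sketch, pass "inline" for releases that expect that name.
    """
    return {
        "script": {
            "lang": "painless",
            text_key: script_text,
            "params": params,
        }
    }


# In Painless, parameters are read through the params object.
body = painless_update_body("ctx._source.salary += params.bonus", {"bonus": 500})
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of POST employee/developer/1/_update.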

2.2 External Groovy script to modify documents

Note: Starting with ES 6.0, Groovy scripts are no longer supported. The demonstration here uses ES 5.6.10. On 6.x, the following exception is thrown:

"type": "illegal_argument_exception",
"reason": "script_lang not supported [groovy]"

Script files are stored under ${ES_HOME}/config/scripts with file names of the form xxx.groovy. The script used here contains ctx._source.salary+=bonus, which increases salary by the value of the bonus parameter. For example:

[root@localhost scripts]# pwd
/data/elk-5.6.10/es-node/config/scripts
[root@localhost scripts]# cat change_salary.groovy 
ctx._source.salary+=bonus
[root@localhost scripts]# 

Modify the document:

POST employee/developer/1/_update
{
    "script": {
        "lang": "groovy",
        "file": "change_salary",
        "params": {
            "bonus": 500
        }
    }
}

// The response is:
#! Deprecation: [groovy] scripts are deprecated, use [painless] scripts instead
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 6,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}

View the modified results:

GET employee/developer/1
// The result is as follows:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 6,
    "found": true,
    "_source": {
        "name": "shou feng",
        "age": 20,
        "salary": 11000
    }
}

Note:
When an external Groovy script is executed, ES warns that Groovy scripts are deprecated and recommends Painless instead, a lighter scripting language in which short expressions similar to ctx._source.salary+=bonus can be written directly.
In the Elasticsearch 5.6 release used here, Painless is already the default scripting language. For detailed usage of scripts, see the blog post: ES 27-Elasticsearch's Painless Script Usage Practice.

2.3 Built-in painless script upsert document

First delete the document with id=1:

DELETE employee/developer/1

Now suppose we do not know that the document with id=1 has been deleted, and try to add "level": 1 to it:

POST employee/developer/1/_update
{
    "doc": {
        "level": 1
    }
}

A 404 document_missing_exception error is returned:

{
    "error": {
        "root_cause": [
            {
                "type": "document_missing_exception",
                "reason": "[developer][1]: document missing",
                "index_uuid": "rT6tChP2QISaVd2OzdCEMA",
                "shard": "3",
                "index": "employee"
            }
        ],
        "type": "document_missing_exception",
        "reason": "[developer][1]: document missing",
        "index_uuid": "rT6tChP2QISaVd2OzdCEMA",
        "shard": "3",
        "index": "employee"
    },
    "status": 404
}

Use the upsert strategy: if the specified document does not exist, the content of upsert is indexed as the initial document; if it exists, the partial update in doc or script is applied:

POST employee/developer/1/_update
{
    "script": "ctx._source.level+=1",
    "upsert": {
        "name": "heal",
        "age": 20
    }
}

The response now contains "result": "created": because the document did not exist, the content of upsert was indexed as a new document.
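
The upsert decision can be pictured with a small in-memory sketch. It is not Elasticsearch code: update_with_upsert and add_level are hypothetical names, the index is a plain dict, and the stand-in script treats a missing level as 0, which is an assumption of this sketch rather than documented script behaviour.

```python
def update_with_upsert(index, doc_id, apply_script, upsert):
    """Return "created" or "updated", mimicking the result field of the response."""
    if doc_id not in index:
        index[doc_id] = dict(upsert)   # document missing: index the upsert body as-is
        return "created"
    apply_script(index[doc_id])        # document present: run the script on its source
    return "updated"


def add_level(source):
    # Stand-in for the update script; a missing level counts as 0 (assumption).
    source["level"] = source.get("level", 0) + 1


index = {}
first = update_with_upsert(index, "1", add_level, {"name": "heal", "age": 20})
second = update_with_upsert(index, "1", add_level, {"name": "heal", "age": 20})
print(first, second, index["1"])
```

The first call takes the upsert branch and never runs the script; only the second call, which finds the document, applies it.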

2.4 External Groovy script delete document

Description: The demo here uses ES 5.6.10 version.
Script path: ${ES_HOME}/config/scripts/delete_doc.groovy
Script content: ctx.op = ctx._source.age == age ? 'delete' : 'none'

Usage example:

// Delete the document if its age is 20
POST employee/developer/1/_update
{
    "script": {
        "lang": "groovy",
        "file": "delete_doc",
        "params": {
            "age": 20
        }
    }
}

Response result:

#! Deprecation: [groovy] scripts are deprecated, use [painless] scripts instead
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "_version": 13,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}

Check if the document is deleted:

GET employee/developer/1
// Response: the document was deleted successfully:
{
    "_index": "employee",
    "_type": "developer",
    "_id": "1",
    "found": false
}
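
How the script steers the outcome through ctx.op can be sketched as follows. This is a toy model under stated assumptions: run_update_script and delete_if_age_matches are hypothetical names, the index is a plain dict, and the document with id 2 is made-up sample data.

```python
def run_update_script(index, doc_id, script, params):
    """Run `script` against a document and act on the operation it selects."""
    ctx = {"_source": index[doc_id], "op": "index"}   # "index" is the default operation
    script(ctx, params)
    if ctx["op"] == "delete":
        del index[doc_id]
        return "deleted"
    if ctx["op"] == "none":
        return "noop"                                  # document left untouched
    return "updated"


def delete_if_age_matches(ctx, params):
    # Same logic as delete_doc.groovy: delete when age equals the parameter.
    ctx["op"] = "delete" if ctx["_source"]["age"] == params["age"] else "none"


index = {
    "1": {"name": "heal", "age": 20},
    "2": {"name": "other", "age": 30},   # hypothetical second document
}
r1 = run_update_script(index, "1", delete_if_age_matches, {"age": 20})
r2 = run_update_script(index, "2", delete_if_age_matches, {"age": 20})
print(r1, r2, sorted(index))
```

Only the document whose age matches the parameter is removed; the other one is reported as a no-op.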

3 Concurrency control strategy of partial update

Partial update also controls concurrency internally through optimistic locking. For details on concurrency control, see the blog post: Elasticsearch's Concurrency Control Strategy.

3.1 Control method

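POST index/type/id/_update?retry_on_conflict=5
POST index/type/id/_update?version=5

The first form retries automatically on a version conflict; the second performs the update only if the current version is 5. The two parameters cannot be combined in one request: ES rejects an update that specifies both retry_on_conflict and version.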

3.2 Retry principle

retry_on_conflict: the number of times to retry after a version conflict.

(1) Clients A and B fetch the same document at almost the same time and both see the same version, say _version=1;

(2) Client A modifies part of the document and writes the change back to the index;

(3) When writing, Elasticsearch compares the version submitted by client A (still 1) with the version of the stored document (also 1). They match, so the write is performed and the version becomes _version=2;

(4) Client B also modifies part of the document, but its write arrives slightly later. The same check as in (3) runs: the version submitted by client B is 1 while the stored document is at version 2, so a conflict occurs and this partial update fails;

(5) After a failed partial update, steps (1)-(3) are repeated with the latest version of the document, at most retry_on_conflict times.
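
The retry loop described in steps (1)-(5) can be simulated without a cluster. The sketch below is a simplified model, not Elasticsearch code: Store, VersionConflict, update_with_retry and the before_write hook (used only to inject client A's write at the right moment) are all hypothetical names.

```python
class VersionConflict(Exception):
    pass


class Store:
    """A single document guarded by a version number (optimistic locking)."""

    def __init__(self, source):
        self.source, self.version = dict(source), 1

    def get(self):
        return dict(self.source), self.version

    def put_if_version(self, source, expected_version):
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, found {self.version}")
        self.source, self.version = dict(source), self.version + 1


def update_with_retry(store, change, retry_on_conflict, before_write=None):
    """Read-modify-write; on conflict re-read and retry. Returns retries used."""
    for attempt in range(retry_on_conflict + 1):
        source, version = store.get()         # read document and version
        source.update(change)                 # modify in memory
        if before_write is not None:
            before_write(attempt)             # test hook: lets another client write first
        try:
            store.put_if_version(source, version)
            return attempt
        except VersionConflict:
            if attempt == retry_on_conflict:  # retries exhausted: give up
                raise


store = Store({"name": "shou feng", "age": 20})


def client_a_interferes(attempt):
    # Client A sneaks in a write during client B's first attempt only.
    if attempt == 0:
        src, ver = store.get()
        src["age"] = 21
        store.put_if_version(src, ver)


retries_used = update_with_retry(store, {"salary": 10000}, retry_on_conflict=5,
                                 before_write=client_a_interferes)
print(retries_used, store.version, store.source)
```

Client B's first write fails because client A has already moved the version from 1 to 2; the second attempt re-reads version 2, so A's change to age is preserved alongside B's new salary field.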


Origin blog.csdn.net/qq_21383435/article/details/108902973