Elasticsearch series: how partial updates work and their advantages

Overview

This article covers the core principles of partial updates (also called incremental updates), script-based update examples on Elasticsearch 6.3.1, and the advantages of partial updates.

Partial update process and principles

Brief review

We introduced the basic partial update syntax earlier. As a brief review, here is an example request:

POST /music/children/1/_update
{
  "doc": {
    "length": "76"
  }
}

In a typical application, the complete round trip from the client to Elasticsearch for a full replacement looks like this:

  • The client issues a GET request to fetch the document and displays it on a front-end page for the user to edit.
  • The user edits the data and clicks Submit.
  • The back-end system applies the changes and assembles the complete document.
  • It sends a PUT request to ES to replace the entire document.
  • ES marks the old document as deleted and creates a new one.

Elasticsearch documents are immutable by design. Every update actually creates a new document and marks the old one as deleted, and partial updates are no exception. The difference is that all three steps (fetching the full document, merging in the new fields, and replacing the old document) happen inside a single shard and complete within milliseconds.
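The three steps can be sketched with a toy in-memory model. This is not Elasticsearch code: the Shard class, its method names, and the "name" field value are hypothetical, and the merge here is shallow, whereas a real partial update also merges nested objects.

```python
import copy


class Shard:
    """Toy stand-in for a primary shard: documents are never edited in place."""

    def __init__(self):
        self.live = {}     # doc id -> (version, source)
        self.deleted = []  # superseded (doc id, version, source) entries

    def index(self, doc_id, source):
        """Full replacement: mark the old copy as deleted, write a new one."""
        version = 1
        if doc_id in self.live:
            old_version, old_source = self.live[doc_id]
            self.deleted.append((doc_id, old_version, old_source))
            version = old_version + 1
        self.live[doc_id] = (version, copy.deepcopy(source))
        return version

    def partial_update(self, doc_id, partial_doc):
        """Partial update: get, merge, re-index, all inside the shard."""
        _, current = self.live[doc_id]        # 1. fetch the full document
        merged = {**current, **partial_doc}   # 2. merge in the changed fields
        return self.index(doc_id, merged)     # 3. replace the old document


shard = Shard()
shard.index("1", {"name": "example song", "length": "75"})  # hypothetical data
new_version = shard.partial_update("1", {"length": "76"})
print(new_version, shard.live["1"][1], len(shard.deleted))
```

Even though only one field was sent, the shard ends up with a complete new document and one superseded copy, which is the same outcome as a full replacement.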

Partial update interaction between shards

A partial update of a document proceeds as follows:

  1. The Java client sends an update request to the ES cluster.
  2. The coordinating node receives the request. The document is not on that node, so it forwards the request to Node2, which holds primary shard P0.
  3. Node2 retrieves the document, modifies the JSON under _source, and re-indexes the document. If another thread has modified the document in the meantime, a version conflict occurs and Node2 retries the update, up to retry_on_conflict times, before giving up.
  4. If step 3 succeeds, Node2 asynchronously forwards the full content of the new document to the replica shards on Node1 and Node3, which re-index it. Once all replicas report success, Node2 returns a success message to the coordinating node.
  5. The coordinating node returns the success response to the client. At this point both the primary shard and the replica shards in the ES cluster have been updated.
Note that:
  • When the primary shard syncs a document to the replica shards, it sends the complete document. The requests are asynchronous, so their order is not guaranteed; if only the changes were sent, applying them in the wrong order would corrupt the document.
  • Once the coordinating node returns success to the Java client, the primary shard and all replica shards have completed the update, so the data in the ES cluster is consistent and the update is safe.
  • Retry strategy: on a conflict, ES re-fetches the document and its latest version number and tries the update again, repeating on failure until it succeeds. You can cap the number of retries, for example at 5: retry_on_conflict=5
  • Retrying suits scenarios where the order of the partial operations does not matter, such as counters: it makes no difference which increment runs first as long as the final result is correct. Other scenarios, such as inventory or account balance changes, cannot use retries if they set the field directly to a specified value. They can, however, be converted into additions and subtractions: a stock update for an order becomes "available stock = current stock - quantity ordered", and a balance update adds or subtracts the amount of the change. This turns an order-dependent operation into a largely order-independent one, so the conflict retry strategy can be applied.
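The retry behaviour in step 3 and the notes above can be illustrated with a small simulation. It is a sketch under stated assumptions, not the ES implementation: Store, VersionConflict, update_with_retry, and the before_write hook (used only to inject a competing write) are hypothetical names. The update is expressed as a subtraction, so re-running it on the latest document still gives the right result.

```python
class VersionConflict(Exception):
    """Raised when the document changed between our read and our write."""


class Store:
    """Toy single-document store with optimistic version checking."""

    def __init__(self, source):
        self.version = 1
        self.source = dict(source)

    def get(self):
        return self.version, dict(self.source)

    def index(self, source, expected_version):
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, found {self.version}")
        self.source = dict(source)
        self.version += 1
        return self.version


def update_with_retry(store, change, retry_on_conflict=0, before_write=None):
    """Read, apply the change, write; on conflict re-read and try again."""
    attempts = retry_on_conflict + 1
    for attempt in range(attempts):
        version, source = store.get()      # always start from the latest copy
        new_source = change(source)
        if before_write is not None:
            before_write(attempt)          # test hook: simulate a rival writer
        try:
            return store.index(new_source, expected_version=version)
        except VersionConflict:
            if attempt == attempts - 1:
                raise                      # retries exhausted, give up


store = Store({"stock": 10})


def rival_writer(attempt):
    # A competing order for 3 items lands once, between our read and write.
    if attempt == 0:
        version, source = store.get()
        source["stock"] -= 3
        store.index(source, expected_version=version)


def take_two(source):
    # Relative change: "available stock = current stock - quantity ordered".
    source["stock"] -= 2
    return source


final_version = update_with_retry(store, take_two, retry_on_conflict=5,
                                  before_write=rival_writer)
print(final_version, store.source)
```

The first attempt fails because the rival write bumped the version; the second attempt re-reads the stock and subtracts from the new value, so neither order is lost. Had take_two set the stock to an absolute value computed from the stale read, the retry would have silently overwritten the rival's change.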

Advantages of partial updates

  1. The query, modify, and write-back operations all happen inside ES, which reduces network transfer overhead and improves performance.
  2. Compared with full replacement, where the gap between reading and writing is seconds or longer, the gap between query and modification is only milliseconds, which effectively reduces concurrency conflicts.

Using scripts for partial updates

Elasticsearch supports scripts for more flexible update logic. As of version 6.0, the default scripting language is Painless and Groovy is no longer supported, because the Groovy compiler had some probability of failing to release memory, which eventually led to Full GC problems.

Our examples use English songs as the background. Assume the document looks like this:

{
  "_index": "music",
  "_type": "children",
  "_id": "2",
  "_version": 6,
  "found": true,
  "_source": {
    "name": "wake me, shark me",
    "content": "don't let me sleep too late, gonna get up brightly early in the morning",
    "language": "english",
    "length": "55",
    "likes": 0
  }
}

Inline script

Suppose the requirement is that every time someone clicks to play a song, the likes field of its document is incremented by one. A simple script does this:

POST /music/children/2/_update
{
   "script" : "ctx._source.likes++"
}

Run the request and query the document again: likes is now 1. Each further run increments likes by one, as expected.

Stored script

Now change the requirement slightly: play counts are updated in batches, and the increment is passed in as a parameter. The script can also be stored in ES ahead of time and called by ID when needed.

Create a script
POST _scripts/music-likes
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.likes += params.new_likes" 
  }
}

The script ID is music-likes. The new_likes parameter is passed in when the script is called.

Using a script

When updating, send the following request to call the script just created:

POST /music/children/2/_update
{
  "script": {
    "id": "music-likes", 
    "params": {
      "new_likes": 2
    }
  }
}

Here id is the ID given when the script was created, music-likes, and params is a fixed keyword that holds the script's parameters, in this case new_likes. After running the request, view the document: likes has grown by the value passed in, as expected.
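For readers calling ES from application code rather than a console, the same request can be assembled with the Python standard library. This is a minimal sketch: the ES_URL value assumes a local single-node cluster, and build_stored_script_update is a hypothetical helper name. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: a local single-node cluster; adjust for your environment.
ES_URL = "http://localhost:9200"


def build_stored_script_update(index, doc_type, doc_id, script_id, params):
    """Build (but do not send) an _update request that calls a stored script."""
    body = {"script": {"id": script_id, "params": params}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/{doc_type}/{doc_id}/_update",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_stored_script_update(
    "music", "children", "2", "music-likes", {"new_likes": 2}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it. Keeping the script ID and parameters as arguments means the same stored script serves every caller, whatever increment they need.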

View the script

Command:

GET _scripts/music-likes

The path segment after the slash is the script ID.

Delete the script

Command:

DELETE _scripts/music-likes

As before, the path segment after the slash is the script ID.

Script notes

  • When ES encounters a new script, it compiles the script and stores it in the cache. Compilation is expensive.
  • Parameterize scripts instead of hard-coding values, so that compiled scripts can be reused.
  • If too many scripts are compiled in a short time, beyond what ES allows, ES returns a circuit_breaking_exception error. The default limit is 15 compilations per minute.
  • By default the script cache holds 100 scripts with no expiration time, and each script can be at most 65535 bytes. To change these defaults, set script.cache.max_size, script.cache.expire, and script.max_size_in_bytes.

In short, make scripts as reusable as possible.
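The effect of parameterizing can be shown with a toy count of cache misses. The sketch assumes that the compile cache is keyed on the script's source text, which is a simplification; count_compilations is a hypothetical helper, and the 20 updates are made-up input.

```python
def count_compilations(script_sources):
    """Count how many distinct sources would each need a compilation."""
    cache = set()
    compilations = 0
    for source in script_sources:
        if source not in cache:  # cache miss: this source must be compiled
            cache.add(source)
            compilations += 1
    return compilations


# 20 updates with the increment baked into the script text.
hard_coded = [f"ctx._source.likes += {n}" for n in range(1, 21)]

# The same 20 updates with the increment passed through params.
parameterized = ["ctx._source.likes += params.new_likes"] * 20

print(count_compilations(hard_coded), count_compilations(parameterized))
```

Under this model the hard-coded variant needs a compilation per distinct value, which would exceed a limit of 15 per minute if the updates arrived together, while the parameterized variant compiles once and is served from the cache afterwards.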

Upsert syntax

The example above implements a play counter that is stored together with the song's content. If the counter were stored separately, a newly released song might not have a counter document yet, and trying to update it would fail with a document_missing_exception error. This scenario calls for the upsert syntax:

POST /music/children/3/_update
{
   "script" : "ctx._source.likes++",
   "upsert": {
     "likes": 0
   }
}

If the document with ID 3 does not exist, the first request indexes the JSON inside upsert as a new document with ID 3 and likes set to 0. On the second request the document already exists, so the script runs and likes is incremented.
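The two-request behaviour described above can be mimicked with a plain dictionary. This is an illustration only: update_with_upsert and increment_likes are hypothetical names, and the dictionary stands in for the index.

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Mimic _update with an upsert body against an in-memory dict."""
    if doc_id not in store:
        # Document missing: index the upsert body as-is, the script is skipped.
        store[doc_id] = dict(upsert)
        return "created"
    script(store[doc_id])  # document exists: run the script on its source
    return "updated"


def increment_likes(source):
    # Python equivalent of the Painless script "ctx._source.likes++".
    source["likes"] += 1


index = {}
first = update_with_upsert(index, "3", increment_likes, {"likes": 0})
second = update_with_upsert(index, "3", increment_likes, {"likes": 0})
print(first, second, index["3"])
```

Note that the first request leaves likes at 0 rather than 1, because the script does not run when the upsert body is used to create the document.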

Summary

This article briefly described the process and principles of partial updates, compared them with full replacement, and showed script-based examples for a simple counting scenario. Scripts can implement much richer functionality; for details, refer to the Painless documentation on the official website.

Origin www.cnblogs.com/huangying2124/p/11986903.html