Elasticsearch: How is data written?

In my previous article "Elasticsearch: How Indexing Data Is Done", I detailed how Elasticsearch indexes data. In today's article, I will explain how data is written to Elasticsearch from another perspective. For more on Elasticsearch data manipulation, please read "Elasticsearch: A thorough understanding of Elasticsearch data manipulation".

When a write request is sent, it goes through the routing mechanism. Routing determines which replication group (the primary shard and its replica shards are collectively called a replication group; see the article "Elasticsearch: Replication" for details) will store the document.
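To make routing concrete, here is a minimal Python sketch of the idea. Real Elasticsearch computes `shard = hash(_routing) % number_of_primary_shards` using a Murmur3 hash of the routing value (which defaults to the document `_id`); the MD5 hash below is only a dependency-free stand-in, not Elasticsearch's actual hash function.

```python
import hashlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Simplified model of Elasticsearch document routing.

    Elasticsearch uses a Murmur3 hash of the routing value
    (defaulting to the document _id); MD5 here is just a stand-in
    so the sketch needs no third-party libraries.
    """
    routing_hash = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return routing_hash % num_primary_shards

# The same document always routes to the same shard, which is why
# the number of primary shards cannot change after index creation.
shard = route_to_shard("doc-42", 5)
assert shard == route_to_shard("doc-42", 5)
assert 0 <= shard < 5
```

Because the formula is deterministic, every read or write for a given document lands on the same replication group.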

Write requests are handled differently from reads: they are sent directly to the primary shard, not to an arbitrary shard of the replication group, and the primary shard is responsible for the write operation.

The first job of the primary shard is to validate the request. It checks the structure of the request and validates field values. For example, attempting to write an object value into a field mapped to contain only numbers will produce a validation error on the primary shard.
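The following toy Python sketch illustrates this kind of type-against-mapping check. It is not Elasticsearch's actual validation code (which works on the full mapping and raises errors such as `mapper_parsing_exception`); the field names and the two supported types are purely illustrative.

```python
def validate_against_mapping(doc: dict, mapping: dict) -> list:
    """Toy model of the validation a primary shard performs:
    check each field's value against its mapped type."""
    type_checks = {"integer": int, "text": str}  # illustrative subset of field types
    errors = []
    for field, value in doc.items():
        expected = mapping.get(field)
        if expected in type_checks and not isinstance(value, type_checks[expected]):
            errors.append(f"field '{field}' expects {expected}, got {type(value).__name__}")
    return errors

mapping = {"price": "integer", "title": "text"}
# A well-typed document passes validation:
assert validate_against_mapping({"price": 10, "title": "ok"}, mapping) == []
# An object where a number is expected fails validation:
assert validate_against_mapping({"price": {"amount": 10}}, mapping) != []
```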

Once the request passes validation, the primary shard performs the write locally and then forwards the operation to its replica shards to keep them in sync.

In this way, Elasticsearch ensures that data is accurately and efficiently written to the relevant shards in the replication group.


To improve performance, the primary shard sends the operation to its replica shards in parallel. It's important to remember that even if replication to a replica shard fails, the operation is still reported as successful.

So, in short, when a write request is received, it is directed to the primary shard. The primary shard validates the operation, performs it locally, and then distributes it to its replica shards (if applicable).
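The whole path can be sketched as a toy Python model. This is not Elasticsearch's internal code, only an illustration of the sequence described above: the primary applies the operation locally, forwards it to each replica, and a replica failure does not fail the overall write (in real Elasticsearch the failed copy is reported to the master and dropped from the in-sync set).

```python
class Shard:
    """Minimal stand-in for a shard copy that stores documents."""
    def __init__(self, healthy: bool = True):
        self.docs = []
        self.healthy = healthy

    def index(self, doc: dict):
        if not self.healthy:
            raise ConnectionError("shard unreachable")
        self.docs.append(doc)

def handle_write(doc: dict, primary: Shard, replicas: list) -> dict:
    """Sketch of the write path: apply on the primary first,
    then forward to each replica (in parallel in real Elasticsearch)."""
    primary.index(doc)  # validation + local write on the primary
    failures = 0
    for replica in replicas:
        try:
            replica.index(doc)
        except ConnectionError:
            failures += 1  # the op still succeeds; ES marks the copy as out of sync
    return {"result": "created", "failed_replicas": failures}

primary = Shard()
replicas = [Shard(), Shard(healthy=False)]
resp = handle_write({"id": 1}, primary, replicas)
assert resp == {"result": "created", "failed_replicas": 1}
```

Note how the unhealthy replica ends up missing the document: exactly the divergence the rest of this article is about.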

That's a high-level overview of the write model, but let's take a deeper look at how Elasticsearch handles data copy errors.

Since Elasticsearch is distributed, many operations occur asynchronously, so problems such as hardware failures may be encountered. Unfortunately, these failures can occur at any time and cause major outages. Let me give an example to demonstrate this:

Suppose we receive a new document to be indexed. The primary shard validates the operation and indexes the document locally. Our replication group has two replica shards, and the primary forwards the operation to them.

Unfortunately, the primary shard failed before the operation reached one of the replica shards, probably due to a hardware issue.

When this happens, Elasticsearch starts the recovery process. I won't go into the intricate details of this process because it's pretty complicated. However, I can tell you that it involves promoting one of the replica shards to become the new primary, since every replication group needs a primary.


This is where the problem arises. The two remaining shards do not have the same state: one successfully indexed the new document, while the other missed it.

The replica shard that never received the index operation is mistakenly considered up to date when it is not.

As you can imagine, things start to get a little weird from this point on. Depending on which of the two shards serves a given request, the new document only has a 50% chance of being found, because one of the shards does not contain it. This is a very messy situation!

This is just one of the potential pitfalls. In fact, many things can go wrong at any given time. While the likelihood of such events is relatively low, it is critical that Elasticsearch is prepared for them. This becomes even more important as the cluster grows, the number of nodes increases, and the volume of data written to the index rises.

Fortunately, Elasticsearch addresses these challenges, and several others, with a feature called primary terms and sequence numbers. Without delving into the technical intricacies, let me give you a general understanding of these concepts.

Primary terms are a way for Elasticsearch to distinguish between old and new primary shards when the primary shard of a replication group changes.

A replication group's primary term is essentially just a counter of how many times the group's primary shard has changed.

In the example you just saw, the primary term of the replication group would increase by one, because the primary shard failed and one of the replica shards was promoted to be the new primary.
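The failover scenario can be sketched with a tiny Python model. The class and shard names ("A", "B1", "B2") are illustrative, not Elasticsearch internals; the point is only that every primary promotion bumps the term.

```python
class ReplicationGroup:
    """Toy model: the primary term counts how many times the
    group's primary shard has changed."""
    def __init__(self, shards: list):
        self.shards = shards
        self.primary = shards[0]
        self.primary_term = 1  # a fresh group starts at term 1

    def promote_new_primary(self):
        """The old primary failed: drop it, promote a replica,
        and increment the primary term."""
        self.shards.remove(self.primary)
        self.primary = self.shards[0]
        self.primary_term += 1

group = ReplicationGroup(["A", "B1", "B2"])
group.promote_new_primary()   # "A" fails, "B1" takes over
assert group.primary == "B1"
assert group.primary_term == 2
```

Operations stamped with an old term can now be recognized as coming from a deposed primary.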


The primary terms of all replication groups are stored in the cluster state. As part of a write operation, the current primary term is appended to the operation sent to the replica shards. This enables a replica shard to tell whether the primary has changed since the operation was forwarded.

While this mechanism helps Elasticsearch avoid certain problems, it is not enough by itself. In addition to associating each operation with primary terms, each operation is assigned a sequence number. This sequence number acts as a counter that is incremented with each operation, at least until the primary shard changes.

The primary shard is responsible for incrementing the sequence number when processing write requests. Sequence numbers help Elasticsearch track the order in which operations occurred on a particular primary shard.
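Putting the two together, here is a minimal sketch of a primary stamping each operation with the current primary term and the next sequence number. The field names `_primary_term` and `_seq_no` match what Elasticsearch reports, but the class itself is an illustration, not the real implementation.

```python
class Primary:
    """Toy model: the primary stamps each write with the current
    primary term and a monotonically increasing sequence number."""
    def __init__(self, primary_term: int = 1):
        self.primary_term = primary_term
        self.next_seq_no = 0

    def stamp(self, op: dict) -> dict:
        op["_primary_term"] = self.primary_term
        op["_seq_no"] = self.next_seq_no
        self.next_seq_no += 1  # one number per operation, in order
        return op

p = Primary()
ops = [p.stamp({"index": f"doc-{i}"}) for i in range(3)]
assert [op["_seq_no"] for op in ops] == [0, 1, 2]
```

The (primary term, sequence number) pair gives every operation a total order that shards can compare without looking at the data on disk.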

Primary terms and sequence numbers enable Elasticsearch to recover from changes in the primary shard, for example due to network errors.

Instead of comparing data on disk, Elasticsearch can use primary terms and sequence numbers to determine which operations have already been applied and which are needed to bring a particular shard up to date.

However, for large indices, especially when indexing and querying data at high speed, comparing millions of operations becomes impractical. To speed up this process, Elasticsearch maintains global and local checkpoints, which are themselves sequence numbers.

Each replication group has a global checkpoint, and each replica shard has a local checkpoint. The global checkpoint is the sequence number that all active shards in the replication group have reached or exceeded. This means that any operation with a sequence number lower than the global checkpoint has already been performed on every shard in the group.

If a primary shard fails and rejoins the cluster later, Elasticsearch only needs to compare the operations that occurred after the last global checkpoint it knew about. Similarly, if a replica shard fails, only operations with sequence numbers greater than its local checkpoint need to be applied when it comes back.
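The checkpoint arithmetic reduces to two small functions. This sketch assumes the simplified view above: the global checkpoint is the minimum of the local checkpoints of the in-sync copies, and recovery replays only the operations above a returning shard's local checkpoint.

```python
def global_checkpoint(local_checkpoints: dict) -> int:
    """The global checkpoint is the highest seq_no that every
    in-sync copy has processed: the minimum of the local ones."""
    return min(local_checkpoints.values())

def ops_to_replay(history: list, local_checkpoint: int) -> list:
    """On recovery, a returning shard only needs operations above
    its local checkpoint, not the group's full history."""
    return [op for op in history if op["_seq_no"] > local_checkpoint]

# replica2 lags behind, so the global checkpoint is held back to 7.
checkpoints = {"primary": 9, "replica1": 9, "replica2": 7}
assert global_checkpoint(checkpoints) == 7

# If replica2 drops out and returns, only ops 8 and 9 are replayed.
history = [{"_seq_no": n} for n in range(10)]
assert [op["_seq_no"] for op in ops_to_replay(history, 7)] == [8, 9]
```

Two operations replayed instead of ten: scaled up to millions of operations, this is what makes recovery fast.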

Essentially, instead of looking at the full history of the replication group, Elasticsearch only needs to examine what happened while the shard was unavailable. There are additional complexities to this process, of course, but I don't want to bore you with them.

Now that you understand the basics of primary terms, sequence numbers, and checkpoints, you should be able to follow along well. Remember that the primary term and sequence number are returned both in responses to write requests and when retrieving a document by its ID.
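For instance, a successful index request returns both values in its response body. The index name, document ID, and shard counts below are illustrative; the `_seq_no` and `_primary_term` fields are the ones this article has been discussing.

```json
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 3, "successful": 3, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 1
}
```

These same two fields can also be supplied on later writes (as `if_seq_no` and `if_primary_term`) for optimistic concurrency control, so a stale update is rejected rather than silently overwriting newer data.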

Origin blog.csdn.net/UbuntuTouch/article/details/131040629