Chapter 17: In-Depth Understanding of etcd: etcd Performance Optimization in Practice


This article covers the following areas:

  1. A review of previous etcd lessons;
  2. Understanding etcd performance;
  3. etcd server-side performance optimization;
  4. etcd client-side performance optimization.

Review of Previous etcd Lessons

etcd was created at CoreOS and is written in Go. It is a distributed key/value storage engine. We can use etcd as a metadata database for a distributed system, storing the system's important metadata. etcd is also widely used by major companies.

The figure below shows the basic architecture of etcd:

[Figure: etcd basic architecture — a three-node cluster replicating via Raft and persisting to boltdb]

As shown above, the cluster has three nodes: one Leader and two Followers. The nodes synchronize data with each other via the Raft algorithm, and each node persists its data in boltdb. When a node goes down, the remaining nodes automatically elect a new Leader, keeping the whole cluster highly available. A client can complete its requests by connecting to any node.
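As a concrete illustration, here is a minimal Go sketch using the official clientv3 package: the client is given all three member addresses (placeholders here) and lists the cluster members through whichever node is reachable. The import path shown is for etcd v3.5+; older releases used go.etcd.io/etcd/clientv3.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3" // older releases: go.etcd.io/etcd/clientv3
)

func main() {
	// List every member endpoint: the client can reach the cluster through any
	// of them and fails over if one node is down. Addresses are placeholders.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"10.0.0.1:2379", "10.0.0.2:2379", "10.0.0.3:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	// Any member, leader or follower, can serve this request.
	members, err := cli.MemberList(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range members.Members {
		fmt.Println(m.Name, m.ClientURLs)
	}
}
```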

Understanding etcd Performance

Let's start with a diagram:

[Figure: standard etcd cluster architecture — Raft layer (blue) and Storage layer (red: treeIndex and boltdb)]

The figure shows a standard etcd cluster architecture. An etcd cluster can be divided into several core components: the Raft layer (in blue) and the Storage layer (in red); the Storage layer is further divided into the treeIndex layer and the underlying boltdb persistent key/value store. Each of these layers can be a source of etcd performance loss.

First, the Raft layer: Raft needs the network to synchronize data, so the network RTT and bandwidth between nodes affect etcd's performance. In addition, the speed at which the WAL is written to disk (disk IO) also has an impact.

Next, the Storage layer: the latency of disk IO (fdatasync) affects etcd's performance, and lock contention in the index layer affects it as well. In addition, the boltdb Tx lock and the performance of boltdb itself also greatly affect etcd's performance.

From another angle, the kernel parameters of the host that etcd runs on, as well as latency in the gRPC API layer, also affect etcd's performance.

etcd Server-Side Performance Optimization

Let's look at specific performance optimizations on the etcd server side.

etcd Server Performance Optimization - Hardware and Deployment

On the hardware side, the server needs sufficient CPU and memory to keep etcd running smoothly. Second, etcd is a database program that depends heavily on disk IO, so it needs an SSD with very good IO latency and throughput. etcd is also a distributed key/value storage system, so good network conditions are very important. Finally, for deployment, we should try to run etcd on dedicated machines to prevent other programs on the host from interfering with etcd's performance.

Interested readers can follow the link below for etcd's officially recommended hardware configuration:

https://coreos.com/etcd/docs/latest/op-guide/hardware/

etcd Server Performance Optimization - Software

etcd's software is organized into many layers; below is a brief overview of performance optimizations at different layers. Readers who want to go deeper can follow the GitHub PRs below to see the specific code changes.

  • The first is an optimization of etcd's in-memory index layer: the use of internal locks is optimized to reduce lock wait time. The original implementation traversed the internal BTree while holding a fairly coarse-grained lock, which significantly hurt etcd's performance; the new optimization reduces the lock granularity, so this part of the latency is reduced.

For details see: https://github.com/coreos/etcd/pull/9511

  • Next, an optimization for large-scale lease usage: the lease revoke and expiry algorithms were optimized, bringing the cost of finding expired leases down from an O(n) list traversal to O(log n), which solves the problem of using leases at scale (an illustrative sketch of the idea appears after this list).

For details see: https://github.com/coreos/etcd/pull/9418

  • Then there is an optimization for the boltdb backend: the backend batch size limit / batch interval can now be adjusted and configured dynamically for different hardware and workloads; previously these parameters were fixed at conservative values.

For details see:
https://github.com/etcd-io/etcd/commit/3faed211e535729a9dc36198a8aab8799099d0f3

  • Another optimization, contributed by a Google engineer, is the fully concurrent read feature: it optimizes the use of read/write locks around boltdb Tx calls and improves read performance.

For details see: https://github.com/etcd-io/etcd/pull/10523
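Returning to the lease optimization above (second bullet): the core idea is to keep leases ordered by expiry time, for example in a min-heap, so that finding expired leases no longer requires scanning every lease. The Go sketch below is purely illustrative, with hypothetical names; it is not etcd's actual implementation.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// leaseItem and leaseHeap are hypothetical names for this sketch.
type leaseItem struct {
	id     int64
	expiry time.Time
}

// leaseHeap is a min-heap ordered by lease expiry time.
type leaseHeap []leaseItem

func (h leaseHeap) Len() int            { return len(h) }
func (h leaseHeap) Less(i, j int) bool  { return h[i].expiry.Before(h[j].expiry) }
func (h leaseHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *leaseHeap) Push(x interface{}) { *h = append(*h, x.(leaseItem)) }
func (h *leaseHeap) Pop() interface{} {
	old := *h
	item := old[len(old)-1]
	*h = old[:len(old)-1]
	return item
}

// expiredLeases pops only the leases whose expiry has already passed; each pop
// is O(log n), so no full scan of the lease list is needed.
func expiredLeases(h *leaseHeap, now time.Time) []int64 {
	var out []int64
	for h.Len() > 0 && (*h)[0].expiry.Before(now) {
		out = append(out, heap.Pop(h).(leaseItem).id)
	}
	return out
}

func main() {
	h := &leaseHeap{}
	heap.Push(h, leaseItem{id: 1, expiry: time.Now().Add(-time.Second)}) // already expired
	heap.Push(h, leaseItem{id: 2, expiry: time.Now().Add(time.Hour)})
	fmt.Println("expired lease IDs:", expiredLeases(h, time.Now())) // [1]
}
```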

A new segregated-hashmap algorithm for allocating and recycling etcd's internal storage freelist

There are many other performance optimizations; here we focus on one contributed by Alibaba. It greatly improves the performance of etcd's internal storage, and its name is: a new segregated-hashmap-based algorithm for allocating and recycling etcd's internal freelist.

CNCF article:
https://www.cncf.io/blog/2019/05/09/performance-optimization-of-etcd-in-web-scale-data-scenario/

[Figure: architecture of a single etcd node, with boltdb as the persistent store for all key/value data]

The figure above shows the architecture of a single etcd node. Internally, boltdb persists all key/value data, so boltdb's performance plays a very important role in etcd's overall performance. Inside Alibaba we use etcd heavily as metadata storage, and in the process we found performance problems in boltdb, which we share here.

[Figure: boltdb page layout — numbers are page IDs, red pages are in use, white pages are free]

The figure above relates to the core algorithm etcd uses internally to allocate and recycle storage; first, some background. Internally, etcd (via boltdb) stores data in pages with a default size of 4 KB. In the figure, the numbers are page IDs; red pages are in use, and white pages are free.

When a user deletes data, etcd does not immediately return the storage space to the system; instead it keeps the pages internally and maintains a page pool to speed up later use. This page pool is called the freelist. In the figure, pages 43, 45, 46, 50, and 53 are in use, while pages 42, 44, 47, 48, 49, 51, and 52 are free.

When new data needs a run of 3 consecutive pages, the old algorithm scans the freelist from the beginning and eventually returns starting page ID 47. As you can see, the old algorithm is a linear scan over the internal freelist; with large amounts of data or severe internal fragmentation, its performance drops sharply.
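For intuition, here is a simplified Go sketch of that old linear scan, under my own assumptions rather than boltdb's actual code: it walks the free page IDs looking for a run of n consecutive pages, which is O(n) in the freelist size.

```go
package main

import "fmt"

// allocateLinear is a simplified illustration of the old freelist allocation
// (not boltdb's actual code): scan the sorted free page IDs for a run of n
// consecutive pages, which costs O(n) in the number of free pages.
func allocateLinear(free []uint64, n int) (start uint64, ok bool) {
	run := 0
	for i, id := range free {
		if i > 0 && id == free[i-1]+1 {
			run++
		} else {
			run = 1
		}
		if run == n {
			return free[i-n+1], true // first page ID of the run
		}
	}
	return 0, false
}

func main() {
	// Free pages from the example in the text: 42, 44, 47, 48, 49, 51, 52.
	free := []uint64{42, 44, 47, 48, 49, 51, 52}
	start, ok := allocateLinear(free, 3)
	fmt.Println(start, ok) // 47 true
}
```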

To solve this problem, we designed and implemented a new freelist allocation and recycling algorithm based on a segregated hashmap. The hashmap is keyed by the size of a run of consecutive free pages, and the value is the set of starting page IDs for runs of that size. When new pages are needed, a single O(1) lookup in this hashmap quickly yields the starting page ID.

Returning to the example above: when a run of 3 consecutive pages is needed, one hashmap lookup quickly finds the run starting at page ID 47.

We also use the hashmap to optimize page release. For example, when page 45 in the figure is released, it can be merged forward and backward with the adjacent free pages to form one large contiguous run, i.e. a run starting at page ID 44 with a size of 6.
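The sketch below illustrates the segregated-hashmap idea under my own simplifying assumptions (it is not the actual etcd/boltdb code): a map from run size to the set of starting page IDs gives O(1) allocation, and two auxiliary maps make the forward/backward merge on free possible.

```go
package main

import "fmt"

// freelist is a simplified sketch of the segregated-hashmap idea: free pages
// are tracked as runs of consecutive page IDs. Illustrative only.
type freelist struct {
	sizes  map[uint64]map[uint64]struct{} // run size  -> set of run start IDs
	starts map[uint64]uint64              // run start -> run size   (forward merge)
	ends   map[uint64]uint64              // run end   -> run start  (backward merge)
}

func newFreelist() *freelist {
	return &freelist{
		sizes:  make(map[uint64]map[uint64]struct{}),
		starts: make(map[uint64]uint64),
		ends:   make(map[uint64]uint64),
	}
}

func (f *freelist) addRun(start, size uint64) {
	if f.sizes[size] == nil {
		f.sizes[size] = make(map[uint64]struct{})
	}
	f.sizes[size][start] = struct{}{}
	f.starts[start] = size
	f.ends[start+size-1] = start
}

func (f *freelist) delRun(start, size uint64) {
	delete(f.sizes[size], start)
	delete(f.starts, start)
	delete(f.ends, start+size-1)
}

// allocate hands out a run of exactly n consecutive pages in O(1).
// (The real algorithm also splits larger runs; omitted for brevity.)
func (f *freelist) allocate(n uint64) (uint64, bool) {
	for start := range f.sizes[n] {
		f.delRun(start, n)
		return start, true
	}
	return 0, false
}

// free returns a run to the pool, merging it with adjacent free runs.
func (f *freelist) free(start, size uint64) {
	// Backward merge: a free run ending immediately before `start`.
	if prevStart, ok := f.ends[start-1]; ok {
		prevSize := f.starts[prevStart]
		f.delRun(prevStart, prevSize)
		start, size = prevStart, prevSize+size
	}
	// Forward merge: a free run beginning immediately after the freed run.
	if nextSize, ok := f.starts[start+size]; ok {
		f.delRun(start+size, nextSize)
		size += nextSize
	}
	f.addRun(start, size)
}

func main() {
	// Free runs from the allocation example in the text: 42, 44, 47-49, 51-52.
	f := newFreelist()
	f.addRun(42, 1)
	f.addRun(44, 1)
	f.addRun(47, 3)
	f.addRun(51, 2)
	if start, ok := f.allocate(3); ok {
		fmt.Println("3 consecutive pages start at", start) // 47
	}

	// Merge-on-free, assuming pages 44 and 46-49 are free as in the release
	// example: freeing page 45 merges both ways into one run of size 6
	// starting at page 44.
	g := newFreelist()
	g.addRun(44, 1)
	g.addRun(46, 4)
	g.free(45, 1)
	fmt.Println("merged run at 44 has size", g.starts[44]) // 6
}
```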

To sum up: the new algorithm reduces allocation from O(n) to O(1) and recycling from O(nlog n) to O(1). etcd's internal storage read/write performance is no longer the bottleneck in real-world scenarios, and its performance improves severalfold. The recommended storage size for a single cluster can be expanded from 2 GB to 100 GB. This optimization is in production use inside Alibaba and has been contributed back to the open-source community.

One more note about this optimization: it will be released in a new version of etcd, so keep an eye out and give it a try.

etcd Client-Side Performance Optimization

Next, let's go over best practices for using the etcd client efficiently.

First, let's look at the APIs the etcd server provides to clients: Put, Get, Watch, Transactions, Leases, and more.

[Figure: etcd client APIs — Put, Get, Watch, Transactions, Leases]
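The following Go sketch exercises those client APIs end to end. The endpoint address, keys, values, and TTLs are placeholders; the import path is for etcd v3.5+ (older releases used go.etcd.io/etcd/clientv3).

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3" // older releases: go.etcd.io/etcd/clientv3
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	ctx := context.Background()

	// Watch: stream change events for a key prefix in the background.
	go func() {
		for wresp := range cli.Watch(context.Background(), "/demo/app/", clientv3.WithPrefix()) {
			for _, ev := range wresp.Events {
				fmt.Printf("event %s %s=%s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
			}
		}
	}()

	// Put and Get.
	if _, err := cli.Put(ctx, "/demo/app/config", "v1"); err != nil {
		log.Fatal(err)
	}
	resp, err := cli.Get(ctx, "/demo/app/config")
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}

	// Transaction: a compare-and-swap style update.
	txn, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.Value("/demo/app/config"), "=", "v1")).
		Then(clientv3.OpPut("/demo/app/config", "v2")).
		Commit()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("txn succeeded:", txn.Succeeded)

	// Lease: attach a TTL to a key so it expires automatically.
	lease, err := cli.Grant(ctx, 30) // 30-second TTL, illustrative
	if err != nil {
		log.Fatal(err)
	}
	if _, err := cli.Put(ctx, "/demo/app/heartbeat", "ok", clientv3.WithLease(lease.ID)); err != nil {
		log.Fatal(err)
	}

	time.Sleep(time.Second) // give the watcher a moment to print events
}
```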

For the most frequently used client operations, we summarize a few best practices:

  1. For Put operations, avoid large values; trim and simplify them as much as possible, for example when using CRDs in Kubernetes;
  2. etcd itself is not suited to storing frequently changing key/value metadata, so clients should avoid creating or updating keys at high frequency; for example, the way newer Kubernetes versions handle node heartbeat uploads follows this practice;
  3. Finally, avoid creating a large number of leases; reuse them wherever possible. For example, in Kubernetes event management, events with the same TTL reuse the same lease rather than each creating a new one (a sketch of lease reuse follows this list).
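Here is the lease-reuse sketch referenced in point 3: several keys that share the same TTL are attached to one lease instead of each Put creating its own. The endpoint, key names, and TTL are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3" // older releases: go.etcd.io/etcd/clientv3
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	ctx := context.Background()

	// One lease for all keys that share the same TTL, instead of one lease per
	// key: this keeps the number of leases the server must track small.
	shared, err := cli.Grant(ctx, 60) // 60-second TTL, illustrative
	if err != nil {
		log.Fatal(err)
	}

	for i := 0; i < 3; i++ {
		key := fmt.Sprintf("/events/event-%d", i)
		if _, err := cli.Put(ctx, key, "payload", clientv3.WithLease(shared.ID)); err != nil {
			log.Fatal(err)
		}
	}
	fmt.Println("3 keys attached to one lease:", shared.ID)
}
```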

Finally, remember one thing: following client best practices keeps your etcd cluster running stably and efficiently.

Section Summary

That wraps up this section. To summarize:

  • First, we covered the background of etcd performance and, starting from its internals, identified the potential bottleneck points;
  • We analyzed server-side performance optimization, covering hardware, deployment, and core internal software algorithms;
  • We learned best practices for using the etcd client.

Lastly, I hope that after reading this section you come away with something useful that helps you run a stable and efficient etcd cluster. Please stay tuned for the next section.


Origin www.cnblogs.com/passzhang/p/12556525.html