Secrets of partitioning for large-scale data storage

Partitioning, also known as sharding, is a common solution for storing large volumes of data. When the data volume exceeds what a single node can store, the data has to be partitioned and stored across different nodes. Each partition can usually be thought of as a small database in its own right, even though the database as a whole supports operations across multiple partitions. By letting multiple partitions serve requests at the same time, partitioning also improves performance.

Partitioning and replication are concepts that are often mentioned together: partitioning is usually combined with replication, so that copies of each partition are stored on multiple nodes. This means that even though each record belongs to exactly one partition, it can still be stored on several different nodes for fault tolerance. Partitioning shows up in many technologies and frameworks. In message queues it appears as partitions under a topic, for example Kafka's partitions and RocketMQ's queues; in SQL/NoSQL data stores it appears as Elasticsearch's shards, MySQL's sharded databases and tables, and so on.

This article discusses partitioning from several angles: ways to partition by key, partition rebalancing strategies, and request routing. Finally, using Elasticsearch (ES) queries as an example, it walks through how a query request is processed under partitioning. Without further ado, let's go ~

Several ways to partition by key

If a large amount of data needs to be spread across nodes, how should it be partitioned? The goal of partitioning is to spread the data evenly across the nodes, so that the request load is distributed as well. If the partitioning is unbalanced, some partitions end up with far more data or queries than others; this is commonly called skew. Data skew makes some nodes heavily loaded and turns them into hot spots. Hot spots could be avoided by routing records to partitions at random, but data is stored so that it can be queried later, so records have to be assigned to partitions by a fixed function of the key, which makes it possible to route subsequent query requests. The common approaches are partitioning by key range and partitioning by the hash of the key:

Partitioning by key range

Partitioning by key range means that each partition stores a continuous range of data. For example, data can be partitioned by timestamp; the simplest common case is chronological logs split into different files by time period. Data can also be partitioned by a numeric id, like books on library shelves, where consecutively numbered items sit on the same shelf. Range partitioning can sometimes lead to uneven data distribution: when partitioning by timestamp, for example, some time periods may contain much more data than others, leaving the partitions unbalanced.
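As a minimal sketch of how a range partitioner can locate the partition for a key (the boundaries and class name below are made up for illustration), sorted lower bounds plus a floor lookup are enough:

```java
import java.util.TreeMap;

// Minimal sketch of range partitioning: each partition owns a contiguous
// key range, identified here by its inclusive lower bound.
public class RangePartitioner {
    // lower bound of each partition's key range -> partition id
    private final TreeMap<String, Integer> lowerBounds = new TreeMap<>();

    public RangePartitioner() {
        // hypothetical boundaries, e.g. log files split by month
        lowerBounds.put("2023-01", 0);
        lowerBounds.put("2023-02", 1);
        lowerBounds.put("2023-03", 2);
    }

    // The partition for a key is the one whose lower bound is the largest
    // bound <= the key (keys before the first bound are not handled here).
    public int partitionFor(String key) {
        return lowerBounds.floorEntry(key).getValue();
    }

    public static void main(String[] args) {
        RangePartitioner p = new RangePartitioner();
        System.out.println(p.partitionFor("2023-02-15T10:00:00")); // -> 1
    }
}
```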

Partitioning by hash of key

Because range partitioning easily leads to load imbalance, general scenarios (where the data is not naturally sequential) usually determine the partition for a given key with a hash function, to avoid the risk of skew and hot spots. A good hash function spreads keys as evenly as possible across partitions. Many languages have built-in hash functions, but some of them are not suitable for partitioning: Java's Object.hashCode() and Ruby's Object#hash, for example, may return different hash values for the same key in different processes.
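A minimal sketch of hash partitioning follows; the class name is made up, and MD5 is used only because it is stable across processes (real systems often use Murmur3 or a similar fast hash):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of hash partitioning: derive a process-independent hash
// from the key bytes and map it onto a fixed number of partitions.
public class HashPartitioner {
    private final int numPartitions;

    public HashPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public int partitionFor(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // use the first 4 bytes of the digest as an int
            int hash = ((digest[0] & 0xFF) << 24) | ((digest[1] & 0xFF) << 16)
                     | ((digest[2] & 0xFF) << 8)  |  (digest[3] & 0xFF);
            return Math.floorMod(hash, numPartitions);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashPartitioner p = new HashPartitioner(8);
        System.out.println(p.partitionFor("user-42")); // same result in every process
    }
}
```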
Even with a suitable hash function, you sometimes still want keys within a certain range to end up in the same partition; consistent hashing can be used for this. Consistent hashing also reduces how much of the existing data has its partition mapping changed when partitions are added or removed.
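Below is a minimal, illustrative consistent-hashing ring (the virtual-node count, the hash choice, and the class name are assumptions of this sketch): nodes are placed on a ring via virtual nodes, and a key maps to the first node clockwise from its hash, so adding or removing a node only remaps the keys between it and its ring neighbours:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal sketch of consistent hashing with virtual nodes.
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 100;

    private int hash(String s) {
        // placeholder hash; a real ring would use a stronger, stable function
        return s.hashCode() & 0x7FFFFFFF;
    }

    public void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // first node clockwise from the key's position on the ring
    public String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println(ring.nodeFor("user-42"));
    }
}
```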

Hot spots

Hash partitioning helps reduce hot spots, but it cannot eliminate them: in the extreme case, all requests may still hit the same partition. There are two basic ideas for dealing with a hot partition. One is to partition the hot data again, for example re-routing the data for a hot key so that it is spread across multiple partitions; the other is to add redundancy for the hot data (i.e. more replicas), so that additional replica nodes serve the hot data together.
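The first idea, spreading a hot key over several partitions, is often done by salting the key. The sketch below is purely illustrative (the suffix scheme and fan-out factor are assumptions); note that reads then have to query every sub-key and merge the results:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch of spreading a hot key: writes append a small random
// suffix so the key lands on one of several partitions; reads fan out.
public class HotKeySpreader {
    private static final int SPREAD = 10; // number of sub-keys per hot key

    // writes go to a randomly chosen sub-key, e.g. "hot-item.7"
    public static String writeKey(String hotKey) {
        return hotKey + "." + ThreadLocalRandom.current().nextInt(SPREAD);
    }

    // reads must query all sub-keys and merge the results
    public static List<String> readKeys(String hotKey) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < SPREAD; i++) {
            keys.add(hotKey + "." + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(writeKey("hot-item"));  // e.g. hot-item.3
        System.out.println(readKeys("hot-item"));  // hot-item.0 ... hot-item.9
    }
}
```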

Partition rebalancing

Over time, partitioned data goes through the following kinds of change:

  • Query throughput increases, so you want to add more CPU to handle the load.
  • The dataset size increases, so you want to add more disks and RAM to store it.
  • A machine fails, and other machines need to take over its responsibilities.

All of these changes require data and requests to be moved from one node to another. The process of moving load from one node in the cluster to another is called rebalancing. Rebalancing generally has the following requirements: after rebalancing, the load should be spread evenly; the partitions should keep serving reads and writes normally while rebalancing is in progress; and only the data that actually needs to move should be moved between nodes, to speed up rebalancing and limit network and disk I/O. (The common "hash mod N" scheme fails the last requirement: when N changes, most keys have to be moved to a different partition, which makes rebalancing very expensive.)
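A small, self-contained experiment (toy key set and node counts chosen arbitrarily) makes the cost of mod-N concrete by counting how many keys change partition when one node is added:

```java
// Illustrates why "hash mod N" rebalances badly: count how many keys
// change partition when the node count grows from 10 to 11.
public class ModNRebalanceCost {
    public static void main(String[] args) {
        int keys = 100_000;
        int before = 10, after = 11;
        int moved = 0;
        for (int k = 0; k < keys; k++) {
            int h = Integer.hashCode(k); // stand-in for a key hash
            if (Math.floorMod(h, before) != Math.floorMod(h, after)) {
                moved++;
            }
        }
        // with mod-N, roughly (1 - 1/after) of all keys move (~90% here),
        // whereas fixed partitions or consistent hashing move far fewer
        System.out.printf("moved %d of %d keys (%.1f%%)%n",
                moved, keys, 100.0 * moved / keys);
    }
}
```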

A fixed number of partitions

To avoid re-partitioning the data during scale-out, you can create many more partitions than there are nodes and assign several partitions to each node. For example, a database running on a 10-node cluster might be split into 1,000 partitions from the start, so that roughly 100 partitions are assigned to each node. ES rebalances this way: the number of shards of an index cannot be changed at run time, so in production it is generally recommended to set the shard count with some headroom, to make later scale-out easier. In this scheme the number of partitions never changes, and neither does the key-to-partition mapping; the only thing that changes is which node each partition lives on. Whole partitions simply move between nodes. The change is not instantaneous, because transferring a large amount of data over the network takes time, so while the transfer is in progress the original partition continues to accept reads and writes.
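A minimal sketch of this scheme follows (the class, partition count, and round-robin assignment are illustrative assumptions): the key-to-partition function is fixed forever, and rebalancing only reassigns whole partitions to nodes:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "many fixed partitions" scheme: key -> partition never
// changes; rebalancing only moves whole partitions between nodes.
public class FixedPartitions {
    private static final int NUM_PARTITIONS = 1000;      // fixed up front
    private final Map<Integer, String> partitionToNode = new HashMap<>();

    public FixedPartitions(String... nodes) {
        for (int p = 0; p < NUM_PARTITIONS; p++) {
            partitionToNode.put(p, nodes[p % nodes.length]); // ~1000/n per node
        }
    }

    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), NUM_PARTITIONS); // stable mapping
    }

    public String nodeFor(String key) {
        return partitionToNode.get(partitionFor(key));
    }

    // rebalancing: hand a whole partition to another node; keys never move
    // between partitions, only the partition's data is copied to the new node
    public void reassign(int partition, String newNode) {
        partitionToNode.put(partition, newNode);
    }
}
```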

Dynamic Partitioning

For key-range partitioning, a fixed number of partitions with fixed boundaries would be very inconvenient: if the boundaries are chosen badly, all of the data may end up in one partition while the others stay empty, and reconfiguring the boundaries by hand is cumbersome. For this reason, key-range databases such as HBase and RethinkDB create partitions dynamically. When a partition grows beyond a configured size (in HBase, the default is 10 GB), it is split into two partitions, each holding roughly half of the data. Conversely, if a lot of data is deleted and a partition shrinks below some threshold, it can be merged with an adjacent partition, similar to what happens in a B-tree.
An advantage of dynamic partitioning is that the number of partitions adapts to the total amount of data. If there is only a small amount of data, a small number of partitions is sufficient, so the overhead is small; if there is a huge amount of data, the size of each partition is still bounded by the configured maximum, and a split is triggered whenever that threshold is exceeded.
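The sketch below illustrates only the split-on-threshold idea under toy assumptions (the 10 GB constant mirrors HBase's default, but the data structures and split logic are invented for illustration, not HBase's actual implementation):

```java
import java.util.TreeMap;

// Toy sketch of dynamic (HBase-style) range partitioning: when a partition
// grows past a size threshold, split it into two roughly equal halves.
public class DynamicPartitioning {
    static final long MAX_PARTITION_BYTES = 10L * 1024 * 1024 * 1024; // ~10 GB

    static class Partition {
        String startKey;   // inclusive lower bound of the key range
        long sizeBytes;
        Partition(String startKey, long sizeBytes) {
            this.startKey = startKey;
            this.sizeBytes = sizeBytes;
        }
    }

    // partitions keyed by their start key
    final TreeMap<String, Partition> partitions = new TreeMap<>();

    // split the partition at some middle key of its range once it is too big
    void maybeSplit(Partition p, String midKey) {
        if (p.sizeBytes > MAX_PARTITION_BYTES) {
            Partition upper = new Partition(midKey, p.sizeBytes / 2);
            p.sizeBytes -= upper.sizeBytes;       // lower half keeps the rest
            partitions.put(upper.startKey, upper); // upper half becomes new partition
        }
    }
}
```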

When rebalancing is needed, should it be triggered manually or automatically by the system? Fully automatic rebalancing is usually triggered when a node's load is detected to be too high, or when a node is found (through network heartbeats) to be down. However, automatic rebalancing can be affected by the wider environment and may not behave the way we expect. A reasonable compromise is therefore to detect the need for rebalancing automatically and alert the operators, and have a person confirm and trigger the actual rebalancing.

Request routing

When a request arrives, how do we determine which node should handle it? As partitions are rebalanced, the assignment of partitions to nodes changes, so something has to be aware of these changes in order to answer the question: if I want to read or write the key "foo", which node's IP address and port should I connect to? This is essentially the service discovery problem, and it is not limited to databases; any system that communicates over a network has it, especially if it aims for high availability (running redundant instances on multiple machines). Broadly, there are the following approaches to request routing:

  1. Allow clients to contact any node (for example, via a round-robin load balancer). If that node happens to own the partition the request is for, it handles the request directly; otherwise, it forwards the request to the appropriate node, receives the reply, and passes it back to the client.
  2. Send all client requests to a routing tier first, which determines the node that should handle each request and forwards it accordingly. The routing tier does not handle any requests itself; it is only responsible for partition-aware load balancing.
  3. Require clients to know the assignment of partitions to nodes. In this case, a client can connect directly to the appropriate node without any intermediary.

In all of these cases, the key question is how the component that makes the routing decision (one of the nodes, the client, or the routing tier) learns the partition-to-node mapping. The mapping can be hard-coded, or it can be kept in a configuration/coordination service. Many distributed data systems rely on a separate coordination service such as ZooKeeper to track cluster metadata: each node registers itself in ZooKeeper, and ZooKeeper maintains the authoritative mapping of partitions to nodes. Other participants (such as the routing tier or partition-aware clients) can subscribe to this information in ZooKeeper. Whenever a partition assignment changes, or a node is added to or removed from the cluster, ZooKeeper notifies the routing tier so that its routing information stays up to date.
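As a minimal, hypothetical sketch of the partition-aware approach (the class, the callback, and the address format are made up; the coordination-service client that would invoke the callback is not shown), a routing component only needs a refreshable local copy of the mapping:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a partition-aware router: it caches the partition -> node
// mapping locally and refreshes it when the coordination service
// (e.g. ZooKeeper, not shown here) reports a change.
public class PartitionAwareRouter {
    private final int numPartitions;
    private final Map<Integer, String> partitionToNode = new ConcurrentHashMap<>();

    public PartitionAwareRouter(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // invoked by a watch/subscription when the cluster metadata changes
    public void onMappingChanged(Map<Integer, String> latest) {
        partitionToNode.putAll(latest);
    }

    // routing decision: key -> partition -> node address
    public String route(String key) {
        int partition = Math.floorMod(key.hashCode(), numPartitions);
        String node = partitionToNode.get(partition);
        if (node == null) {
            throw new IllegalStateException("no node known for partition " + partition);
        }
        return node; // e.g. "10.0.0.12:9300"
    }
}
```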

Executing queries

Query handling falls into two scenarios: single-node queries and cluster-wide queries. The former is typically a query for data that all lives on one node; the latter is sent to multiple nodes at the same time, with a final aggregation step. Cluster-wide queries are also called parallel queries and are typically found in massively parallel processing (MPP) relational database products used for analytics, whose supported query types are much more complex: a typical data-warehouse query contains multiple join, filter, grouping, and aggregation operations.

How ES processes a query

ES uses the open-source Lucene as its storage engine, which gives ES its high-performance retrieval capability, but Lucene by itself is only a single-machine index library. ES wraps Lucene in a distributed layer to provide cluster management, distributed queries, aggregations, and more.

Seen from the outside, an ES query has two phases: the query phase and the fetch phase. In the query phase, the docIds of the matching documents and their sort-field values are read from all shards and collected on the coordinating node into a globally sorted list, from which the page specified by from + size is taken. With these docIds in hand, a multi-get request is then sent to the relevant shards to load the required fields from _source, and the results are finally returned to the client.
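The sketch below captures only the coordinating node's merge logic under simplified assumptions (the Hit record and method names are invented, and real ES merges per-shard priority queues rather than flat lists):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the two-phase flow on the coordinating node (not ES internals):
// query phase merges (docId, sortValue) pairs from every shard into a global
// order; fetch phase then loads only the documents on the requested page.
public class QueryThenFetch {
    record Hit(String shard, long docId, double sortValue) {}

    static List<Hit> queryPhase(List<List<Hit>> perShardTopHits, int from, int size) {
        List<Hit> all = new ArrayList<>();
        perShardTopHits.forEach(all::addAll);                             // gather shard results
        all.sort(Comparator.comparingDouble(Hit::sortValue).reversed());  // global sort
        int to = Math.min(from + size, all.size());
        return from >= to ? List.of() : all.subList(from, to);            // the requested page
    }

    static List<String> fetchPhase(List<Hit> page) {
        // in the real system this is a multi-get to the owning shards;
        // here we just pretend to load each document's _source by id
        List<String> docs = new ArrayList<>();
        for (Hit h : page) {
            docs.add("doc " + h.docId() + " from " + h.shard());
        }
        return docs;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
                List.of(new Hit("shard-0", 1, 0.9), new Hit("shard-0", 2, 0.4)),
                List.of(new Hit("shard-1", 7, 0.8)));
        System.out.println(fetchPhase(queryPhase(shards, 0, 2)));
    }
}
```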

Query phase: (figure omitted)

Fetch phase: (figure omitted)

Almost all search systems use this kind of two-phase query: the first phase finds the matching docIds, and the second phase fetches the full documents for those docIds. In Elasticsearch this is called query_then_fetch. There is also a single-phase mode that returns the full documents during the query itself, called query_and_fetch in Elasticsearch, which is generally only suitable for requests that hit a single shard. As the flow above shows, ES lets clients contact any node: if that node happens to own the requested shards, it handles the request directly; otherwise, it forwards the request to the appropriate nodes, receives the replies, aggregates them, and returns the final aggregated result to the client.

Summary

When storing and processing a large volume of data on a single machine is no longer feasible, partitioning becomes necessary. The goal of partitioning is to spread the data and query load evenly across multiple machines and to avoid hot spots (nodes with a disproportionate share of the load). This requires choosing a partitioning scheme that suits your data, and rebalancing the partitions when nodes are added to or removed from the cluster.

The common key-based partitioning schemes are partitioning by key range and partitioning by the hash of the key. Request routing is generally handled in one of three ways: by the client, by a routing proxy, or by the serving nodes themselves. Whichever way is used, the routing component needs to know the partition-to-node mapping, which is usually kept in a coordination service such as ZooKeeper.

 

Origin www.cnblogs.com/luoxn28/p/12129370.html