MySQL sharding DevOps challenges

Previously, we discussed the application and design challenges of MySQL sharding, as well as some of the corresponding business challenges that can limit the flexibility of your business. But what are the DevOps challenges facing MySQL sharding?

For reference, here is a quick précis of MySQL sharding: MySQL sharding is a strategy that divides the workload of a MySQL application across multiple MySQL database servers, allowing queries and data CRUD operations to be spread out. This works around MySQL's single-write-master architecture and provides the ability to scale out both writes and reads, although there are trade-offs. It is a big DevOps project.

Now that we have seen how MySQL sharding challenges business rules, let's take a look at the DevOps challenges it creates.

Once the shard key is selected, the data needs to be physically distributed across the array of MySQL servers. Each of those servers requires its own database for its own set of data partitions. The initial data distribution can be manual; it is more of a one-time setup. But what happens as your workload grows? Ideally, every shard grows equally, but sometimes life happens.
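To make the initial distribution concrete, here is a minimal sketch of one common routing scheme, hashing the shard key to pick a server. The server names and the modulo-hash approach are illustrative assumptions, not something prescribed by the article.

```python
# Hypothetical shard routing: hash the shard key, take it modulo the
# number of servers. Server names are made up for illustration.
import hashlib

SHARD_SERVERS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(shard_key: str) -> str:
    """Map a shard key (e.g. a customer ID) to one MySQL server."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return SHARD_SERVERS[int(digest, 16) % len(SHARD_SERVERS)]
```

Note that a plain modulo scheme makes later rebalancing painful (changing the server count remaps almost every key), which is one reason real deployments often use range sharding or a lookup table instead.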

MySQL sharded arrays face two major ongoing data maintenance capacity challenges:

  1. Shard growth.
  2. Shard hotspots.

Shard growth means that one or more of your shards will outgrow the storage capacity of its underlying server. A hotspot means that one or more of your shard servers is experiencing contention, for example on CPU or network traffic, even if it is nowhere near its storage capacity. Shard growth and hotspots both degrade server performance, and both have a similar solution: split the local shard and move some of the data (for example, half) to a different MySQL server. This is a significant task for DevOps.
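The "split the shard and move half" step can be sketched as follows, assuming a range-sharding scheme (an assumption for illustration; the article does not mandate range sharding):

```python
# Sketch: splitting an overgrown shard's half-open key range [lo, hi)
# at its midpoint, so the upper half can be moved to another server.

def split_range(lo: int, hi: int):
    """Return the two half-ranges produced by splitting [lo, hi)."""
    if hi - lo < 2:
        raise ValueError("range too small to split")
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid, hi)
```

In practice the split point would be chosen from actual key distribution, not the arithmetic midpoint, so that each half carries roughly equal data.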

From a business perspective, shard growth is positive; growth is good. However, that growth presents some DevOps challenges, because it means further data distribution is necessary. In short, each MySQL server needs enough headroom to grow, or transactions will start to slow down or even fail. Best practice is to keep at least 40% of storage available and average CPU utilization between 60% and 70%. Once storage exceeds 90%, MySQL performance suffers. Either the servers need upgraded disks and/or storage area networks, or the local shards need to be split and the new shard "halves" moved to different MySQL servers. Ideally, move the shards to new servers to maximize growth potential, but sometimes the CAPEX budget requires consolidation. When shards are consolidated this way, it is very important to ensure the newly split shard does not cause contention on its new shared server, or you will have to move it again. This can easily snowball into major MySQL sharding DevOps challenges.
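The thresholds above (40% free storage, 60-70% average CPU, 90% storage as the danger line) can be turned into a simple capacity check. The status labels and function shape are our own naming, not part of the article:

```python
# Capacity check based on the rule-of-thumb thresholds in the text:
# keep >= 40% storage free (i.e. <= 60% used), average CPU 60-70%,
# and treat > 90% storage used as critical.

def capacity_status(storage_used_pct: float, cpu_pct: float) -> str:
    """Classify a shard server's headroom per the best-practice thresholds."""
    if storage_used_pct > 90:
        return "critical: split the shard or upgrade storage now"
    if storage_used_pct > 60 or cpu_pct > 70:
        return "warning: plan a shard split"
    return "ok"
```

A periodic job running a check like this across the array is one way to spot shards that need splitting before transactions start to slow down.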

Shard hotspots mean access/data asymmetry. At best this is temporary and resolves itself over time. At worst, it means the MySQL shard key (the sharding strategy) has a problem and may need to be rethought. That would require redistributing the entire data workload, which means a lot of downtime or a lot of redundant hardware expenditure. Shard growth exhausts local storage, while shard hotspots cause network, CPU, and/or potentially storage contention. Hotspots are not a storage capacity problem; there may be plenty of disk space left. However, database usage patterns are driven by the data resident on the local server, so the most straightforward way to deal with a hotspot is to further split the local shard, which means yet more DevOps challenges from MySQL sharding.

Managing shard splits can be tricky. The simplest solution is for DevOps to take the shard offline, split it, move the new half-shard to a new server, update the shard-to-server mapping LUT, and then bring the offline shard back online. This causes all transactions to the offline shard to fail (by design). From a business perspective, it means taking a subset of customers, features, or functions offline. For example, some large gaming companies perform regular maintenance this way, taking all the customers on the affected shards (thousands of people) offline for hours.
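The offline-split procedure above can be walked through as a toy simulation, with a plain dict standing in for the shard-to-server LUT (which, as discussed below, would really live in something like Redis or Memcached). All names here are illustrative:

```python
# Toy walk-through of the offline split procedure: take the shard offline,
# split it, register the new half-shard in the LUT, bring the shard back.
lut = {"shard_a": "mysql-1", "shard_b": "mysql-2"}
offline = set()

def split_shard_offline(shard: str, new_shard: str, new_server: str) -> None:
    offline.add(shard)            # 1. take the shard offline (transactions fail)
    # 2-3. split the shard and copy half its data to new_server (not shown)
    lut[new_shard] = new_server   # 4. update the shard-to-server LUT
    offline.discard(shard)        # 5. bring the original shard back online

split_shard_offline("shard_a", "shard_a2", "mysql-3")
```

The entire window between steps 1 and 5 is downtime for that shard's customers, which is exactly the business cost the next paragraph's slave-based approach avoids.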

But for high-volume/high-value MySQL applications that require high availability, since each shard is redundant (for example, each shard has a slave for high availability), the shard change can be made on the slave while production is unaffected. Once the shards are split and moved, replication then needs to catch the slave up to the master's position. When ready, promote the slave, demote the master, and cut over. This trades minimal downtime for more DevOps work.

Once a shard is split and half the workload is moved, with its new shard, to a different (ideally new) node, the workload of the original server, which now holds half the data, should drop by about 50%. If it does not, the sharding process may need to be repeated. The newly created shard may instead be moved to share another in-service server, e.g., due to budget constraints. In that case, you need to examine the new server's usage pattern carefully, or you may simply be robbing Peter to pay Paul.

Shard consolidation could in principle be useful on a regular basis. What happens if the business offering changes? Or if there is peakiness/seasonality in user access patterns, such as Black Friday, Golden Week, or Singles' Day? A MySQL sharded array sized to handle 3 to 5 times the traffic of the seasonal peak is seriously over-provisioned for the rest of the year. But it turns out that because merging shards requires about as much work as splitting them, many enterprises running sharded MySQL deployments don't bother. Instead, any subsequent shards are moved onto shared MySQL servers already in use rather than deploying new servers.

The other side of the "data maintenance" coin is infrastructure. Whether it is shard growth, hotspots, splits, or shard-to-server mapping, all of these require DevOps to deploy, upgrade, maintain, back up, and retire/replace servers. Some of these tasks are easier in the cloud, especially the speed of deployment, but they still need to be managed and still pose plenty of DevOps challenges for sharded MySQL applications.

Specifically, sharded MySQL applications have three main infrastructure challenges:

  1. Server logistics.
  2. Server backup.
  3. High availability.

Server logistics for a MySQL sharded array fall into three rough groups: the MySQL servers themselves (possibly redundant for high availability), the shard-to-server mapping, and any replication strategy needed within the array, e.g., to avoid cross-node transactions. The MySQL servers are straightforward: buy (or lease) the instance size with the best price/performance. But as the sharded array grows, nodes end up at different stages of their life cycle. Some kind of periodic background heartbeat and/or smoke test is useful for determining which servers are lagging in performance and which should be replaced. High availability doubles all of this. In MySQL, high availability is typically achieved with a slave instance, a full copy of the master. That means twice the number of MySQL servers, in both purchase/lease (CAPEX) and management (OPEX). In short, plenty more work for DevOps.

The shard-to-server mapping is critical to a MySQL sharded array. The application must always know which MySQL server holds the data for each transaction. Typically this mapping is kept in Redis or Memcached, i.e., a fast in-memory key-value store providing a LUT (lookup table) joining primary key, shard key, database name, server ID, server IP address, and so on. This lets the application perform dynamic lookups with minimal impact on in-flight transactions. It also means additional LUT/map servers to deploy and maintain. They really should be redundant; without this data, the sharded array is paralyzed.
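The per-transaction lookup described above might look something like this sketch, with a dict standing in for the Redis/Memcached LUT. The key layout, database names, and the "customer_id modulo 2" shard-key derivation are all assumptions for illustration:

```python
# Sketch of a shard-to-server LUT lookup. In production shard_lut would be
# a redundant Redis/Memcached store, not an in-process dict.
shard_lut = {
    # shard_key -> (database name, server id, server ip)
    "customers_0": ("appdb_0", "mysql-07", "10.0.0.7"),
    "customers_1": ("appdb_1", "mysql-12", "10.0.0.12"),
}

def locate(customer_id: int):
    """Derive the shard key from the primary key, then look up the server."""
    shard_key = f"customers_{customer_id % 2}"
    return shard_lut[shard_key]
```

Every transaction performs a lookup like this before it can even open a connection, which is why the LUT store must be both fast and redundant.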

Cross-node replication is necessary for any remote data that is needed on a local shard. It lets transactions on the local shard avoid cross-node transactions, along with the significant application modifications that would otherwise be needed to provide referential integrity and ACID guarantees. If cross-node replication is required, it adds a whole new layer of complexity for DevOps: making sure slave processes are created on each node, binlogs are enabled on the necessary masters, and replication lag is monitored and kept within reasonable bounds.
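The lag-monitoring piece can be sketched as a health check over the fields MySQL reports via `SHOW SLAVE STATUS`. Here the status row is a pre-fetched dict rather than a live query, and the 30-second threshold is an assumption; tune it to your workload:

```python
# Health check over a SHOW SLAVE STATUS row (fetched elsewhere and passed
# in as a dict). Seconds_Behind_Master is NULL/None when replication
# is not running at all.
MAX_LAG_SECONDS = 30  # assumed threshold, not a MySQL default

def replication_healthy(status: dict) -> bool:
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False  # replication stopped or broken
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes"
            and lag <= MAX_LAG_SECONDS)
```

Running this against every replica on a schedule, and alerting when it returns False, is the minimum monitoring a cross-node replication mesh needs.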

Since there is no single RDBMS managing all the MySQL servers in the array, there is no programmatic way to get a consistent backup. A backup can be initiated using MVCC to ensure a consistent transactional state on each server, but even with NTP (or even a local atomic clock), local server time will never exactly match the other servers. This is not a big problem if each server operates completely independently; if business rules strictly avoid cross-node transactions, then "close enough" is good enough. However, after talking with DevOps leads at many companies, they all tend to be dissatisfied with their backup strategies. Options include synchronizing block-storage volumes and taking simultaneous point-in-time block copies. Some cloud providers also offer snapshot backups in their managed offerings, but the snapshots of each node in the array still need to be synchronized. Likewise, restoring a node from backup can be tricky, requiring replication to roll forward to match the transactional state of the other nodes. Finally, re-syncing all the MySQL shards sometimes requires rolling restarts across the shards. Talk about MySQL sharding DevOps challenges!

All this complexity leads some deployments to focus more on high availability than on ensuring consistent backups.

Finally, let's discuss the MySQL sharding challenges related to high availability. Generally, a MySQL application that needs the scale sharding provides is serving high-volume/high-value transactions that need to be highly available.

This means DevOps needs to ensure every MySQL server is fully redundant. Each "shard" actually consists of at least two servers, in either a master/slave or master/master configuration. Master/slave is the easiest to set up, but does not guarantee transactional consistency. Master/master, especially certification-based replication, guarantees that the secondary has a copy of the transaction before the primary commits. So if the primary fails after committing but before the secondary has applied the transaction, the secondary can still complete it and honor the acknowledgment the primary sent back to the application. Unsurprisingly, this level of transactionally consistent high availability takes more setup than regular master/slave asynchronous replication; it trades OPEX for a higher level of availability. In short, it creates several more MySQL sharding DevOps challenges.

AWS RDS automatically provides a snapshot backup every five minutes. At first glance, this seems much simpler than deploying redundant servers for high availability, not to mention setting up and maintaining all the replication that requires (for example, between separate RDS hosts, as distinct from the read replicas RDS already provides).

However, a five-minute lag means five minutes of lost transactions. This is different from losing "in-flight" transactions; if a server fails, in-flight transactions are normally lost in MySQL. Since those transactions never completed, no data updates were committed and no acknowledgment was sent to the application. But if a transaction did complete, e.g., an order was accepted and the customer received an order confirmation number, the customer has a reasonable expectation that the transaction will persist even if the server subsequently goes down. If the customer's payment method was charged, they may have legal recourse depending on the jurisdiction. And without high availability, the e-commerce provider will have no record of the transaction, even though the credit card company does.

This is the kind of no-win situation that generates negative press, tarnishes the brand, and so on, and it circles right back to DevOps and its MySQL sharding challenges.

There is also the problem that if you have not leased reserved instances in your cloud deployment, the five-minute backup lag can stretch significantly while a new instance comes online.

While sharding MySQL is certainly a well-established strategy for addressing the scalability needs of MySQL applications, it must be approached with eyes wide open. Architectural and logistical plans both need to be created and reviewed to help avoid the MySQL sharding DevOps challenges that could blow up your OPEX and CAPEX budgets. Sharding MySQL is always hard, and its trade-offs are many, especially the knock-on effects on your DevOps team as they maintain the sharded array over time. It is easy to make a decision up front that DevOps ends up paying for in effort and resources, costing your MySQL application credibility.


Origin blog.csdn.net/weixin_49470452/article/details/107506175