Understand load balancing in three minutes

What is load balancing

Load balancing is one of the factors that must be considered when designing a distributed system architecture. It usually means distributing requests/data [evenly] across multiple operation units for execution; the key to load balancing is [even].

 

Common load balancing solutions


A common Internet distributed architecture is layered as shown above: client layer, reverse proxy (nginx) layer, site layer, service layer, and data layer. Each downstream layer is called by multiple upstream instances; as long as every upstream distributes its accesses evenly across each downstream, requests/data will be "evenly distributed across multiple operation units for execution".

 

[Client layer -> reverse proxy layer] load balancing


Load balancing from the [client layer] to the [reverse proxy layer] is achieved through "DNS round-robin": the DNS server is configured with multiple resolution IPs for one domain name. For each DNS resolution request, the DNS server returns these IPs in rotation, so each IP is resolved with equal probability. These IPs are the public IPs of the nginx servers, so requests are also distributed evenly across the nginx instances.
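The rotation described above can be sketched in a few lines of Python (the domain and IPs here are placeholders, not values from the article):

```python
from itertools import cycle

# Public IPs of the nginx servers configured for one domain (placeholder values).
NGINX_IPS = ["1.2.3.4", "1.2.3.5", "1.2.3.6"]

_rotation = cycle(NGINX_IPS)

def resolve(domain: str) -> str:
    """Return the next IP in rotation, so each IP is handed out equally often."""
    return next(_rotation)

# Three consecutive resolutions walk through all three IPs, then wrap around.
ips = [resolve("www.example.com") for _ in range(3)]
```

Real DNS servers typically return the full IP list in rotated order and let the client pick the first entry, but the equal-probability effect is the same.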

 

[Reverse proxy layer -> site layer] load balancing


The load balancing from [reverse proxy layer] to [site layer] is achieved through "nginx". By modifying nginx.conf, various load balancing strategies can be implemented:

1) Request round-robin: similar to DNS round-robin, requests are routed to each web-server in turn

2) Least-connection routing: each request is routed to whichever web-server currently has the fewest connections

3) IP hash: route to a web-server by the hash of the client's IP. As long as client IPs are evenly distributed, requests are in theory evenly distributed too. IP hashing also pins all requests from the same user to the same web-server, so this strategy suits stateful services such as sessions (note from 58沈剑: this works, but is strongly discouraged; keeping the site layer stateless is one of the basic principles of distributed architecture design, and sessions are best stored in the data layer)

4)…
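The three strategies above can be sketched as pure selection functions. This is a simplified model for illustration only (real nginx tracks live connection counts itself, via the default round-robin and the `least_conn` and `ip_hash` upstream directives); the server names are placeholders:

```python
import hashlib

WEB_SERVERS = ["web0", "web1", "web2"]  # placeholder upstream names

def round_robin(counter: int) -> str:
    """1) Request round-robin: route the i-th request to server i mod N."""
    return WEB_SERVERS[counter % len(WEB_SERVERS)]

def least_conn(active_conns: dict) -> str:
    """2) Least connections: pick the server with the fewest active connections."""
    return min(WEB_SERVERS, key=lambda s: active_conns.get(s, 0))

def ip_hash(client_ip: str) -> str:
    """3) IP hash: the same client IP always maps to the same server."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return WEB_SERVERS[int(digest, 16) % len(WEB_SERVERS)]
```

Note how `ip_hash` is the only strategy that is sticky per user, which is exactly why it can carry session state and why the other two cannot.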

 

[Site layer -> service layer] load balancing


Load balancing from the [site layer] to the [service layer] is achieved through a "service connection pool".

The upstream connection pool establishes multiple connections to the downstream service, and each request "randomly" picks a connection to access the downstream service.
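A minimal sketch of such a pool, assuming the connections are already established (the class and method names are illustrative, not the RPC-client's actual API):

```python
import random

class ServiceConnectionPool:
    """Holds multiple connections to one downstream service;
    each request picks one at random."""

    def __init__(self, connections):
        if not connections:
            raise ValueError("pool must hold at least one connection")
        self._connections = list(connections)

    def acquire(self):
        # "Random" selection spreads requests evenly across
        # downstream instances in expectation.
        return random.choice(self._connections)

pool = ServiceConnectionPool(["conn-A", "conn-B", "conn-C"])
```

A production pool would also handle health checks, failover, and timeouts, which is what the RPC-client article referenced below covers.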

The previous article, "RPC-client implementation details", describes load balancing, failover, and timeout handling in depth; feel free to follow the link to read it, so these topics are not expanded here.

 

[Data layer] load balancing

When the data volume is large, the data layer (db, cache) involves horizontal partitioning of the data, so load balancing at the data layer is more complex. It breaks down into "balance of data" and "balance of requests".

Balance of data means: after horizontal partitioning, each service (db, cache) holds roughly the same amount of data.

Balance of requests means: after horizontal partitioning, each service (db, cache) receives roughly the same volume of requests.

Several horizontal partitioning schemes are common in the industry:

1. Horizontal partitioning by range


Each data service stores a contiguous range of the data; taking the figure above as an example:

the user0 service stores uids in the range 1 to 10 million

the user1 service stores uids in the range 10 million to 20 million

The advantages of this scheme are:

(1) The rule is simple: the service only needs to check the uid range to route to the corresponding storage service

(2) Data balance is good

(3) It is fairly easy to scale: a data service for the uid range [20 million, 30 million] can be added at any time

The shortcoming is:

(1) Request load is not necessarily balanced. Newly registered users are generally more active than old ones, so the service holding the higher (newer) uid range comes under more request pressure
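The range rule above can be sketched as a routing function, assuming each service holds 10 million uids as in the example:

```python
TEN_MILLION = 10_000_000

def route_by_range(uid: int) -> str:
    """Route a uid to its storage service: user0 holds uids 1..10M,
    user1 holds the next 10M, and so on. Adding a new range simply
    extends the same rule, with no change to existing buckets."""
    if uid < 1:
        raise ValueError("uid must be positive")
    return f"user{(uid - 1) // TEN_MILLION}"
```

Note that uids above 20 million route to `user2` automatically once that service exists, which is why scaling out is easy here.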

 

2. Horizontal partitioning by id hash


Each data service stores the portion of the data determined by hashing some key; taking the figure above as an example:

the user0 service stores data for even uids

the user1 service stores data for odd uids

The advantages of this scheme are:

(1) The rule is simple: the service only needs to hash the uid to route to the corresponding storage service

(2) Data balance is good

(3) Request uniformity is good

The shortcoming is:

(1) It is not easy to scale: when a data service is added and the hash function changes, data migration may be required
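A sketch of the hash rule, generalizing the article's parity example to `uid % n`, plus a quick check of why scaling is hard: changing the modulus remaps most uids to a different service.

```python
def route_by_hash(uid: int, num_services: int = 2) -> str:
    """With 2 services this is the parity rule from the example:
    even uids -> user0, odd uids -> user1."""
    return f"user{uid % num_services}"

# When the hash changes from mod 2 to mod 3 (scaling from 2 to 3 services),
# most uids map to a different service, so their data must be migrated.
moved = sum(
    1 for uid in range(10_000)
    if route_by_hash(uid, 2) != route_by_hash(uid, 3)
)
```

Here roughly two thirds of the uids change buckets, which is the data-migration cost the shortcoming above refers to; schemes such as consistent hashing exist precisely to shrink this cost.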

 

Summary

Load balancing is one of the factors that must be considered in distributed system architecture design. It usually means distributing requests/data [evenly] across multiple operation units for execution; the key to load balancing is [even].

(1) Load balancing from the [client layer] to the [reverse proxy layer] is achieved through "DNS round-robin"

(2) Load balancing from the [reverse proxy layer] to the [site layer] is achieved through "nginx"

(3) Load balancing from the [site layer] to the [service layer] is achieved through a "service connection pool"

(4) Load balancing at the [data layer] must consider both "balance of data" and "balance of requests"; the common approaches are "horizontal partitioning by range" and "horizontal partitioning by hash"

Author: 58沈剑 (Shen Jian)

[End]

