Interview Highlights: 10 Questions About Large-Scale Website System Architecture You Need to Understand


1. Which components or techniques have you used to improve a site's performance, availability, and concurrency?

(1) Upgrade the hardware and add servers. (Beyond a certain number of servers, the concurrent load the system can handle barely increases, so this does not fundamentally solve the problem.)

(2) Use caching. (Local cache: the JDK's built-in Map or Guava Cache. Distributed cache: Redis or Memcache. A local cache is not well suited to raising system concurrency; it is mostly used inside a program. For example, how does Spring implement singletons? If you have read the source code you will know that Spring keeps the objects it has already initialized in a Map, and the next time one is needed it first checks whether the Map already contains it. This is a common way to implement the singleton pattern.)

(3) Message queues (decoupling + peak shaving + asynchronous processing)

(4) Distributed development (deploy different services on different machine nodes; a single service can also be deployed on several machines, with Nginx load balancing in front. This removes the weakness of single-point, all-in-one deployment and greatly increases the concurrency the system can handle.)

(5) Database splitting: split databases (read/write separation) and split tables (horizontal and vertical table splitting)

(6) Clustering (multiple machines providing the same service)

(7) CDN acceleration (cache static resources such as images and video on the network node closest to the user)

(8) Browser caching

(9) Appropriate pooling (database connection pools, thread pools, and so on)

(10) Appropriate use of multithreading.
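
The Map-based singleton idea from item (2) can be sketched in a few lines. This is only an illustration of the pattern, not Spring's actual code: the class name `SingletonRegistry` and the method `getOrCreate` are made up for this example, and the JDK's `ConcurrentHashMap` stands in for the Map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** A tiny registry that creates each named object once and then reuses it. */
public class SingletonRegistry {
    // Hypothetical field; it plays the role of the Map described in item (2).
    private final Map<String, Object> singletons = new ConcurrentHashMap<>();

    /** Returns the cached instance for the name, creating it on first use. */
    @SuppressWarnings("unchecked")
    public <T> T getOrCreate(String name, Supplier<T> factory) {
        // computeIfAbsent checks the Map first and only calls the factory on a miss.
        return (T) singletons.computeIfAbsent(name, key -> factory.get());
    }

    public static void main(String[] args) {
        SingletonRegistry registry = new SingletonRegistry();
        StringBuilder first = registry.getOrCreate("buffer", StringBuilder::new);
        StringBuilder second = registry.getOrCreate("buffer", StringBuilder::new);
        System.out.println(first == second); // both lookups return the same instance
    }
}
```

The same check-then-store shape is what a local cache does; libraries such as Guava Cache add eviction and expiry on top.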

2. Common techniques for designing a highly available system

(1) Degradation: when server load spikes, degrade selected services and pages according to the current business situation and traffic, freeing server resources so that core tasks keep running normally. Degradation is usually defined at several levels, with different handling for different severities of failure. By service mode: requests can be rejected, delayed, or sometimes served at random. By service scope: a single feature can be cut, or entire modules. In short, service degradation needs different strategies for different business needs. The guiding idea is that a degraded service is still better than no service at all;

(2) Rate limiting: block malicious request traffic and malicious attacks, or keep traffic from exceeding the peak the system can handle;

(3) Caching: keep large numbers of requests from going straight to the database and overwhelming it;

(4) Timeout and retry mechanisms: avoid a pile-up of requests that triggers an avalanche;

(5) Rollback mechanism: quickly recover from a faulty release.
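
Rate limiting and degradation often go together: when the limiter refuses a request, the caller gets a cheaper fallback instead of an error. Below is a minimal token-bucket sketch of that combination. The article does not name an algorithm, so the token bucket is my choice, and `TokenBucketLimiter`, `tryAcquire`, and `callOrDegrade` are hypothetical names. The clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Token bucket: holds up to `capacity` tokens, refilled at a fixed rate. */
public class TokenBucketLimiter {
    private final double capacity;
    private final double tokensPerNano;
    private final LongSupplier nanoClock; // injectable clock (assumption, for testability)
    private double tokens;
    private long lastRefill;

    public TokenBucketLimiter(int capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.tokensPerNano = tokensPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start with a full bucket
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Takes one token if one is available; never blocks. */
    public synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        long elapsed = now - lastRefill;
        if (elapsed > 0) {
            // Add the tokens earned since the last call, capped at the bucket size.
            tokens = Math.min(capacity, tokens + elapsed * tokensPerNano);
            lastRefill = now;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Runs the real call when a token is available, otherwise the degraded fallback. */
    public <T> T callOrDegrade(Supplier<T> primary, Supplier<T> fallback) {
        return tryAcquire() ? primary.get() : fallback.get();
    }

    public static void main(String[] args) {
        TokenBucketLimiter limiter = new TokenBucketLimiter(2, 1.0, System::nanoTime);
        for (int i = 0; i < 3; i++) {
            System.out.println(limiter.callOrDegrade(() -> "full response", () -> "degraded response"));
        }
    }
}
```

With a bucket of 2 and back-to-back calls, the third call normally falls through to the degraded response, because no new token has been earned yet.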

3. What are the typical characteristics of modern Internet applications?

(1) High concurrency and heavy traffic;

(2) High availability: the system serves users 7 × 24 without interruption;

(3) Massive data: large volumes of data must be stored and managed, which takes a large number of servers;

(4) Widely distributed users and complex networks: many large Internet sites serve users worldwide, so users are spread over a wide area and network conditions vary enormously;

(5) A hostile security environment: the openness of the Internet makes sites more exposed to attack, and large sites are attacked almost every day;

(6) Fast-changing requirements and frequent releases: unlike traditional software, Internet products must adapt quickly to the market and to user needs, so they are released very frequently;

(7) Incremental development: traditional enterprise software plans all functional and non-functional requirements up front, whereas almost every major website started small and grew step by step.

4. Talk about your knowledge and understanding of microservices

What large companies use today, and the likely future trend, is Spring Cloud. Alibaba has also open-sourced Spring Cloud Alibaba, an implementation of the Spring Cloud specification.

We usually think of Spring Cloud as a collection of open-source components, but Spring Cloud is not the same thing as the Spring Cloud Netflix set of components: Ribbon, Feign, Eureka (no longer updated), and Hystrix. Spring Cloud is a general-purpose, abstract development model. Its purpose is to let developers build business functionality faster and better through that abstraction. At run time, however, the model still depends on concrete implementations of components such as RPC, gateways, service discovery, configuration management, rate limiting and circuit breaking, and distributed tracing.

Spring Cloud Alibaba is a new, officially certified implementation of the Spring Cloud specification. Because it is an open-source product made in China, Chinese-language references and articles analysing its internals will follow, which is very good news for developers in China. Alibaba's move is bound to advance microservice technology in China. Before Spring Cloud Alibaba, the first choice was Spring Cloud Netflix, but its documentation is in English, troubleshooting is harder, and not many people in China know it in depth. Spring Cloud Alibaba consists of two parts, Alibaba's open-source components and Alibaba Cloud product components. It aims to provide a one-stop microservice solution so that developers can build microservice applications easily with the Spring Cloud programming model.

In addition, the Apache Dubbo Ecosystem is a microservice ecosystem built around Apache Dubbo, a combination of microservice best practices proven in production. Alibaba's microservice solutions Dubbo, Nacos, and Sentinel, along with microservice components to be open-sourced later, are all part of the Dubbo EcoSystem. Alibaba will later also integrate the Dubbo EcoSystem into the Spring Cloud ecosystem.

5. Talk about your understanding of Dubbo and Spring Cloud (and how they relate)

For details, see this article from the Alibaba middleware public account: "Exclusive interpretation: Dubbo Ecosystem, from a microservice framework to a microservice ecosystem".

Dubbo and Spring Cloud are not competitors. Dubbo is a mature RPC framework whose ease of use, extensibility, and robustness are widely recognised in the industry. In the future Dubbo will serve as the RPC component of Spring Cloud Alibaba and integrate seamlessly with Spring Cloud's native Feign and RestTemplate, allowing migration at "zero" cost.

Alibaba's microservice solutions Dubbo, Nacos, and Sentinel, along with microservice components to be open-sourced later, are all part of the Dubbo EcoSystem. Alibaba will later also integrate the Dubbo EcoSystem into the Spring Cloud ecosystem.

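6. Do you know about performance testing? Which performance testing tools do you know?

Performance testing uses automated testing tools to simulate normal, peak, and abnormal load conditions and measure the system's performance indicators. It is a general term, usually broken down into:

(1) Benchmark testing: apply low pressure to the system, observe how it behaves, and record the relevant figures as a baseline for reference;

(2) Load testing: keep increasing the pressure on the system, or extend the duration under a given pressure, until one or more performance indicators reach a safety threshold, for example a resource becoming saturated. If the pressure keeps rising beyond this point, the system's processing capacity declines;

(3) Stress testing: beyond the safe load, keep applying pressure (adding concurrent requests) until the system crashes or can no longer handle any request, to find the maximum pressure the system can withstand;

(4) Stability testing: run the system under test for an extended period in a specific hardware, software, and network environment under a given business load (simulating the uneven, wave-like request pattern that production sees at different times), to check whether the system stays stable.

The testing tool most commonly used by back-end developers and testers is JMeter (official site: jmeter.apache.org/). Apache JMeter is a Java-based load testing tool (a 100% pure Java application) designed to load-test functional behavior and measure performance. It was originally designed for testing web applications but has since been extended to other kinds of testing.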
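
To make the idea of "applying pressure" concrete, here is a very small hand-rolled harness. It is not JMeter and is no substitute for it; it only shows the basic loop a load tool runs: a fixed number of concurrent workers, each issuing requests and counting successes and failures. `MiniLoadTest` and `run` are names invented for this sketch, and the `Runnable` stands in for one request to the system under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MiniLoadTest {
    /**
     * Runs the task requestsPerThread times on each of `threads` workers.
     * Returns {succeeded, failed, elapsedMillis}.
     */
    public static long[] run(int threads, int requestsPerThread, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<int[]>> futures = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int t = 0; t < threads; t++) {
                Callable<int[]> worker = () -> {
                    int ok = 0;
                    int failed = 0;
                    for (int i = 0; i < requestsPerThread; i++) {
                        try {
                            task.run();
                            ok++;
                        } catch (RuntimeException e) {
                            failed++; // a failed request is counted, not fatal
                        }
                    }
                    return new int[] {ok, failed};
                };
                futures.add(pool.submit(worker));
            }
            long ok = 0;
            long failed = 0;
            for (Future<int[]> f : futures) {
                int[] counts = f.get(); // wait for this worker to finish
                ok += counts[0];
                failed += counts[1];
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            return new long[] {ok, failed, elapsedMillis};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] r = run(4, 1000, () -> Math.sqrt(12345.678));
        System.out.printf("ok=%d failed=%d elapsed=%dms%n", r[0], r[1], r[2]);
    }
}
```

Raising `threads` step by step while watching the failure count and elapsed time is roughly what a load test does; pushing on until requests start failing is the stress-test variant.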

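7. For a monolithic application, traffic grows as more users adopt the product, until a single server can no longer handle the load. What do you do?

At that point you need to think about scaling. The book 《亿级流量网站架构核心技术》 suggests tackling the problem in the following steps:

  • Step 1: try simple scaling first, for example adding servers or upgrading the hardware.

  • Step 2: if simple scaling is not enough, split the data and applications horizontally and vertically to improve the system's scalability, so that adding capacity actually raises the load the system can carry.

  • Step 3: if horizontal and vertical splitting still fall short, refactor or even redesign the architecture around the characteristics of the existing system; in other words, start over.

Ideally, a system design supports linear and elastic scaling: when the system hits a bottleneck, adding machines alone should resolve it, for example by lowering latency and raising throughput.

Support for horizontal and vertical scaling is a prerequisite for scaling out. When splitting, be clear about what you are trying to achieve and how you will solve the problems the split introduces. If a split brings no benefit, do not split for the sake of splitting; avoid over-splitting and do what suits your business.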

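8. Common techniques for optimizing large tables

When a single MySQL table holds too many rows, the database's CRUD performance drops noticeably. Some common optimizations are:

(1) Limit the data range: queries without any condition restricting the data range must be forbidden. For example, when users look up their order history, the query can be restricted to a one-month window;

(2) Read/write separation: the classic database splitting scheme, in which the primary handles writes and the replicas handle reads;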

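(3) Vertical partitioning: split tables according to how closely the data in them is related. For example, if the user table holds both login information and basic profile information, it can be split into two separate tables, or even moved into separate databases. Put simply, vertical splitting divides a table by columns: a table with many columns becomes several tables.

Advantages of vertical splitting: rows become smaller, so a query reads fewer blocks and needs fewer I/O operations. Vertical partitioning also simplifies table structure and makes maintenance easier. Disadvantages of vertical splitting: the primary key is duplicated across tables, the redundant columns have to be managed, and queries now need joins, which can be handled by joining in the application layer. Vertical partitioning also makes transactions more complex;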

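(4) Horizontal partitioning: keep the table structure unchanged and store the data in shards according to some strategy, so that each shard lives in a different table or database and the data becomes distributed. Horizontal splitting divides a table by rows. Once a table exceeds about 2 million rows it starts to slow down, and its data can then be spread across several tables. For example, the user information table can be split into several user information tables so that no single table grows large enough to hurt performance.

Horizontal splitting can support very large data volumes. One caveat: splitting tables only solves the problem of a single table being too large. If the tables still sit on the same machine, it does little to improve MySQL's concurrency, so horizontal splitting is best combined with splitting databases. Horizontal splitting requires few changes on the application side, but transactions across shards are hard to handle, cross-node joins perform poorly, and the logic is complex. The author of 《Java工程师修炼之道》 recommends avoiding sharding where possible, because splitting adds complexity to logic, deployment, and operations; an ordinary table that is properly optimized can hold up to around ten million rows without much trouble. If sharding is unavoidable, prefer a client-side sharding architecture, which saves one network round trip to a middleware layer.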

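Two common approaches to database sharding:

  • Client-side proxy: the sharding logic lives in the application, packaged in a jar, and is implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Alibaba's TDDL are two commonly used implementations.

  • Middleware proxy: a proxy layer sits between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, and NetEase's DDB are all implementations of this architecture.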
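
Whichever approach is used, the core of horizontal sharding is a routing function that turns a sharding key into a physical database and table. The sketch below assumes the simplest strategy, modulo on the user id. It is not how Sharding-JDBC, TDDL, or Mycat are implemented, and the class name and the `user_db_N.user_info_M` naming scheme are made up for illustration.

```java
/** Routes a user id to one of databaseCount * tablesPerDatabase physical tables. */
public class UserShardRouter {
    private final int databaseCount;
    private final int tablesPerDatabase;

    public UserShardRouter(int databaseCount, int tablesPerDatabase) {
        if (databaseCount <= 0 || tablesPerDatabase <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.databaseCount = databaseCount;
        this.tablesPerDatabase = tablesPerDatabase;
    }

    /** Returns a location such as "user_db_1.user_info_3" (hypothetical naming). */
    public String route(long userId) {
        long slots = (long) databaseCount * tablesPerDatabase;
        // floorMod keeps the slot non-negative even for negative ids.
        int slot = (int) Math.floorMod(userId, slots);
        int database = slot / tablesPerDatabase;
        int table = slot % tablesPerDatabase;
        return "user_db_" + database + ".user_info_" + table;
    }

    public static void main(String[] args) {
        UserShardRouter router = new UserShardRouter(2, 4);
        System.out.println(router.route(13L)); // user_db_1.user_info_1
    }
}
```

Plain modulo routing is easy to reason about, but changing the number of shards later moves most rows. That cost is one reason the advice above is to avoid sharding until it is really needed.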

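9. What benefits does using a message queue bring to a system?

Chapters 4 and 7 of 《大型网站技术架构》 both describe how message queues improve application performance and extensibility.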

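(1) Improving system performance through asynchronous processing

Without a message queue server, user request data is written straight to the database. Under high concurrency the pressure on the database rises sharply and responses slow down. With a message queue, the request data is sent to the queue and the request returns immediately; a consumer process then pulls the data from the queue and writes it to the database asynchronously. Because the message queue server handles requests faster than the database (and also scales better than the database), response time improves substantially.

It follows that a message queue is good at peak shaving: through asynchronous processing, the transaction messages produced by a short burst of high concurrency are held in the queue, which flattens the peak. For example, during flash sales and promotions on an e-commerce site, a well-used message queue absorbs the flood of orders that arrives the moment the promotion starts.

Because the request returns to the user as soon as the data is written to the queue, later steps such as business validation and the database write may still fail. The business flow therefore has to be adapted to asynchronous processing. For example, after the user submits an order and the order data is written to the queue, the system must not report success right away. Only after the order consumer process has actually processed the order, or even after the goods have left the warehouse, should the user be told by e-mail or SMS that the order succeeded, to avoid transaction disputes. This is similar to booking train or cinema tickets on a phone.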
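
The flow described above can be imitated inside a single JVM with a `BlockingQueue`. This is only a sketch of the idea: a real deployment would use a message queue server, and `AsyncOrderPipeline`, `submit`, and the in-memory list that stands in for the database are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class AsyncOrderPipeline {
    // Bounded queue: a burst is buffered here instead of hitting the database.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
    // Stand-in for the real database (assumption for this sketch).
    private final List<String> database = new CopyOnWriteArrayList<>();
    private final Semaphore persisted = new Semaphore(0);

    /** Request thread: enqueue and return at once. "ACCEPTED" is not yet "order succeeded". */
    public String submit(String orderId) {
        return queue.offer(orderId) ? "ACCEPTED" : "BUSY";
    }

    /** Starts one consumer that drains the queue and writes orders asynchronously. */
    public Thread startConsumer() {
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String orderId = queue.take();
                    database.add(orderId); // the slow write happens off the request path
                    persisted.release();   // a real system would notify the user here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop consuming when interrupted
            }
        });
        consumer.setDaemon(true);
        consumer.start();
        return consumer;
    }

    /** Waits until `count` more orders have been written, or the timeout passes. */
    public boolean awaitPersisted(int count, long timeoutMillis) {
        try {
            return persisted.tryAcquire(count, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public List<String> persistedOrders() {
        return database;
    }

    public static void main(String[] args) {
        AsyncOrderPipeline pipeline = new AsyncOrderPipeline();
        pipeline.startConsumer();
        for (int i = 1; i <= 3; i++) {
            System.out.println(pipeline.submit("order-" + i));
        }
        System.out.println("all persisted: " + pipeline.awaitPersisted(3, 2000));
    }
}
```

Note that `submit` only says the order was accepted into the queue. As the text explains, the success notification has to wait until the consumer has really finished processing it.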

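(2) Reducing system coupling

Once modules are deployed in a distributed fashion, there are usually two ways to aggregate them: distributed message queues and distributed services.

First, a brief word on distributed services:

The distributed service framework most widely used today for building SOA (Service Oriented Architecture) is Dubbo, open-sourced by Alibaba.

Now for distributed message queues:

If modules do not call each other directly, adding or changing a module has little effect on the others, which clearly makes the system more extensible.

The most common event-driven architecture resembles the producer-consumer pattern, and large websites usually implement it with a message queue.

A message queue works in publish-subscribe mode: the message sender (producer) publishes a message, and one or more message receivers (consumers) subscribe to it. The sender and the receiver are not directly coupled. The sender is done with a message as soon as it has been sent to the distributed message queue; the receiver takes the message from the queue and processes it without needing to know where it came from. A new business function only has to subscribe to the messages it is interested in, with no impact on the existing system or business, which is how the site's business becomes extensible by design. A receiver may filter, process, and wrap a message into a new message type and send it on, waiting for other receivers to subscribe to it. An event-driven (message-driven) business architecture can therefore consist of a whole chain of such steps.

To avoid losing messages when the message queue server goes down, messages that have been sent to the queue successfully are also kept on the producer server and are deleted only after the consumer server has actually processed them. If a message queue server does go down, the producer server picks another server in the distributed message queue cluster to publish to.

Note: do not assume a message queue can only work in publish-subscribe mode. Publish-subscribe is simply what is used in the specific context of decoupling; ActiveMQ, for example, also offers a point-to-point mode.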
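
The decoupling that publish-subscribe provides can be shown with a tiny in-memory bus. This is a teaching sketch under simplifying assumptions: delivery is synchronous and in-process, whereas a real message queue delivers asynchronously across machines. `InMemoryEventBus` and the topic names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class InMemoryEventBus {
    // topic -> handlers; the producer never sees this list.
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    /** A new business function joins by subscribing; existing code is untouched. */
    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    /** Delivers the message to every subscriber of the topic; returns how many received it. */
    public int publish(String topic, String message) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
        return handlers.size();
    }

    public static void main(String[] args) {
        InMemoryEventBus bus = new InMemoryEventBus();
        List<String> log = new ArrayList<>();
        bus.subscribe("order.created", id -> log.add("inventory reserved for " + id));
        // This consumer wraps the message into a new type and republishes it.
        bus.subscribe("order.created", id -> bus.publish("mail.requested", "order " + id));
        bus.subscribe("mail.requested", text -> log.add("mail sent about " + text));
        bus.publish("order.created", "42");
        log.forEach(System.out::println);
    }
}
```

The publisher of `order.created` knows nothing about inventory or mail. Adding a third subscriber later would not require touching it, which is the extensibility argument made above.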

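10. Talk about your understanding of the CAP theorem and BASE theory

(1) The CAP theorem

In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that a distributed computing system cannot satisfy all three of the following at the same time:

  • Consistency: all nodes see the same, most recent copy of the data

  • Availability: every request receives a non-error response, with no guarantee that the data returned is the most recent

  • Partition tolerance: when a node fails or the network is partitioned, the distributed system can still provide service that satisfies consistency or availability.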

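CAP applies only to NoSQL scenarios with atomic reads and writes and is not a good fit for database systems. Today's distributed systems have many other properties to consider, such as scalability and availability, so system design and development should not be confined to the CAP question alone.

Note: CAP is not a simple "pick 2 of 3" (do not be misled by most articles online):

Most people explain the theorem as "of consistency, availability, and partition tolerance, you can have only two at once, never all three". This is a very misleading way to put it, and in 2012, twelve years after CAP was introduced, its originator also rewrote his earlier paper.

When a network partition occurs and we want to keep serving requests, we can have only one of strong consistency and availability. In other words, P is the premise once the network has partitioned, and the choice between C and A comes only after P has been settled. Partition tolerance is something we must implement.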

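(2) BASE theory

BASE is an abbreviation of three phrases: Basically Available, Soft-state, and Eventually Consistent. BASE theory is the result of trading off consistency against availability in CAP. It comes from distilling the practice of large-scale distributed Internet systems, evolved gradually from the CAP theorem, and greatly relaxes what we demand of a system.

The core idea of BASE theory: even when strong consistency cannot be achieved, each application can use an approach suited to its own business to bring the system to eventual consistency. That is, strong data consistency is sacrificed for high availability; when part of the data is unavailable or inconsistent, the system as a whole must remain "mostly available".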

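The three elements of BASE theory:

Basically available: when an unforeseen failure occurs, a distributed system is allowed to lose part of its availability. This is by no means the same as the system being unavailable. For example: ① loss in response time: normally an online search engine returns results within 0.5 seconds, but because of a failure the response time increases by 1 to 2 seconds; ② loss in functionality: normally shoppers on an e-commerce site can complete almost every order, but at the peak of a holiday sale, when purchases surge, some shoppers may be directed to a degraded page to protect the stability of the shopping system;

Soft state: the data in the system is allowed to be in an intermediate state, and that intermediate state is considered not to affect overall availability. In other words, synchronization between data replicas on different nodes is allowed to lag;

Eventual consistency: all replicas of the data in the system reach a consistent state after a period of synchronization. The essence of eventual consistency is that the system must guarantee the data ends up consistent, without having to guarantee strong consistency in real time.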
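
Soft state and eventual consistency are easy to see in a toy primary/replica store. The sketch below is single-threaded and deliberately simplistic: `EventuallyConsistentStore` is a made-up class, and the explicit `sync()` call stands in for whatever background replication a real system runs.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class EventuallyConsistentStore {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();
    // Writes accepted by the primary but not yet applied to the replica.
    private final Queue<String[]> replicationLog = new ArrayDeque<>();

    /** A write succeeds as soon as the primary has it; the replica lags behind. */
    public void write(String key, String value) {
        primary.put(key, value);
        replicationLog.add(new String[] {key, value});
    }

    public String readFromPrimary(String key) {
        return primary.get(key);
    }

    /** May return stale data or null: this is the soft, intermediate state. */
    public String readFromReplica(String key) {
        return replica.get(key);
    }

    /** Applies pending writes in order; afterwards both copies agree. */
    public int sync() {
        int applied = 0;
        String[] entry;
        while ((entry = replicationLog.poll()) != null) {
            replica.put(entry[0], entry[1]);
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        EventuallyConsistentStore store = new EventuallyConsistentStore();
        store.write("stock", "10");
        store.write("stock", "9");
        System.out.println("replica before sync: " + store.readFromReplica("stock"));
        store.sync();
        System.out.println("replica after sync: " + store.readFromReplica("stock"));
    }
}
```

Between the write and the sync, the replica answers every read (it stays available) but may be out of date. Once synchronization has run, the two copies agree, which is all that eventual consistency promises.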


Origin blog.51cto.com/14378044/2415657