Ali Games High-Availability Architecture Design Practice

 

 

Today I read Mr. Li Yunhua's "Ali Games High-Availability Architecture Design Practice" and would like to share some thoughts.

What impressed me most was what he said at the outset: "R&D has to take the blame for this!" In other words, a highly available system is designed; it is not something operations can guarantee after the fact.

He described the order in which people tend to look for causes when failures occur. The first thought is that operations is too "LOW" and the hardware quality is poor: why did a cabinet fail again this month, why did a switch fail, was the equipment bought second-hand from a computer mall? The second thought is bad luck: a failure used to occur once every month or two, but this month there have been four, so did someone forget to burn incense in the machine room? The third is insufficient testing: why were these bugs not found in the testing phase, only after going online? The fourth is inadequate operations experience: for a switch failure, for example, some people think the fix is simple, just switch over. Some colleagues also pointed to imperfect processes, saying that many parts of the overall process could be improved. For example, the response mechanism after a failure is not smooth enough: a crowd of R&D, testing, and operations staff rushes in, so shouldn't there be a handling procedure for the whole process with a designated owner? However, the main problem lies in the system design.

The solution is approached in the following ways.

High availability goal, the traditional method: once the direction is settled, we need to set a goal. High availability is usually expressed as a number of nines. Five nines is roughly carrier-grade or financial-grade, while most Internet services are at three to four nines. This has a drawback: apart from technical staff, other colleagues do not understand it well and cannot turn four nines or five nines into an intuitive picture. So when we started the project, we did not set this kind of goal.
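To make the nines notation a little more tangible, here is a minimal sketch that converts a number of nines into a yearly downtime budget. The function names are my own, and a 365-day year is assumed.

```python
# Sketch: translate "N nines" into an allowed-downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def availability_from_nines(nines: int) -> float:
    """Return availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Return the minutes of downtime per year that N nines still allows."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {availability_from_nines(n):.5%} available, "
              f"about {downtime_minutes_per_year(n):.1f} minutes of downtime per year")
```

Three nines leaves a little under nine hours of downtime a year, four nines under an hour, and five nines only about five minutes, which is why the figure is hard for non-technical colleagues to picture.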
Availability goal, business-oriented: the goal we finally settled on differs considerably from a goal stated in nines. A nines goal is mainly viewed from the system's perspective: it says the reliability of this system is so many nines. The business-oriented goal has these advantages: 1, it focuses on the business. 2, it is easy to decompose. The goal itself gives us our working direction: first, locate the problem (and we can work out ways to do that); second, restore the business; third, keep the failure frequency from getting too high. 3, it is easy to measure. Later, when we drew up solutions, we could hold most proposals against this set of standards and basically judge whether they were feasible. Finally, converting the overall goal into nines gives roughly four nines, slightly higher.

High availability overall architecture: the overall architecture has four layers: the user layer, the network layer, the service layer, and the operations layer. The architecture follows the same idea as the goal: we address the business as a whole. We do not say which system should reach how many nines; instead, from the business point of view, we ask what each layer must do across the whole flow to reach the goal. Each layer needs its own solutions to meet the goal. Next I will go through the basic ideas and practices of each solution in detail.
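Before going layer by layer, the conversion of a business-oriented goal into nines mentioned above can be checked with a little arithmetic. This is only a sketch: the numbers used (3 minutes to locate, 5 minutes to recover, at most one failure every 60 days) are illustrative placeholders of mine and are not taken from the text, and I assume the outage per failure is simply the locating time plus the recovery time.

```python
import math


def unavailability(locate_minutes: float, recover_minutes: float,
                   days_between_failures: float) -> float:
    """Fraction of time the business is down under the stated goal.

    Assumption: each failure costs locate + recover minutes of outage,
    and failures happen at most once per `days_between_failures`.
    """
    outage = locate_minutes + recover_minutes
    period = days_between_failures * 24 * 60  # minutes between failures
    return outage / period


def equivalent_nines(down_fraction: float) -> float:
    """Express a downtime fraction as a (fractional) number of nines."""
    return -math.log10(down_fraction)


if __name__ == "__main__":
    down = unavailability(locate_minutes=3, recover_minutes=5,
                          days_between_failures=60)  # hypothetical goal
    print(f"availability {1 - down:.4%}, about {equivalent_nines(down):.2f} nines")
```

With these placeholder values the result comes out a little above four nines, which is the kind of relationship the goal conversion describes. The point is that "locate, recover, frequency" can be measured directly, and the nines figure falls out of them.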

The next step is architecture decoupling: business separation. In the original architecture, one system contained every function, such as login, registration, parameter delivery, messages, logs, and updates. For a player actually playing the game, only login, registration, and parameter delivery are strongly related; messages, logs, and updates are not strictly required in order to play. So business separation means splitting the core business and the non-core business into different systems, which call each other through interfaces. The benefit is that if the non-core business system fails, the core business system is not affected, because the two interact only through interface calls and do not share resources.
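As a rough illustration of why calling through an interface isolates failures, here is a minimal sketch. Everything in it is hypothetical: the `login` function stands in for the core system, the `fetch_messages` callable stands in for an interface call to the non-core message system, and a real implementation would make a network call with a timeout rather than a local function call.

```python
from typing import Callable, List


class NonCoreUnavailable(Exception):
    """Raised when the non-core system cannot serve a request."""


def login(player_id: str, fetch_messages: Callable[[str], List[str]]) -> dict:
    """Core-path login; messages are a non-core extra fetched via an interface."""
    session = {"player": player_id, "logged_in": True}  # core work always happens
    try:
        session["messages"] = fetch_messages(player_id)
    except NonCoreUnavailable:
        session["messages"] = []  # a non-core failure must not fail the login
    return session


def healthy_messages(player_id: str) -> List[str]:
    """Stand-in for a working message system."""
    return [f"welcome back, {player_id}"]


def broken_messages(player_id: str) -> List[str]:
    """Stand-in for a message system that is down."""
    raise NonCoreUnavailable("message system is down")


if __name__ == "__main__":
    print(login("p1", healthy_messages))
    print(login("p1", broken_messages))  # still logged in, just without messages
```

Because the core side only depends on the interface and shares no resources with the message system, the worst case for the player is a login without messages.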

Service Center
The service center is similar to DNS: it provides scheduling for service calls between internal systems. In effect, it is a name system for services.

Business degradation
With the system split into a core business system and a non-core business system, some emergencies remain: for example, restarting the non-core business system does not help, or a database has gone down, and this in turn affects the core business system. In that situation the interface is still reachable, but responses are extremely slow, and the core system gets dragged down. In such extreme cases we can manually issue a degradation instruction to stop a function of the non-core business system. This does not mean stopping the program; it means stopping one of its interfaces or URLs, so that the core system gets a 500 or 503 error when it calls it.
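The two ideas above can be sketched together in a few lines. This is not the actual Ali Games implementation, just an illustration under my own assumptions: `ServiceCenter`, `NonCoreService`, the service name `message-service`, the address, and the URL `/messages` are all made up for the example.

```python
from typing import Callable, Dict, Set, Tuple

Handler = Callable[[], Tuple[int, str]]  # returns (HTTP-like status, body)


class ServiceCenter:
    """Name system for services: maps a service name to its current address."""

    def __init__(self) -> None:
        self._addresses: Dict[str, str] = {}

    def register(self, name: str, address: str) -> None:
        self._addresses[name] = address

    def resolve(self, name: str) -> str:
        return self._addresses[name]  # raises KeyError for an unknown name


class NonCoreService:
    """Non-core system whose individual URLs can be switched off by an operator."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._degraded: Set[str] = set()

    def route(self, url: str, handler: Handler) -> None:
        self._handlers[url] = handler

    def degrade(self, url: str) -> None:
        self._degraded.add(url)  # manual degradation instruction

    def restore(self, url: str) -> None:
        self._degraded.discard(url)

    def handle(self, url: str) -> Tuple[int, str]:
        if url in self._degraded:
            return 503, "degraded"  # fail fast; the process itself keeps running
        if url not in self._handlers:
            return 404, "not found"
        return self._handlers[url]()


if __name__ == "__main__":
    center = ServiceCenter()
    center.register("message-service", "10.0.0.12:8080")  # hypothetical address
    service = NonCoreService()
    service.route("/messages", lambda: (200, "3 new messages"))

    print(center.resolve("message-service"), service.handle("/messages"))
    service.degrade("/messages")
    print(center.resolve("message-service"), service.handle("/messages"))
```

The design point is that a degraded URL answers immediately with an error instead of answering slowly, so the core system is no longer held up waiting on it.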
360-degree monitoring: multi-dimensional, automated, and visualized.

 

Summary: R&D, testing, and operations should all work together to design for high availability.

 


Origin www.cnblogs.com/qilin20/p/11041869.html