High-performance architecture learning roadmap: distributed architecture evolution + reference notes

1. Roadmap for distributed architecture learning

Statistics suggest that a reader can stay fully focused for less than 20 minutes at a time, so future articles will be kept as short as possible, while updates will be as frequent as possible.


2. History of computer software development

Let's first review the history of computer software development, which can roughly be divided into the C/S era, the Web 1.0 era, and the Web 2.0 era.

C/S era: rich-client software, where money was made by selling the software itself. Examples: QQ, audio/video players, games.

Web 1.0 era: mainly one-way information publishing, that is, information portals serving a vast number of browser clients, with Internet content curated by a small number of editors (or webmasters).

Representative examples are the three major portals: Sina, NetEase, and Sohu. Sina focused on news plus advertising, NetEase expanded into games, and Sohu built out a portal matrix.

Web 2.0 era: focused on user interaction; everyone is a content contributor, and RSS subscriptions play a very important role.

For example: blogs, podcasts, wikis, P2P downloads, communities, and sharing services.


Today the Internet has evolved into something everyone participates in, across all ages. As a result, the demands on Internet-related technologies keep growing, and the ever-increasing number of users places a heavier and heavier load on systems.

3. History of technology architecture evolution

Consider the transaction metrics from Tmall's 2017 Double 11 event. With such a huge volume of data and such fast request processing, a single machine running a single service obviously cannot keep up.


So what should we do? We split the services that were originally deployed and handled by a single server across multiple servers, so that many machines can share the processing load, while still keeping the system working as an integrated whole. This is distributed design. Next, let's look at how the service architecture evolved.

Architecture evolution 1: the early prototype

Features: the application mainly reads static files and returns their content to the browser.


Architecture evolution 2: the database era (the LAMP stack)

Features: the application mainly reads values from database tables and fills HTML templates. Business logic is simple: write SQL.


Architecture evolution 3: the early Java web

Features: Tomcat + Servlet + JSP + MySQL; a single WAR package does it all.

Project structure: SSH/SSM three-tier architecture.


Architecture evolution 4: Java web cluster development

Features: machines are replicated horizontally; the overall project structure is unaffected.


Architecture evolution 5: distributed Java web development

Features: the service layer is split out into a separate JAR project and run on its own; the web server calls the split-out service through an RPC framework.


Architecture evolution 6: Java web microservice development

Features: the business is subdivided into microservices along business lines, and each microservice is a complete service in its own right (from HTTP request to response). Within a microservice, any interface that must be exposed externally is packaged as an RPC interface.


The difference between clustering and distribution

When interviewing candidates, I found that many of them confuse clustering with distribution; in fact, they are two different things.

Distributed: vertical splitting. A business is split into multiple sub-businesses that are deployed on different servers. The point is to split at the business level and decouple the business, improving the services' availability and performance.
Cluster: horizontal replication. The same service is deployed on multiple servers, with a load balancer in front to share the pressure; even if one or two of the servers go down, the overall business is unaffected.
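The clustering idea above can be sketched in a few lines of Python: a toy round-robin load balancer (hypothetical addresses, not a production implementation) that keeps serving as long as at least one replica is healthy.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy load balancer: rotate requests across identical replicas,
    skipping any replica that has been marked down."""
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def pick(self):
        # Skip unhealthy replicas; the overall service stays available
        # as long as at least one replica is up.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(lb.pick())           # 10.0.0.1
lb.mark_down("10.0.0.2")   # one replica fails...
picks = [lb.pick() for _ in range(4)]
print(picks)               # ['10.0.0.3', '10.0.0.1', '10.0.0.3', '10.0.0.1']
```

In a real deployment this role is played by Nginx, LVS, or a hardware load balancer, but the availability argument is exactly the one shown here.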


This chapter has covered the high-performance architecture learning route and the history of architecture evolution. Next come the skills that architects at companies like Alibaba must master, the high-performance architecture learning route with notes: middleware, Nginx, caching, ZooKeeper, and more.


Note: space is limited, so much of the learning route and notes below (middleware, Nginx, caching, ZK, and so on) is presented as screenshots; the images are high-resolution, so the content is clearly legible. The complete original PDF + XMind files have also been collected.

 

 

1. ZooKeeper: the commander of the distributed environment


1.1 ZooKeeper basics

ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing services in a distributed environment is a complicated process; ZooKeeper solves this problem with its simple architecture and API, allowing developers to focus on core application logic without worrying about the distributed nature of the application.

1.2 Advantages of distributed applications

  • (1) Reliability: the failure of one or a few nodes does not bring the whole system down.

  • (2) Scalability: performance can be increased when needed by adding more machines, with minor changes to the application configuration and no downtime.

  • (3) Transparency: the system hides its complexity and presents itself as a single entity/application.

1.3 Challenges of distributed applications

  • (1) Race condition: two or more machines try to perform a task that only one of them should complete at any given time. For example, a shared resource must be modified by only a single machine at a time.

  • (2) Deadlock: two or more operations wait for each other to complete, indefinitely.

  • (3) Inconsistency: data ends up only partially updated.
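These are general concurrency problems rather than ZooKeeper specifics. As a minimal runnable illustration, here is a Python sketch of the race condition using plain threads, with a lock enforcing the "only one at a time" rule (in a distributed system, a ZooKeeper-based lock plays the role of this `threading.Lock`):

```python
import threading

def increment(counter, lock, times):
    for _ in range(times):
        with lock:                 # without the lock, the two threads'
            counter["value"] += 1  # read-modify-write steps can interleave
                                   # and updates can be lost

counter = {"value": 0}
lock = threading.Lock()
threads = [threading.Thread(target=increment, args=(counter, lock, 100_000))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])  # 200000 -- always correct with the lock held
```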

1.4 ZooKeeper-related notes


  • ZK handwritten notes (1): overview + CAP + environment setup + consistency protocols + basic usage


  • ZK handwritten notes (2): source code analysis + application scenarios


2. Nginx: advanced high-concurrency traffic splitting in practice


2.1 How Nginx achieves high concurrency

  • In short: asynchronous, non-blocking I/O, using epoll plus a great deal of low-level code optimization.

  • In a little more detail: Nginx's distinctive process model and event model.

2.2 Process model

  • Nginx uses one master process and multiple worker processes.

  • The master process does not handle requests itself: it loads the configuration, manages the worker processes, and monitors their status to ensure high reliability. The worker processes accept and handle the actual requests.

  • The number of worker processes is usually set equal to the number of CPU cores. Nginx workers differ from Apache processes: an Apache process can handle only one request at a time, so Apache opens many processes, hundreds or even thousands, whereas the number of requests a single Nginx worker can handle concurrently is limited only by memory.

2.3 Event model

Nginx is asynchronous and non-blocking.

Each incoming request is handled by a worker process, but not necessarily in one uninterrupted pass. The worker processes the request up to the point where it would otherwise block, for example after forwarding the request to an upstream (backend) server, while waiting for the reply. Instead of waiting idly, the worker registers an event, effectively saying "when the upstream returns, notify me and I will continue", and moves on; if another request arrives in the meantime, it handles that one the same way. Once the upstream server responds, the event fires, the worker picks the request back up, and processing continues.

The nature of a web server's work means that most of each request's lifetime is spent in network transmission; comparatively little time is spent on the server machine itself. This is the secret to handling high concurrency with only a few processes.
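As a rough simulation of that worker behavior (Nginx itself implements this in C on top of epoll, not in Python), the following asyncio sketch lets one "worker" (one event loop) service many requests concurrently, yielding at the simulated upstream call instead of blocking:

```python
import asyncio
import time

async def handle_request(req_id):
    # "Forward to upstream and register an event": await yields control,
    # so the single worker can pick up other requests in the meantime.
    await asyncio.sleep(0.1)       # stands in for upstream/network latency
    return f"response-{req_id}"

async def main():
    start = time.perf_counter()
    # One event loop services 10 requests concurrently.
    responses = await asyncio.gather(*(handle_request(i) for i in range(10)))
    elapsed = time.perf_counter() - start
    # Total time is roughly one latency (~0.1s), not 10 x 0.1s,
    # because the waits overlap.
    print(responses[0], f"{elapsed:.2f}s")

asyncio.run(main())
```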

2.4 Nginx-related notes


  • Nginx Common Application Technology Guide [Nginx Tips]


  • In-depth analysis of Nginx


3. RabbitMQ message middleware


  • (1) Broker: a message-middleware instance; it may be a single node or a logical entity running on a multi-node cluster.

  • (2) Message: consists of a header and a body. The header contains standard fields such as the routing key and priority, plus custom headers, which define how RabbitMQ treats the message; the body is a byte stream carrying the message content.

  • (3) Connection: the TCP connection between a client and the Broker.

  • (4) Channel: a logical (virtual) connection established on top of a TCP connection. Multiple channels multiplex the same TCP connection, avoiding the heavy overhead of opening a TCP connection per interaction. RabbitMQ's official guidance is that each thread use its own channel; sharing a channel across threads is forbidden.

  • (5) Producer (Publisher): the client thread that sends messages.

  • (6) Consumer: the client thread that processes messages.

  • (7) Exchange: responsible for routing each message to the corresponding queue(s).

  • (8) Queue: receives and stores the messages delivered by the exchange until a consumer successfully consumes them; its logical structure is first-in, first-out (FIFO).

  • (9) Binding: registers a Queue in the routing table of an Exchange.

  • (10) Virtual host (vhost): each Broker can host multiple vhosts, and each vhost has its own independent exchanges, queues, bindings, and permission system. vhosts on the same Broker share the user system (users are global, with per-vhost permissions), but each Connection is opened against one specific vhost, so its Channels operate only within that vhost.
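To make the relationships between exchange, queue, binding, and routing key concrete, here is a toy in-memory model in Python. This is purely illustrative: it is not the API of a real RabbitMQ client such as pika, and it models only direct-exchange routing.

```python
from collections import defaultdict, deque

class Exchange:
    """Toy direct exchange: routes a message to every queue whose
    binding key equals the message's routing key."""
    def __init__(self):
        self.bindings = defaultdict(list)   # binding key -> bound queues

    def bind(self, queue, binding_key):
        self.bindings[binding_key].append(queue)

    def publish(self, routing_key, body):
        for queue in self.bindings[routing_key]:
            queue.append(body)              # deliver to each bound queue

order_q = deque()   # Queue: a FIFO buffer until a consumer takes the message
log_q = deque()

ex = Exchange()
ex.bind(order_q, "order.created")
ex.bind(log_q, "order.created")     # two bindings -> fan-out of one message
ex.bind(log_q, "user.login")

ex.publish("order.created", b"order #1")
ex.publish("user.login", b"alice")

print(order_q.popleft())  # b'order #1' -- consumed in FIFO order
print(list(log_q))        # [b'order #1', b'alice']
```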

3.1 RabbitMQ message middleware related notes

 

  • RabbitMQ: the most complete tutorial


  • RabbitMQ practical guide


4. ActiveMQ message middleware


  • (1) Clients in many languages and protocols. Languages: Java, C, C++, C#, Ruby, Perl, Python, PHP. Application protocols: OpenWire, STOMP, REST, WS Notification, XMPP, AMQP.

  • (2) Full support for the JMS 1.1 and J2EE 1.4 specifications (persistence, XA messages, transactions).

  • (3) Spring support: ActiveMQ can easily be embedded into a Spring-based system, and it also supports Spring 2.0 features.

  • (4) Tested against common J2EE servers (such as Geronimo, JBoss 4, GlassFish, WebLogic); via JCA 1.5 resource adapter configuration, ActiveMQ can be deployed automatically to any J2EE 1.4-compatible commercial server.

  • (5) Multiple transport protocols: in-VM, TCP, SSL, NIO, UDP, JGroups, JXTA.

  • (6) High-speed message persistence via JDBC and a journal.

  • (7) Designed for high-performance clustering, client-server, and peer-to-peer deployment.

  • (8) Ajax support.

  • (9) Integration with Axis.

  • (10) Easy to embed the JMS provider for testing.

5. Kafka: million-level throughput in practice


Kafka began as an internal infrastructure system at LinkedIn. It was originally built because, although LinkedIn had databases and other systems for storing data, it lacked a component for handling continuous streams of data. So the design goal was never just another data store along the lines of a relational database, NoSQL database, or search engine; the idea was to treat data as a continuously changing and growing stream, and to build a data system, indeed a data architecture, around that idea.

Externally, Kafka behaves like a messaging system that lets you publish and subscribe to message streams, but it differs from traditional messaging systems in several ways.

  • First, Kafka is a modern distributed system that runs as a cluster and can scale freely.

  • Second, Kafka can store data for as long as required.

  • Third, stream processing raises data handling to a new level. A messaging system only transfers data; Kafka's stream-processing capability lets you dynamically process derived streams and datasets with very little code. So Kafka is much more than message middleware.

Kafka is not just message middleware but a streaming platform: on it you can publish and subscribe to data streams (Kafka ships a separate Streams package for stream processing) and retain them for later processing. That is the design philosophy of Kafka's authors.
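The "store the stream" idea can be illustrated with a toy append-only log in Python (a simplified model, not Kafka's actual implementation): records are retained after being read, and each consumer group merely advances its own offset.

```python
class ToyLog:
    """Toy append-only log: records stay in the log after being read,
    and each consumer group tracks its own read position (offset).
    This is the property that distinguishes a Kafka-style log from a
    queue that deletes messages on consumption."""
    def __init__(self):
        self.records = []
        self.offsets = {}             # consumer group -> next offset to read

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # the new record's offset

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

log = ToyLog()
for v in ["a", "b", "c"]:
    log.append(v)

print(log.poll("billing"))    # ['a', 'b', 'c']
print(log.poll("analytics"))  # ['a', 'b', 'c'] -- the data is still there
print(log.poll("billing"))    # [] -- billing has already caught up
```

Real Kafka adds partitioning, replication, and durable storage on top of exactly this log-plus-offsets abstraction.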

5.1 Kafka million-level throughput: related notes


  • Handwritten "Kafka Notes"


  • Kafka source code analysis and practice


6. Redis high-performance cache database


6.1 Redis data structures and common commands

  • Key: Redis uses a Key-Value data model; any binary sequence can serve as a Redis key (for example an ordinary string, or even the bytes of a JPEG image).

  • String: the basic Redis data type. Redis has no Int, Float, or Boolean types; all basic values are represented as Strings.

  • SET: sets the value of a key. The EX/PX options give the key an expiry time, and the NX/XX options restrict the operation to keys that do not / do already exist. Time complexity O(1).

  • GET: gets the value of a key. O(1).

  • GETSET: sets a new value for a key and returns the previous value. O(1).

  • MSET: sets values for multiple keys. O(N).

  • MSETNX: like MSET, but performs no operation at all if any of the specified keys already exists. O(N).

  • MGET: gets the values of multiple keys. O(N).

  • INCR: increments a key's value by 1 and returns the result; works only on String values that can be parsed as integers. O(1).

  • INCRBY: increments a key's value by the specified integer and returns the result; works only on String values that can be parsed as integers. O(1).

  • DECR/DECRBY: same as INCR/INCRBY, but decrementing.
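The semantics of SET NX/XX, GETSET, and INCR/INCRBY can be illustrated with a tiny in-memory stand-in written in Python. This is a toy model for illustration only, not real Redis and not a real client library; it ignores expiry, persistence, and networking.

```python
class ToyRedis:
    """In-memory stand-in that mimics the return-value semantics of a
    few Redis string commands: values are stored as strings, and the
    INCR family parses them as integers."""
    def __init__(self):
        self.data = {}

    def set(self, key, value, nx=False, xx=False):
        if nx and key in self.data:
            return None          # NX: only set if the key does NOT exist
        if xx and key not in self.data:
            return None          # XX: only set if the key DOES exist
        self.data[key] = str(value)
        return "OK"

    def get(self, key):
        return self.data.get(key)

    def getset(self, key, value):
        old = self.data.get(key)
        self.data[key] = str(value)
        return old               # previous value (None if key was absent)

    def incrby(self, key, amount=1):
        new = int(self.data.get(key, "0")) + amount  # missing key counts as 0;
        self.data[key] = str(new)                    # fails if the value is
        return new                                   # not an integer string

r = ToyRedis()
assert r.set("lock", "owner-1", nx=True) == "OK"
assert r.set("lock", "owner-2", nx=True) is None   # NX blocks the overwrite
assert r.incrby("hits") == 1                       # INCR on a missing key -> 1
assert r.incrby("hits", 9) == 10
assert r.getset("hits", 0) == "10"
print(r.get("hits"))  # '0'
```

The `set(..., nx=True)` pattern is the basis of the classic Redis distributed-lock idiom: only the first client to set the key acquires the lock.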

6.2 Redis high-performance cache database related notes


  • Redis high-performance cache


  • Redis in Action


  • Redis design and implementation


7. Commonly used technologies and case studies of distributed systems (PDF)


This PDF is divided into three parts: the basic theory of distributed systems, common distributed-system technologies, and classic distributed-system case studies.

  • Part 1 introduces the basic theory of distributed systems and summarizes the paradigms, knowledge points, and pitfalls to consider when designing one, including threads, communication, consistency, fault tolerance, the CAP theorem, security, and concurrency; it also describes common distributed architectures, including the recently popular RESTful style, microservices, and container technology.

  • Part 2 surveys mainstream technologies frequently used in distributed systems and introduces what each does and how to use it; they cover distributed messaging, distributed computing, distributed storage, distributed monitoring, distributed version control, RESTful services, microservices, containers, and more.

  • Part 3 selects large-scale distributed systems from well-known Internet companies at home and abroad, with Taobao and Twitter as representatives, and analyzes their architecture design and evolution. This part strings together the scattered technical points of Part 2, so readers can combine theory with a view of real-world practice.


Summary

Every programmer has an architect dream, but dreams are often beautiful while reality is extremely cruel: without effort and struggle, you may stay at the entry level forever. Many friends will say that programmers cannot keep climbing once they get older, that brain and body can no longer keep up. All the more reason, then, to make the most of your youth: seize the opportunity, work hard, and a bright future may yet wave back at you. Of course, this is only my personal opinion; a hundred people have a hundred minds and a thousand different ideas.

Still, one sentence: if you are still in this line of work, still a programmer, and still want to climb uphill, then perhaps this high-performance architecture learning route, pitched as a must-have for Alibaba-level architects, can help you.


Origin blog.csdn.net/weixin_47082274/article/details/110926810