1. Roadmap for distributed architecture learning
Statistics suggest that most readers can stay fully focused for less than 20 minutes, so future installments of this series will be kept as short as possible, while updates will remain as frequent as possible.
2. History of computer software development
Let us first review the history of computer software development, which can be roughly divided into three stages: the c/s (client/server) era, the web 1.0 era, and the web 2.0 era.
c/s era: rich-client solutions, where selling the software itself made money. Examples: QQ, audio/video players, and games.
web 1.0 era: mainly one-way information publishing, i.e. information portals serving a vast number of browser clients. Internet content was curated by a small number of editors (or webmasters).
Representative examples are the three major portals, Sina/NetEase/Sohu: Sina focused on news + advertising, NetEase expanded into games, and Sohu extended a matrix of portals.
web 2.0 era: focused on user interaction, where everyone is a contributor of content. RSS subscription played a very important role.
Examples: blogs, podcasts, wikis, P2P downloads, communities, and sharing services.
Today the Internet has evolved into something everyone participates in, across all ages. As a result, the demands on Internet-related technology keep rising, and the growing number of participants places an ever heavier burden on these systems.
3. History of technology architecture evolution
Below are the transaction metrics of Tmall's Double 11 in 2017. With such a large volume of data and requests processed so quickly, a single machine running a single service obviously cannot cope.
So what should we do? We split the services that were originally deployed and processed on one server across different servers, so that multiple machines share the load, while still guaranteeing the integrity of the system as a whole. This is distributed design. Next, let us look at how the service architecture evolved.
Architecture evolution 1: early prototype
Features: The application program mainly reads static files and returns the content to the browser.
Architecture evolution 2: database development (LAMP stack)
Features: the application mainly reads values from database tables and fills them into HTML templates. Simple business logic, hand-written SQL.
Architecture evolution 3: the prototype of javaweb
Features: Tomcat + Servlet + JSP + MySQL; the entire application ships as a single WAR package.
Project structure: SSH/SSM three-tier structure.
Architecture evolution 4: javaweb cluster development
Features: machines are replicated horizontally; this has no impact on the project structure itself.
Architecture evolution 5: distributed development of javaweb
Features: the Service layer is split out into a separate project (jar) that runs on its own. The web server calls the separated service through an RPC framework.
Architecture evolution 6: javaweb's microservice development
Features: from a business perspective, the system is subdivided into microservices, each of which is a complete service (from HTTP request to response). Within a microservice, any interface that must be exposed externally is packaged as an RPC interface and opened to the outside.
The difference between cluster and distributed
When interviewing candidates, I found that many of them confuse clustering with distribution. They are in fact two completely different things.
Distributed: vertical splitting. A business is split into multiple sub-businesses deployed on different servers. The point is to split at the business level and decouple the business, improving the availability and performance of the services.
Cluster: horizontal replication. The same service is deployed on multiple servers, with load balancing in front to share the pressure. Even if one or two of these servers go down, the business as a whole is unaffected.
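To make the cluster idea concrete, here is a toy round-robin dispatcher (a sketch only; the class and server names are invented for illustration — in practice the balancing is done by Nginx, LVS, or similar):

```python
# Toy round-robin load balancer: identical replicas share the load,
# and a replica marked as down is skipped (hypothetical names).
import itertools

class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self._cycle = itertools.cycle(servers)  # endless round-robin order
        self.down = set()                       # replicas known to be dead

    def dispatch(self, request):
        # Try each replica at most once per dispatch, skipping dead ones.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server not in self.down:
                return f"{server} handled {request}"
        raise RuntimeError("no servers available")

lb = RoundRobinBalancer(["app1", "app2", "app3"])
print(lb.dispatch("r1"))   # app1 handled r1
lb.down.add("app2")        # one replica fails...
print(lb.dispatch("r2"))   # app3 handled r2 (app2 is skipped)
```

Even with "app2" down, requests keep flowing, which is exactly the availability property the cluster buys you.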
This chapter has covered the learning route for high-performance architecture and the history of its technical evolution. Next, let's go through the must-have skills of Alibaba's million-a-year architects along that route: middleware, Nginx, caching, ZooKeeper, and so on. See the advanced-skills diagram of high-performance architecture below.
1. ZooKeeper: commander of the distributed environment
1.1 ZooKeeper basics
ZooKeeper is a distributed coordination service used to manage large hosts. Coordinating and managing services in a distributed environment is a complex process. ZooKeeper solves this problem through its simple architecture and API. ZooKeeper allows developers to focus on the core application logic without worrying about the distributed nature of the application.
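As a rough illustration of what "coordination" means, here is a toy in-memory znode store with one-shot watches, loosely modeled on ZooKeeper's data model (this is a simulation, not the real ZooKeeper client API; all class and method names are invented):

```python
# Toy znode store: a tree of paths with data and one-shot watch callbacks,
# mimicking ZooKeeper's create/get/set + watch semantics in memory.
class TinyZk:
    def __init__(self):
        self.nodes = {"/": b""}   # root znode
        self.watches = {}         # path -> list of pending callbacks

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent znode {parent!r} does not exist")
        if path in self.nodes:
            raise KeyError(f"znode {path!r} already exists")
        self.nodes[path] = data
        self._fire(path)

    def set(self, path, data):
        if path not in self.nodes:
            raise KeyError(path)
        self.nodes[path] = data
        self._fire(path)

    def get(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.nodes[path]

    def _fire(self, path):
        # Like ZooKeeper, watches are one-shot: delivered once, then discarded.
        for cb in self.watches.pop(path, []):
            cb(path)

zk = TinyZk()
seen = []
zk.create("/config", b"v1")
zk.get("/config", watch=lambda p: seen.append(p))
zk.set("/config", b"v2")       # triggers the watch once
zk.set("/config", b"v3")       # no watch registered any more
print(seen)                    # ['/config']
print(zk.get("/config"))       # b'v3'
```

The real service adds what this sketch cannot: replication, ordering guarantees, and ephemeral nodes that vanish when a client session dies — which is why it is useful for leader election and service discovery.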
1.2 Advantages of distributed applications
- (1) Reliability: the failure of one or a few nodes does not bring down the entire system.
- (2) Scalability: performance can be increased when needed by adding more machines, with minor changes to the application configuration and no downtime.
- (3) Transparency: the complexity of the system is hidden; it presents itself as a single entity/application.
1.3 Challenges of distributed applications
- (1) Race condition: two or more machines try to perform a task that only one of them should complete at any given time. For example, a shared resource may only be modified by a single machine at any given moment.
- (2) Deadlock: two or more operations wait for each other to complete, indefinitely.
- (3) Inconsistency: the data is only partially updated, leaving parts of it stale or lost.
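The race condition in (1) can be reproduced on a single machine with two threads; a lock then plays the role that a coordination service such as ZooKeeper plays across machines (a minimal sketch, not production code):

```python
# Two threads increment a shared counter. Without mutual exclusion the
# read-modify-write steps can interleave and increments may be lost;
# a lock serializes the critical section.
import threading

def run(use_lock, n_threads=2, iters=100_000):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(iters):
            if use_lock:
                with lock:        # serialize the read-modify-write
                    counter += 1
            else:
                counter += 1      # unprotected: increments can be lost

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run(use_lock=True))    # always 200000
print(run(use_lock=False))   # may be less than 200000 if increments interleave
```

Across machines there is no shared `threading.Lock`, which is exactly the gap distributed coordination services exist to fill.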
1.4 Zookeeper related notes
- ZK handwritten notes (1): overview + CAP + environment setup + consistency protocols + basic use
- ZK handwritten notes (2): source code analysis + application scenarios
2. Nginx high-concurrency traffic splitting in practice
2.1 How does nginx achieve high concurrency
- In short: asynchronous, non-blocking I/O, using epoll plus a great deal of low-level code optimization.
- In a little more detail: it is the design of nginx's particular process model and event model.
2.2 Process model
- Nginx uses one master process and multiple worker processes.
- The master process does not handle requests itself: it reads the configuration, binds the listening sockets, and spawns the worker processes, which accept and process requests directly.
- The master process also monitors the state of the workers, restarting any that die, to ensure high reliability.
- The number of worker processes is generally set equal to the number of CPU cores. An nginx worker differs from an Apache process: an Apache process can handle only one request at a time, so Apache opens many processes, hundreds or even thousands, whereas a single nginx worker can handle many requests concurrently, limited essentially only by memory.
2.3 Event model
Nginx is asynchronous and non-blocking.
When a request comes in, a worker process handles it, but only up to the point where blocking could occur, for example when forwarding the request to an upstream (back-end) server and waiting for the response. Instead of waiting idly, the worker registers an event, in effect saying "when the upstream returns, notify me and I will continue," and moves on. If another request arrives in the meantime, the worker can start processing it in the same way. Once the upstream server responds, the event fires, the worker picks the request back up, and processing continues.
The nature of a web server's work means that most of each request's lifetime is spent in network transmission; relatively little time is actually spent on the server machine. This is the secret of handling high concurrency with only a handful of processes.
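The event model above can be sketched with Python's selectors module, which wraps epoll/kqueue underneath: one single-threaded "worker" serves several connections by reacting to readiness events instead of blocking per request (the echo protocol and function name are invented for illustration; nginx itself is written in C):

```python
# Event-driven sketch: a single-threaded server registers read events and
# only touches a socket when the OS says it is ready, so it never blocks
# waiting on one slow connection while others have work pending.
import selectors
import socket

def serve_n_requests(n):
    sel = selectors.DefaultSelector()
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))     # any free port
    srv.listen()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, data="accept")
    port = srv.getsockname()[1]

    # Open n client connections, each sending one small "request".
    clients = []
    for i in range(n):
        c = socket.create_connection(("127.0.0.1", port))
        c.sendall(b"req%d" % i)
        clients.append(c)

    handled = 0
    while handled < n:
        for key, _ in sel.select():            # wait for readiness events
            if key.data == "accept":
                conn, _ = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="read")
            else:
                payload = key.fileobj.recv(1024)   # ready: will not block
                key.fileobj.sendall(b"ok:" + payload)
                sel.unregister(key.fileobj)
                key.fileobj.close()
                handled += 1

    replies = sorted(c.recv(1024) for c in clients)
    for c in clients:
        c.close()
    srv.close()
    sel.close()
    return replies

print(serve_n_requests(3))
```

One process, three concurrent connections, no threads: the same shape as an nginx worker multiplexing thousands of requests over epoll.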
2.4 Nginx related notes
- Nginx Common Application Technology Guide [Nginx Tips]
- In-depth analysis of Nginx
3. RabbitMQ message middleware
- (1) Broker: a message-middleware instance, which may be a single node or a logical entity running as a multi-node cluster.
- (2) Message: a message consists of two parts, the header and the body. The header includes standard fields such as routing-key and priority, plus custom headers, and governs how RabbitMQ handles the message; the body is a byte stream carrying the message content.
- (3) Connection: a TCP connection between a client and the Broker.
- (4) Channel: a logical (virtual) connection established on top of a TCP connection. Multiple channels multiplex the same TCP connection, avoiding the heavy overhead of opening a TCP connection per operation. RabbitMQ's official guidance is that each thread use its own channel; sharing a channel across threads is not supported.
- (5) Producer (Publisher): the client thread that sends messages.
- (6) Consumer: the client thread that processes messages.
- (7) Exchange: responsible for routing each message to the corresponding queue(s).
- (8) Queue: receives and stores the messages delivered by the exchange until they are successfully consumed. Logically it is first-in, first-out (FIFO).
- (9) Binding: registers a queue in an exchange's routing table.
- (10) Virtual host (Vhost): each Broker can host multiple vhosts, and each vhost has its own independent exchanges, queues, bindings, and permissions. Vhosts on the same Broker share the user system, so the same user identity can be granted access to several vhosts, although each connection is opened against a single vhost.
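To see how exchange, queue, and binding (items 7-9) fit together, here is a toy simulation of a direct exchange's routing table (this is not the pika/RabbitMQ client API, just an illustration of the routing semantics):

```python
# Toy direct exchange: bindings map a routing key to queues, and publish
# copies the message body into every queue bound with that key.
from collections import defaultdict, deque

class DirectExchange:
    def __init__(self):
        self.bindings = defaultdict(list)   # routing key -> bound queues

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, body):
        # Unroutable messages (no matching binding) are simply dropped here,
        # as an exchange does by default without the 'mandatory' flag.
        for q in self.bindings[routing_key]:
            q.append(body)

ex = DirectExchange()
orders, audit = deque(), deque()
ex.bind(orders, "order.created")
ex.bind(audit, "order.created")     # two queues bound to the same key
ex.bind(audit, "user.deleted")

ex.publish("order.created", "order #1")
ex.publish("user.deleted", "user #7")

print(list(orders))  # ['order #1']
print(list(audit))   # ['order #1', 'user #7']
```

Fanout and topic exchanges differ only in how the routing table is matched; the producer never talks to a queue directly, only to an exchange.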
3.1 RabbitMQ message middleware related notes
- RabbitMQ: the most complete tutorial
- RabbitMQ practical guide
4. ActiveMQ message middleware
- (1) Clients in many languages and protocols. Languages: Java, C, C++, C#, Ruby, Perl, Python, PHP. Application protocols: OpenWire, Stomp, REST, WS-Notification, XMPP, AMQP.
- (2) Full support for the JMS 1.1 and J2EE 1.4 specifications (persistence, XA messages, transactions).
- (3) Spring support: ActiveMQ can easily be embedded into systems that use Spring, and it also supports Spring 2.0 features.
- (4) Tested against common J2EE servers (such as Geronimo, JBoss 4, GlassFish, WebLogic); through JCA 1.5 resource adapters, ActiveMQ can be deployed automatically into any J2EE 1.4-compatible commercial server.
- (5) Support for multiple transports: in-VM, TCP, SSL, NIO, UDP, JGroups, JXTA.
- (6) High-speed message persistence via JDBC and a journal.
- (7) Designed for high-performance clustering, client-server, and point-to-point messaging.
- (8) Ajax support.
- (9) Integration with Axis.
- (10) Easy to call as an embedded JMS provider for testing.
5. Kafka: million-level throughput in practice
Kafka began as an internal infrastructure system at LinkedIn. The original motivation was that, although LinkedIn had databases and other systems for storing data, it lacked a component for handling continuous streams of data. So, in terms of design philosophy, its developers did not want just another system that stores data, like a relational database, a NoSQL database, or a search engine; they wanted to treat data as a continuously changing and growing stream, and to build a data system, indeed a data architecture, on that idea.
Kafka externally behaves like a message system, allowing publishing and subscribing to message streams, but it is very different from traditional message systems.
- First, Kafka is a modern distributed system that runs as a cluster and can scale out freely.
- Second, Kafka can store data for as long as required.
- Third, its stream processing raises data handling to a new level. A message system merely transfers data; Kafka's stream-processing capability lets us dynamically process derived streams and datasets with very little code. So Kafka is more than just message middleware.
Kafka is not only message middleware but a streaming platform: on it you can publish and subscribe to data streams (Kafka Streams ships as a separate package for stream processing) and retain them for processing. That is the design philosophy of Kafka's authors.
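The "data as a retained stream" idea rests on Kafka's core abstraction: a partition is an append-only log, and each consumer tracks its own offset, so the same records can be re-read at will. A toy sketch (a simulation, not the Kafka client API):

```python
# Toy Kafka partition: an append-only log where reading never deletes data;
# each consumer just remembers the offset it has reached.
class Partition:
    def __init__(self):
        self.log = []                  # records are retained, not consumed

    def append(self, msg):
        self.log.append(msg)
        return len(self.log) - 1       # offset of the new record

    def read(self, offset, max_records=10):
        return self.log[offset:offset + max_records]

p = Partition()
for msg in ("pageview:/a", "pageview:/b", "click:buy"):
    p.append(msg)

# Two independent consumers at different offsets see different slices.
print(p.read(offset=0))   # all three records
print(p.read(offset=2))   # ['click:buy']
```

Because reading does not remove records, a new consumer (say, a stream-processing job) can start from offset 0 and replay history, which is what distinguishes a log from a traditional queue.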
5.1 Kafka million-level throughput related notes
- Handwritten "Kafka Notes"
- Kafka source code analysis and practice
6. Redis high-performance cache database
6.1 Redis data structure and related common commands
- Key: Redis uses a Key-Value data model; any binary sequence can serve as a Redis key (a plain string, or even a JPEG image).
- String: the basic Redis data type. Redis has no separate Int, Float, or Boolean types; all basic values are represented as Strings.
- SET: set the value of a key. The EX/PX options give the key an expiry, and the NX/XX options make the set conditional on whether the key exists (NX: only if it does not exist; XX: only if it already exists). Time complexity O(1).
- GET: get the value of a key. O(1).
- GETSET: set a key's value and return its previous value. O(1).
- MSET: set the values of multiple keys. O(N).
- MSETNX: like MSET, but performs no operation at all if any of the specified keys already exists. O(N).
- MGET: get the values of multiple keys. O(N).
- INCR: increment the key's value by 1 and return the result. Works only on String values that can be parsed as integers. O(1).
- INCRBY: increment the key's value by a given integer and return the result. Works only on String values that can be parsed as integers. O(1).
- DECR/DECRBY: same as INCR/INCRBY, but decrementing instead of incrementing.
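A toy model of the String-command semantics above (an in-memory stand-in, not the redis-py API), showing how NX/XX and INCR behave:

```python
# Minimal in-memory imitation of Redis String commands. Everything is
# stored as a string, mirroring Redis's "all basic types are Strings".
class MiniRedis:
    def __init__(self):
        self.db = {}

    def set(self, key, value, nx=False, xx=False):
        if nx and key in self.db:
            return None          # NX: only set if the key does NOT exist
        if xx and key not in self.db:
            return None          # XX: only set if the key DOES exist
        self.db[key] = str(value)
        return "OK"

    def get(self, key):
        return self.db.get(key)

    def incr(self, key):
        # Like Redis INCR: a missing key counts as 0; the stored value
        # must parse as an integer or this raises an error.
        value = int(self.db.get(key, "0")) + 1
        self.db[key] = str(value)
        return value

r = MiniRedis()
r.set("visits", 41)
print(r.incr("visits"))              # 42
print(r.set("visits", 0, nx=True))   # None: key already exists, not overwritten
print(r.get("visits"))               # '42'
```

The `SET key value NX` pattern is the basis of the common Redis distributed-lock idiom: only the first client to set the key wins.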
6.2 Redis high-performance cache database related notes
- Redis high-performance cache
- Redis in action
- Redis design and implementation
7. Commonly used technologies and case studies of distributed systems (PDF)
This PDF is divided into three parts, namely the basic theory of distributed systems, common technologies of distributed systems, and classic case studies of distributed systems.
- The first part introduces the basic theory of distributed systems and summarizes the paradigms, knowledge points, and potential problems to consider when designing them, including threads, communication, consistency, fault tolerance, the CAP theorem, security, and concurrency; it also describes common distributed-system architectures, including the currently popular RESTful style, microservices, and container technology.
- The second part surveys the mainstream technologies frequently used in distributed applications and explains what each is for and how to use it; these cover distributed messaging, distributed computing, distributed storage, distributed monitoring, distributed version control, RESTful services, microservices, containers, and more.
- The third part selects large-scale distributed-system cases from well-known Internet companies at home and abroad, represented by Taobao and Twitter, and analyzes their architecture design and evolution; this part strings together the scattered technical points of the second part, so readers can see technical theory combined with real-world practice.
Summary
Every programmer has an architect dream, but dreams are beautiful and reality can be cruel: without effort and struggle, you may stay at the bottom rung forever. Many friends will say that programmers cannot keep climbing as they age, when mind and body no longer keep up. All the more reason, then, to seize the opportunity and work hard while you are still young; a bright future will be there to wave at you. Of course, this is only my personal opinion; a hundred people have a hundred minds and a thousand different ideas.
One last sentence: if you are still in this line of work, still a programmer, and still want to climb uphill, then perhaps this must-have learning route of Alibaba's million-a-year architects, the high-performance architecture route, can help you.