When someone asks you about the evolution of a Spring Cloud microservice architecture, can you explain it fluently?

Preface

Spring Cloud is a cloud application development toolkit based on Spring Boot. For JVM-based cloud applications it provides a simple development model for configuration management, service registration and discovery, circuit breakers, intelligent routing, micro-proxies, a control bus, global locks, leader election, distributed sessions, and cluster state management.

Overview of system architecture evolution

In a company's start-up phase, the main problem is turning an idea into working software. At that point the system's architecture is deliberately kept simple: to iterate quickly, the whole system consists of "App + backend service", and the backend service is split into Jar packages only from an engineering perspective. The software architecture at this stage looks like this:

At this time the functions of the whole system were relatively simple: only basic user, order, and payment features. Because the business processes were not complicated, these functions were essentially coupled together. As the app became popular (the author's company happened to ride an Internet hot spot), downloads grew rapidly in 2017, and so did online registrations.

With traffic growing fast, the pressure on the whole backend service became severe. To withstand it, the only option was to keep adding machines and scale the backend service nodes horizontally. The deployment architecture at this time looked like this:

In this way the system withstood one wave of pressure, but accidents still happened occasionally, especially when the poor performance of a single API interface made the whole service unavailable. All interfaces lived in the same JVM process, and although multiple nodes were deployed, the underlying database and cache were still single systems, so everything could still go down together.

On the other hand, with the rapid development of the business, features that used to be simple became more complicated, and many of them are not even visible to users. Just like Baidu search: all a user can see is a search box, but behind it there may be hundreds of services, such as growth-related functions (red envelopes, sharing incentives) and monetization functions related to advertising and recommendation.

In addition, growth in traffic and business also means rapid growth in team size. If everyone develops their own features against a single shared service codebase, it is hard to imagine what hundreds of people piled onto the same project would look like. So how to divide business boundaries and organize teams accordingly also becomes a very urgent matter!

To solve the above problems and adapt to the growth of the business and the team, the architecture team decided to split the system into microservices. Implementing a microservice architecture requires not only a reasonable division of business module boundaries but also a complete set of technical solutions.

In terms of technology selection, there are many frameworks for service splitting and governance: WebService in the early days, and more recently various RPC frameworks (such as Dubbo, Thrift, and gRPC). Spring Cloud is a complete microservice solution built on Spring Boot. Because its technology stack is relatively new and its support for the surrounding components is very comprehensive, Spring Cloud became the first choice.

After a round of refactoring and expansion, the whole system finally formed an app-centric microservice architecture, structured as follows:

At this point, the system had initially completed the split into a Spring Cloud-based microservice architecture. Core functions such as payment, orders, users, and advertising were separated into independent microservices, and the databases behind them were split along the same service boundaries.

After the split, what used to be in-process calls between functional modules became network calls between services, and each microservice has to expose its capabilities as services according to the functions it carries. Service discovery and invocation thus become a crucial part of the whole microservice system. Those who have used the Dubbo framework know that in Dubbo, service registration and discovery rely on ZooKeeper; in Spring Cloud we implement them with Consul. In addition, the Spring Cloud-based architecture provides a configuration center (ConfigServer) to help each microservice manage its configuration files, and the original API service gradually evolved into a front gateway service as the various functions were separated out.

Up to this point we have completed the microservice split based on Spring Cloud. In this architecture we mentioned the key components Consul, ConfigServer, and the gateway service. How do these key components support such a large service system?

SpringCloud key components

Consul

Consul is an open-source registry service written in Go. It has built-in service discovery and registration, a distributed consensus protocol implementation, health checks, key/value storage, multi-data-center support, and other features. In the Spring Cloud framework you can also choose Eureka as the registry; the main reason for choosing Consul here is its support for heterogeneous services, such as gRPC services.

In fact, in the subsequent evolution of the architecture, some service modules were further split into subsystems, and gRPC became the calling method between subsystem services. For example, the payment module kept growing and was itself split into a microservice architecture; the payment microservices then call each other via gRPC, while service registration and discovery still rely on the same Consul cluster.

The evolution of the system architecture at this time is as follows:

Once a module in the original microservice architecture reaches a certain scale or complexity, it tends to grow into an independent subsystem. This makes the whole microservice call chain very long, although from Consul's point of view all services are still flat.

As the microservice fleet grows larger and larger, Consul, as the core service component, sits at a critical position in the whole system: if Consul goes down, every service stops working. So what kind of service is Consul, and how should its fault-tolerance mechanism be designed?

To ensure high availability, Consul should without doubt run as a cluster in production (for installation and configuration of a Consul cluster, refer to the available documentation). A Consul cluster has two roles: Server and Client. These roles have nothing to do with the application services running on top of Consul; they are purely a division at the Consul level. It is the Server nodes that maintain the state of the whole cluster. As with the ZooKeeper-based registry in Dubbo, the Server nodes in a Consul cluster hold an election (using the Gossip protocol and the Raft consensus algorithm, which I won't expand on here and can discuss separately in a later article) to choose a Leader, which is responsible for handling all queries and transactions and synchronizing state to the other nodes.

The Client role is relatively stateless: it simply proxies RPC requests to the Server nodes. Client nodes exist mainly to offload and buffer the Servers, because the number of Server nodes should not be too large: the more Servers there are, the slower consensus is reached and the higher the cost of synchronization between nodes. Three to five Server nodes are generally recommended, while there is no limit on Clients: thousands or tens of thousands can be deployed as needed. In practice this is only a strategy; in real production environments most setups need only 3 to 5 Server nodes. In one of the author's production clusters, the Consul cluster consists of 5 Server nodes with no additional Client nodes.
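The 3-to-5 recommendation above follows directly from Raft's majority requirement: a write must be acknowledged by a quorum, so a cluster of n servers only tolerates the failure of a minority. A minimal sketch of the arithmetic (illustrative only, not part of Consul's code):

```java
// Why 3-5 Consul Server nodes is the sweet spot: Raft commits require a
// majority (quorum), so n servers tolerate floor((n-1)/2) failures, while
// every extra server slows consensus down.
public class RaftQuorum {
    // Minimum number of servers that must agree to commit a write.
    static int quorum(int servers) {
        return servers / 2 + 1;
    }

    // Number of server failures the cluster can survive.
    static int faultTolerance(int servers) {
        return (servers - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[]{3, 5, 7}) {
            System.out.printf("servers=%d quorum=%d tolerates=%d failures%n",
                    n, quorum(n), faultTolerance(n));
        }
    }
}
```

Note that going from 3 to 5 servers buys one extra tolerated failure, but going from 5 to 7 adds synchronization cost for little practical gain, which is why 5 Server nodes was chosen here.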

There is one more concept in a Consul cluster: the Agent. In fact, every Server or Client is a Consul agent, a daemon running on each member of the cluster. Its main job is to run the DNS or HTTP interface and to check and keep service information in sync at runtime. When we start a node of the Consul cluster (Server or Client), we start it through the consul agent. For example:

```shell
consul agent -server -bootstrap -syslog \
  -ui \
  -data-dir=/opt/consul/data \
  -dns-port=53 \
  -recursor=10.211.55.3 \
  -config-dir=/opt/consul/conf \
  -pid-file=/opt/consul/run/consul.pid \
  -client=10.211.55.4 \
  -bind=10.211.55.4 \
  -node=consul-server01 \
  -disable-host-node-id &
```

Taking the actual production environment as an example, the deployment structure diagram of the Consul cluster is as follows:

In this actual production case no Client nodes are set up; a cluster of 5 Consul Server nodes serves application registration and discovery for the whole production environment. There is a detail worth understanding here: the 5 Server nodes all have different IP addresses, and a service connecting to the cluster for registration and queries should talk to the Leader node's IP. The problem is: if the Leader node goes down, how do the application services connect to the new Leader elected by Raft? Surely not by switching IPs manually?

Obviously manual IP switching is not reliable. In production practice, each node of the Consul cluster actually runs DNS on its Consul agent (see the DNS-related parameters in the startup command above), and the address the application services use to connect to the cluster is the DNS name. DNS resolves that name to the IP of the current Leader node. If the Leader goes down, the newly elected Leader announces its IP to the DNS service, which updates the mapping; the whole process is transparent to the application services.
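The DNS indirection just described can be sketched as a tiny simulation. The class, method names, and IP addresses below are invented for illustration; Consul's real DNS interface is far richer than this:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simulation of leader discovery via DNS: clients always resolve
// a stable name, and a leader change only updates the name-to-IP mapping.
public class LeaderDns {
    private final Map<String, String> records = new ConcurrentHashMap<>();

    // Called on behalf of the newly elected leader to update the DNS record.
    void announceLeader(String name, String leaderIp) {
        records.put(name, leaderIp);
    }

    // Called by application services; they never hard-code a server IP.
    String resolve(String name) {
        return records.get(name);
    }

    public static void main(String[] args) {
        LeaderDns dns = new LeaderDns();
        dns.announceLeader("consul.service", "10.211.55.4");
        System.out.println(dns.resolve("consul.service")); // current leader

        // Leader fails; Raft elects a new one, which re-announces itself.
        dns.announceLeader("consul.service", "10.211.55.5");
        System.out.println(dns.resolve("consul.service")); // failover is transparent
    }
}
```

The design point is that the failover logic lives entirely behind the name: application services keep resolving the same address and never need a configuration change.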

Through the above analysis we can see that Consul ensures stability and high availability through its cluster design, the Raft election algorithm, the Gossip protocol, and other mechanisms. If a higher level of disaster tolerance is needed, you can also build two Consul data centers in different locations to form a geo-redundant Consul service cluster, but the cost is higher; it depends on whether it is really needed.

ConfigServer (Configuration Center)

The configuration center is the service that manages the configuration of microservice applications, such as database settings and the addresses of external interfaces. In Spring Cloud, ConfigServer is an independent service component. Like Consul, it is a key component of the whole microservice system: all microservice applications call it to obtain the configuration they need.

As the microservice fleet expands, the access pressure on the ConfigServer nodes gradually increases, and the configuration of each microservice keeps multiplying. How to manage these configuration files and their update strategy (to ensure that careless changes to production configuration cannot cause online failures), and how to build a highly available ConfigServer cluster, are also important aspects of keeping the microservice system stable.

In production practice, key components such as Consul and ConfigServer need independent clusters deployed on physical machines rather than in containers. When we introduced Consul in the previous section, we built 5 independent Consul Server nodes. Since ConfigServer is mainly an HTTP service for configuration file access and does not involve node election or consistency synchronization, the highly available configuration center can be built in the traditional way. The specific structure is as follows:

We manage application configuration files with git alone. Normally, ConfigServer pulls configuration directly from the git repository over the network, so that as soon as the repository is updated the configuration center sees the change immediately. The weakness of this approach is that git itself is a code management tool for intranet development; letting an online, real-time service read it directly can easily overwhelm the git repository. So in actual operation and maintenance we use git only for version control of configuration files, distinguishing the online branch (master) from feature development branches. After a merge request is completed, a manual step (triggered from the release platform) synchronizes the online branch's configuration to a local path on the host of each ConfigServer node. The ConfigServer nodes then serve configuration files from their local directories instead of fetching them over the network each time.

On the other hand, as microservices multiply, the number of configuration files in the git repository grows as well. To make configuration manageable, we need to organize the configuration of different application types in some systematic way. In the early days, applications were not classified at all, so the configuration files of hundreds of microservices sat in a single repository directory. This raised the cost of managing configuration files and also hurt ConfigServer's performance, because configuration unused by a given microservice would still be loaded by ConfigServer.

So the later practice was to organize configuration hierarchically: the company-wide global configuration is abstracted to the top level and loaded by ConfigServer by default, and all other microservices are grouped by application type (grouped by git project space). Applications of the same type go into one group, and under that group a separate git repository named config stores the configuration files of the microservices in that group. The hierarchy is as follows:

In this way, the priority with which an application loads configuration follows the order "local configuration -> common configuration -> group configuration -> project configuration", with later sources overriding earlier ones. For example, consider a service A where parameter A is set in the project's default configuration file (bootstrap.yml/application.yml) and parameter B in the project's local configuration application-production.yml. Meanwhile, the application.yml/application-production.yml files in ConfigServer's common repository define parameters C and D, the default configuration files application.yml/application-production.yml of a group named "pay" define parameters E and F, and the specific project pay-api has a configuration file pay-api-production.yml that overrides the values of C and D from the common repository. If the application is started with "spring.profiles.active=production", the configuration it obtains (accessible via http://{spring.cloud.config.uri}/pay-api-production.yml) is A, B, C, D, E, F, where the values of C and D are the final values overridden in pay-api-production.yml.
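The override order in this example can be sketched by merging property sources from lowest to highest priority. The keys A-F stand in for the example parameters, and all the values are invented placeholders that only show which source wins:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the configuration precedence described above: sources are merged
// in order, so a later (higher-priority) source overwrites an earlier one.
public class ConfigPrecedence {
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... sourcesLowToHigh) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> source : sourcesLowToHigh) {
            result.putAll(source); // higher-priority sources overwrite
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> local   = Map.of("A", "local", "B", "local");
        Map<String, String> common  = Map.of("C", "common", "D", "common");
        Map<String, String> group   = Map.of("E", "pay-group", "F", "pay-group");
        Map<String, String> project = Map.of("C", "pay-api", "D", "pay-api");

        Map<String, String> effective = merge(local, common, group, project);
        System.out.println(effective.get("C")); // pay-api: project overrides common
        System.out.println(effective.size());   // all six parameters are visible
    }
}
```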

For the ConfigServer service itself, the configuration repositories need to be matched according to this organization. In the example above, suppose there is also a configuration repository for finance: when services under the pay group access the configuration center, ConfigServer should not load the finance space. This requires some settings in ConfigServer's own service configuration, as follows:

```yaml
spring:
  application:
    name: @project.artifactId@
    version: @project.version@
    build: @buildNumber@
    branch: @scmBranch@
  cloud:
    inetutils:
      ignoredInterfaces:
        - docker0
    config:
      server:
        health.enabled: false
        git:
          uri: /opt/repos/config
          searchPaths: 'common,{application}'
          cloneOnStart: true
          repos:
            pay:
              pattern: pay-*
              cloneOnStart: true
              uri: /opt/repos/example/config
              searchPaths: 'common,{application}'
            finance:
              pattern: finance-*
              cloneOnStart: true
              uri: /opt/repos/finance/config
              searchPaths: 'common,{application}'
```

This is achieved by configuring the repository search rules in the local application.yml of the ConfigServer service itself.

Gateway service & service fuse & monitoring

Through the two subsections above we introduced two key service components of the Spring Cloud system in some detail. But in a microservice architecture there are still many key problems to solve. For example, if an application service registers multiple nodes in Consul, how does the caller achieve load balancing?

In a traditional architecture this would be solved with Nginx, but when we introduced Consul earlier we only covered service registration and discovery and the election mechanism, not how load balancing of service calls is achieved. Does that mean application services in a Spring Cloud microservice system are served by a single node even when multiple nodes are deployed? In fact, when the service consumer enables Feign with the @EnableFeignClients annotation and makes calls through the @FeignClient("user") annotation, load balancing is already in place. Why? Because this annotation enables the Ribbon proxy by default, and Ribbon is a client-side load-balancing component: it pulls service node information from Consul and forwards the client's requests to different service nodes in a round-robin manner. All of this happens in code inside the consumer's process. Because this load-balancing logic is hosted in the consumer-side application, it intrudes to some extent on the consumer's code, which is one of the reasons the concept of Service Mesh later appeared. I won't expand on that here; there will be a chance to discuss it later.
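What Ribbon does on the consumer side can be sketched roughly as follows. This is an illustration of round-robin selection only, not Ribbon's actual implementation, and the server addresses are made up:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of client-side load balancing in the spirit of Ribbon:
// pick the next node from the list pulled from the registry, round-robin.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger position = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Choose the next server; wraps around when the end of the list is reached.
    String choose() {
        int index = Math.floorMod(position.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        // In the real system the node list would come from Consul, not be
        // hard-coded; these addresses are hypothetical.
        RoundRobinBalancer lb =
                new RoundRobinBalancer(List.of("user-01:8080", "user-02:8080"));
        System.out.println(lb.choose()); // user-01:8080
        System.out.println(lb.choose()); // user-02:8080
        System.out.println(lb.choose()); // user-01:8080 again
    }
}
```

Because this selection runs inside the consumer process, every consumer carries the balancing logic with it, which is exactly the code intrusion the paragraph above describes.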

Another key problem to solve is service circuit breaking and rate limiting. Spring Cloud supports these mechanisms by integrating Netflix's Hystrix framework, which, like the load-balancing mechanism, is implemented on the consumer side. Due to the length of this article I won't expand here; I will have the opportunity to discuss it in later articles.
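The idea behind Hystrix-style circuit breaking can be sketched with a toy breaker. This is not Hystrix's API, just an illustration of the open-circuit-plus-fallback idea under a simplified consecutive-failure rule:

```java
import java.util.function.Supplier;

// Toy circuit breaker: after a threshold of consecutive failures the circuit
// opens and calls are answered by a fallback instead of hitting the failing
// dependency. (Real Hystrix also has timeouts, thread pools, and half-open
// recovery, all omitted here.)
public class SimpleCircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures;

    SimpleCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    String call(Supplier<String> primary, Supplier<String> fallback) {
        if (isOpen()) {
            return fallback.get(); // short-circuit: don't touch the dependency
        }
        try {
            String result = primary.get();
            consecutiveFailures = 0; // a success resets the counter
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3);
        for (int i = 0; i < 3; i++) {
            breaker.call(() -> { throw new RuntimeException("timeout"); },
                         () -> "fallback");
        }
        System.out.println(breaker.isOpen()); // true: the circuit has opened
    }
}
```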

In addition, there is the Zuul component, which implements the API gateway service and provides routing and filtering; other supporting components include Sleuth for distributed tracing, Bus for the message bus, and Dashboard for the monitoring dashboard. Since the Spring Cloud open-source community is quite active, many new components are continuously being integrated, and interested readers can keep an eye on it!

The operation and maintenance form of microservices

Under a microservice architecture, as the number of services grows massively, the workload of online deployment and maintenance becomes very large, and the original operations model can no longer keep up. The operations team then needs to adopt a DevOps strategy: build an automated release platform, connect the product, development, testing, and operations workflows, and focus on engineering efficiency.

On the other hand, we also need to push a containerization strategy (Docker/Docker Swarm/Kubernetes) so that service nodes can be scaled quickly; this, too, is an inevitable requirement of a microservice system.

The proliferation of microservices

Another issue that deserves attention is how to manage and control microservices at the engineering level once the microservice architecture has landed. Splitting microservices blindly is not reasonable either, because it makes the whole call chain unfathomably deep, which makes troubleshooting harder and wastes online resources.

Refactoring problem

In the early transition from a monolithic architecture to microservices, refactoring is a very good approach, and an important means of keeping services standardized and the business system's application architecture rational. In general, however, a stage of rapid growth also means rapid growth of the team, and giving a newly enlarged team meaningful work in a short time is a real test of management. If many people are hired and a competitive, transitional atmosphere develops among them, refactoring can become somewhat utilitarian, leaving it incomplete and dodging the hard parts, so that the architecture looks like lofty microservices while the business system underneath is actually in poor shape.

In addition, refactoring is a major decision taken at a certain stage. It is not merely a re-split but a re-shaping of the business system, so the structure of the application software and the cost of implementing it must be weighed carefully; do not refactor blindly!

Summary

A Spring Cloud-based microservice architecture supports the whole system by integrating various open-source components, but for load balancing, circuit breaking, and flow control it has to intrude into the business process of the service consumer. Many people consider this a drawback, which is how the concept of Service Mesh appeared. The basic idea of Service Mesh is to decouple these concerns from the business process by deploying an independent proxy on each host. Besides service discovery and load balancing (no separate registration component such as Consul is required), this proxy also handles dynamic routing, fault tolerance and rate limiting, monitoring metrics, and security and logging.

As for concrete components, Istio is currently the Service Mesh standardization effort supported and promoted by major vendors such as Google and IBM. I will discuss Service Mesh in detail in later content. That is all for this article; as the author's ability is limited, please forgive any mistakes!

Origin blog.csdn.net/SQY0809/article/details/109342988