Microservice Architecture Technology Stack Selection Manual

1. Introduction

2014 can be regarded as the first year of Microservices 1.0, and it saw several landmark events. First, Martin Fowler published the article "Microservices" on his blog, formally proposing the microservice architectural style. Second, after large-scale production verification of its microservice architecture, Netflix abstracted a set of open source microservice infrastructure components, collectively referred to as NetflixOSS, and its successful experience began to be recognized and respected by the industry. Third, Pivotal integrated the NetflixOSS components into its Spring ecosystem and launched the Spring Cloud microservice development technology stack.

In the past three years, the microservice technology ecosystem has changed enormously. New technologies and concepts such as containers, PaaS, Cloud Native, gRPC, ServiceMesh and Serverless have emerged one after another, and before we knew it we had entered the era of Microservices 2.0.

Based on my practical experience with microservice infrastructure in recent years and what I have learned along the way, I would like to summarize some selection ideas for building a Microservices 2.0 technology stack, for the reference of front-line architects and engineers. For microservice support modules that do not yet have mature open source products, I also give some ideas for custom, self-developed designs.

2. Selection criteria

For technology selection, I personally have many criteria, of which the following three are the most important:

1. Production grade

The technology stack we choose has to solve real business problems and carry production traffic (a careless selection may cause production incidents); it is not just for a POC or demo. So technology that is production ready, ops ready, manageable, mature and stable is our first choice;

2. Proven at first-tier Internet companies

We try to use products that have been deployed at first-tier Internet companies, then open-sourced, and have earned a good reputation in the community. These products have withstood heavy traffic inside those companies, most of their pitfalls have already been fixed, and they have been accepted by the community and formed a healthy ecosystem (the appendix of this article gives GitHub links to all recommended open source projects for use or reference).

3. Open source community activity

The number of stars on GitHub is an important indicator. I also look at how frequently the code and documentation are updated (especially in recent years). These indicators directly reflect the community activity, or vitality, of an open source product.

In addition, technology selection criteria often differ for companies with different business volumes and team sizes; those of a startup and those of a BAT-level company may be completely different. This article is mainly aimed at companies with daily traffic above ten million and an R&D team of at least 50 people. Below this scale, I suggest evaluating seriously whether a microservice architecture is really necessary. Given the popularity of Java in China and my own background, this article mainly targets enterprises that adopt the Java technology stack. It also assumes self-built microservice infrastructure. Some products have corresponding cloud services that can be used directly. Self-building and using cloud services each have advantages and disadvantages, and architects need to weigh them according to the context.


3. Key Points of Microservice Infrastructure

The seven modules marked in mango color in the mind map below are, in my view, the core modules for building a Microservices 2.0 technology stack, and the selections later in this article are organized around them. For each module I also list the core architectural concerns that should be covered as far as possible when choosing a specific product.


The picture below is a microservice technology system that I have summarized and referred to in my recent work, shared here for front-line architects and engineers. The modules marked in pink are those most closely related to microservices. You can compare your own technology selection against this system.


4. Service Framework Selection

The service framework is a relatively mature field with many options. Thanks to the influence of the Spring community and the endorsement of Netflix, Spring Boot/Cloud [Appendix 12.1] can currently be considered the community standard for building Java microservices. Spring Boot currently has more than 20k stars on GitHub.

A Spring-based framework is essentially a RESTful framework (not an RPC framework). Serialization mainly uses text-based JSON, and communication is generally over HTTP. A RESTful framework is naturally cross-language: any language with an HTTP client can make calls, but the client generally has to parse the payload itself. Spring also supports the Swagger contract programming model, which can generate strongly typed clients in various languages from the contract and greatly eases access from applications on different language stacks. However, because RESTful frameworks and the Swagger specification are weakly contracted, the generated clients in various languages still have many interoperability pitfalls.

Dubbo [Appendix 12.2] is the crystallization of Alibaba's many years of building production-grade distributed microservices. It has very rich service governance capabilities and great influence in the Chinese technical community, and currently has more than 16k stars on GitHub. Dubbo is essentially a Java-based RPC framework. Dangdang's Dubbox extends Dubbo with support for exposing RESTful interfaces.

Dubbo is mainly oriented to the Java technology stack, and weak cross-language support is one of its shortcomings. In addition, because of its rich governance capabilities, the framework is relatively heavy and the threshold for using it fully is relatively high. However, if your enterprise is essentially committed to the Java technology stack, choosing Dubbo gives you a higher starting point for the service framework: it does well in both performance and enterprise-grade service governance. Motan, open-sourced by Sina Weibo (GitHub 4k stars), is also good; its functionality is similar to Dubbo's, and it can be regarded as a lightweight, trimmed-down version of Dubbo.

gRPC [Appendix 12.3] is an RPC framework introduced by Google in recent years. Based on the strongly contracted programming model of protobuf, it can automatically generate clients in various languages and ensure interoperability. Support for HTTP/2 is a highlight of gRPC, and communication-layer performance is much better than with HTTP/1.x. Protobuf is a high-performance serialization protocol with a long history and a good reputation in the community. With Google's endorsement and community influence, gRPC is also fairly popular, with more than 13.4k stars on GitHub.

At present, gRPC is better suited to calls between internal services. Exposing RESTful interfaces to the outside world is possible but troublesome (it requires gRPC Gateway), so a second RESTful framework may be needed as a supplement for externally exposed APIs. Overall, gRPC is still relatively new, and the community has not yet reached consensus on the benefits of HTTP/2. I recommend investing cautiously and starting with pilot projects.

5. Runtime Support Service Selection

The runtime support services mainly include three products: service registry, service routing gateway and centralized configuration center.

For the service registry, if you adopt the Spring Cloud ecosystem, Eureka [Appendix 12.4] is the best match. Eureka has been verified in large-scale production at Netflix and supports multiple data centers, and on the client side it works with Ribbon to provide flexible client-side soft load balancing. Eureka currently has more than 4.7k stars on GitHub. Consul [Appendix 12.5] is also a good choice: it natively supports multiple data centers, and also offers KV storage and flexible health checks. It currently has more than 11k stars on GitHub.
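To make the idea of a registry combined with client-side load balancing concrete, here is a minimal in-memory sketch in plain Java. It is not the Eureka or Ribbon API: the class name `RegistrySketch`, its methods and the lease length are assumptions chosen for illustration, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory registry; not the Eureka or Ribbon API. */
class RegistrySketch {
    private final long leaseMillis;
    // service name -> (instance address -> time of last heartbeat)
    private final Map<String, Map<String, Long>> leases = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    RegistrySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Registers an instance or renews its lease. */
    void heartbeat(String service, String address, long nowMillis) {
        leases.computeIfAbsent(service, s -> new ConcurrentHashMap<>()).put(address, nowMillis);
    }

    /** Returns the instances whose lease has not expired, in a stable order. */
    List<String> liveInstances(String service, long nowMillis) {
        Map<String, Long> instances =
                leases.getOrDefault(service, Collections.<String, Long>emptyMap());
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : instances.entrySet()) {
            if (nowMillis - e.getValue() <= leaseMillis) {
                live.add(e.getKey());
            }
        }
        Collections.sort(live);
        return live;
    }

    /** Client-side round-robin over the live instances; null if none are live. */
    String choose(String service, long nowMillis) {
        List<String> live = liveInstances(service, nowMillis);
        if (live.isEmpty()) {
            return null;
        }
        int index = Math.floorMod(counter.getAndIncrement(), live.size());
        return live.get(index);
    }
}
```

An instance that stops sending heartbeats simply drops out of `liveInstances` once its lease expires, so callers stop routing to it without any central proxy being involved.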

The service gateway is also a relatively mature field with many options. If you adopt the Spring Cloud ecosystem, Zuul [Appendix 12.6] is the best match. Zuul has been verified in large-scale production at Netflix and supports a flexible dynamic filter scripting mechanism, but its asynchronous performance is weak (the Netty-based asynchronous Zuul has long been delayed and has no official release yet). Zuul currently has more than 3.7k stars on GitHub. Kong [Appendix 12.7], an API gateway based on Nginx/OpenResty, is currently popular on GitHub, with more than 14.1k stars. Thanks to its Nginx core, Kong has strong asynchronous performance. Its Lua-based plug-in mechanism is flexible, and there is a rich set of community plug-ins covering everything from security to rate limiting and circuit breaking. There are also many open source management UIs that can centrally manage Kong clusters.

For the configuration center, Spring Cloud comes with Spring Cloud Config [Appendix 12.8] (GitHub 0.75k stars). Personally I do not consider it production grade, since it lacks many governance capabilities; it can be tried in small-scale scenarios. I recommend Ctrip's Apollo [Appendix 12.9] configuration center, which has been verified in production at Ctrip. It offers production-grade features such as high availability, configuration changes that take effect in real time (combining push and pull), configuration auditing and versioning, and support for multiple environments and clusters. It is recommended for larger-scale enterprises that need centralized configuration governance. Apollo currently has more than 3.4k stars on GitHub.
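The combination of push, pull, versioning and auditing can be illustrated with a small in-memory sketch. This is not the Apollo client API: `ConfigCenterSketch` and all of its method names are hypothetical, and a real configuration center would persist releases and notify clients over the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical in-memory config store; not the Apollo client API. */
class ConfigCenterSketch {
    private final Map<String, String> current = new HashMap<>();
    // Audit trail: one entry per published change, in release order.
    private final List<String> history = new ArrayList<>();
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();
    private long version = 0;

    /** Push side: listeners are called as soon as a change is published. */
    synchronized void subscribe(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    /** Publishes a new value, bumps the version and records who changed what. */
    synchronized long publish(String key, String value, String operator) {
        current.put(key, value);
        version++;
        history.add("v" + version + " " + operator + " set " + key + "=" + value);
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(key, value);
        }
        return version;
    }

    /** Pull side: a client polls with the last version it saw and refreshes only if stale. */
    synchronized boolean hasChangedSince(long clientVersion) {
        return version > clientVersion;
    }

    synchronized String get(String key, String defaultValue) {
        return current.getOrDefault(key, defaultValue);
    }

    synchronized List<String> auditLog() {
        return new ArrayList<>(history);
    }
}
```

The pull path acts as a safety net: a client that misses a push still converges on the latest version at its next poll.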

6. Service Monitoring Selection

Service monitoring mainly includes log monitoring, call chain monitoring, metrics monitoring, health checks and alert notification.

ELK can currently be considered the standard for log monitoring, with complete functionality out of the box. ElasticSearch [Appendix 12.10] currently has more than 28.4k stars on GitHub. Elastalert [Appendix 12.11] (GitHub 4k stars) is Yelp's open source alert notification module for ELK.

For call chain monitoring, the current mainstream options in the community are Dianping's CAT [Appendix 12.12] (GitHub 4.3k stars), Zipkin [Appendix 12.13] (GitHub 7.5k stars), originally open-sourced by Twitter, and Naver's open source Pinpoint [Appendix 12.14] (GitHub 5.3k stars). Personally, I recommend Dianping's open source CAT. It has been deployed at Dianping and many Chinese Internet companies, and its production-grade features and governance capabilities are relatively complete. In addition, CAT has its own alerting module. Below is my earlier evaluation table of the three products for reference.

Metrics monitoring mainly relies on a time series database (TSDB). The more mature product is the HBase-based OpenTSDB [Appendix 12.15], open-sourced by StumbleUpon. (The Cassandra-based KairosDB [Appendix 12.16], GitHub 1.1k stars, is also an option; it is basically a version of OpenTSDB modified for Cassandra.) OpenTSDB is distributed and scales horizontally, but it is relatively heavy and suits medium and large enterprises. OpenTSDB currently has nearly 2.9k stars on GitHub.

OpenTSDB itself does not provide an alerting module. Argus [Appendix 12.17] (GitHub 0.29k stars) is Salesforce's open source unified monitoring and alerting platform based on OpenTSDB. It supports rich alerting functions and flexible alert configuration, and can supplement OpenTSDB with alerting. In recent years some lightweight TSDBs have also appeared, such as InfluxDB [Appendix 12.18] (GitHub 12.4k stars) and Prometheus [Appendix 12.19] (GitHub 14.3k stars). These products have rich reporting capabilities and come with alerting modules, but their distributed capabilities are limited, so they suit small and medium-sized enterprises. Grafana [Appendix 12.20] (GitHub 19.9k stars) is the community standard for displaying metrics reports.

There are also general-purpose health check and alerting products in the community, such as Sensu [Appendix 12.21] (GitHub 2.7k stars). It lets you define flexible custom checks against various services (such as the health check endpoints exposed by Spring Boot, metrics in a time series database, or error logs in ELK), and users can then set flexible alert notification policies on the check results. Sensu has been deployed at companies such as Yelp. Similar products include Etsy's open source 411 [Appendix 12.22] (GitHub 0.74k stars) and Zalando's ZMon [Appendix 12.23] (GitHub 0.15k stars), which are used in production at Etsy and Zalando respectively. However, the threshold for configuring their custom checks and alerts is relatively high and their communities are not very active, so I recommend them only to teams with the ability to customize and develop in-house. ZMon's backend uses KairosDB for storage; if your enterprise has already adopted KairosDB as its time series database, ZMon can be considered as the alert notification module.


7. Service Fault Tolerance Selection

For the Java technology stack, Netflix's Hystrix [Appendix 12.24] (GitHub 12.4k stars) packages circuit breaking, isolation, rate limiting and degradation into a component. Any call to a dependency (database, service, cache) can be wrapped in a Hystrix Command, and once wrapped it becomes fault tolerant automatically. Hystrix originated from Netflix's resilience engineering project and has been verified in large-scale production at Netflix. It is currently the community standard for fault-tolerance components. Other language stacks also have simplified Hystrix-like components.
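As an illustration of what wrapping a dependency call buys you, here is a minimal circuit breaker sketch in plain Java. It is not the Hystrix API and covers only circuit breaking and fallback (degradation), not thread isolation; the class name `CircuitBreakerSketch`, the consecutive-failure threshold and the explicit clock parameter are all assumptions made for the example.

```java
import java.util.function.Supplier;

/** Hypothetical minimal circuit breaker; not the Hystrix API. */
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the call if allowed; when short-circuited or on failure, returns the fallback. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < openMillis) {
                return fallback.get();   // short-circuit: do not touch the dependency
            }
            state = State.HALF_OPEN;     // cool-down elapsed: let one trial call through
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = nowMillis;
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

For simplicity the sketch holds a lock while the wrapped call runs, which serializes callers; a production component would track state without blocking concurrent requests.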

Hystrix generally has to be embedded in the application or the framework, which creates a certain barrier to use. Companies that use centralized reverse proxies (at the edge and internally) for service routing can instead do circuit breaking and rate limiting centrally on the reverse proxy. Reverse proxies such as Nginx [Appendix 12.25] (GitHub 5.1k stars) or Kong [Appendix 12.7] (GitHub 11.4k stars) all have plug-ins that support flexible rate limiting and fault-tolerance configuration. A Zuul gateway can also integrate Hystrix to implement centralized rate limiting and fault tolerance at the gateway layer. A centralized reverse proxy requires some R&D and operations capability, but it manages rate limiting and fault tolerance in one place and simplifies the client.
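Rate limiting at a gateway is often explained with a token bucket, so a small sketch of that technique may help. It does not reproduce the configuration or internals of the Nginx, Kong or Zuul plug-ins; `TokenBucketSketch`, its parameters and the explicit clock argument are assumptions for illustration.

```java
/** Hypothetical token-bucket limiter; real gateways ship their own implementations. */
class TokenBucketSketch {
    private final double capacity;
    private final double permitsPerSecond;
    private double tokens;
    private long lastRefillMillis;

    TokenBucketSketch(int capacity, double permitsPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.permitsPerSecond = permitsPerSecond;
        this.tokens = capacity;          // start full so a short burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Returns true if the request may pass, false if it should be rejected. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis > lastRefillMillis) {
            // Add tokens in proportion to the elapsed time, capped at the bucket size.
            double refill = (nowMillis - lastRefillMillis) * permitsPerSecond / 1000.0;
            tokens = Math.min(capacity, tokens + refill);
            lastRefillMillis = nowMillis;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A gateway would typically keep one such bucket per route or per caller and answer rejected requests with an error status, which keeps the limiting logic out of every client.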

8. Background Service Selection

Background services mainly include the messaging system, distributed cache, distributed data access layer and task scheduling system. This is a relatively mature field, and many open source products can basically be used out of the box.

For the messaging system, in scenarios with low reliability requirements such as logs, the top-level Apache project Kafka [Appendix 12.26] (GitHub 7.2k stars) is the community standard. Kafka is actually also capable of handling business scenarios with high reliability requirements, but the enterprise needs to customize and improve Kafka's monitoring and governance capabilities for its specific scenarios. Allegro's open source hermes [Appendix 12.27] (GitHub 0.3k stars) is a reference project that wraps enterprise-grade governance capabilities for business scenarios on top of Kafka. Alibaba's open source RocketMQ [Appendix 12.28] (GitHub 3.5k stars) is also a good choice, with more features suited to business scenarios, and is now a top-level Apache project. RabbitMQ [Appendix 12.29] (GitHub 3.6k stars) is a classic, long-established MQ with rich queue features and documentation but somewhat weaker performance and distributed capabilities; it is an option for small and medium-scale scenarios.

For cache management, if you prefer clients to connect to the cache directly (personally I find direct connection simpler and lighter), SohuTv's open source cachecloud [Appendix 12.30] (GitHub 2.5k stars) is a good Redis cache management platform. It provides production-grade governance capabilities such as monitoring and statistics, one-click startup, automatic failover, online scaling and automated operations, and it is well documented. If you prefer a middle-tier proxy, Twitter's open source twemproxy [Appendix 12.31] (GitHub 7.5k stars) and CodisLab's open source codis [Appendix 12.32] (GitHub 6.9k stars) are the most popular options in the community.

For the distributed data access layer, if you use the Java technology stack, the open source shardingjdbc [Appendix 12.33] (GitHub 3.5k stars) is a good option. The database and table sharding logic lives in the client-side JDBC driver and the client connects to the database directly, which is relatively simple and lightweight; it is recommended for small and medium-scale scenarios. If you prefer a middle-tier proxy for database access, MyCAT [Appendix 12.34] (GitHub 3.6k stars), a community open source sharding middleware that evolved from Alibaba's Cobar, is a good choice. The proxy mode has higher operations costs, so it is recommended for medium and large-scale scenarios and for teams with some ability to develop frameworks in-house and operate them.
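The client-side approach boils down to mapping a sharding key to a physical database and table before the SQL is sent. The sketch below shows one simple modulo-based mapping; it is not the Sharding-JDBC API or its configuration format, and `ShardRouterSketch` and the `ds_N` / `_N` naming scheme are assumptions for illustration.

```java
/** Hypothetical router for client-side sharding; not the Sharding-JDBC API. */
class ShardRouterSketch {
    private final int databaseCount;
    private final int tablesPerDatabase;

    ShardRouterSketch(int databaseCount, int tablesPerDatabase) {
        if (databaseCount <= 0 || tablesPerDatabase <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.databaseCount = databaseCount;
        this.tablesPerDatabase = tablesPerDatabase;
    }

    /** Maps a sharding key (for example a user id) to a physical table name. */
    String route(String logicalTable, long shardingKey) {
        int slots = databaseCount * tablesPerDatabase;
        // floorMod keeps the slot non-negative even for negative keys.
        int slot = (int) Math.floorMod(shardingKey, (long) slots);
        int database = slot / tablesPerDatabase;
        int table = slot % tablesPerDatabase;
        return "ds_" + database + "." + logicalTable + "_" + table;
    }
}
```

With 2 databases of 4 tables each, `route("t_order", 13)` resolves to slot 5, that is `ds_1.t_order_1`. Plain modulo routing is easy to reason about but forces data migration whenever the shard count changes, which is one reason real middleware offers configurable strategies.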

For the task scheduling system, I personally recommend Xu Xueli's open source xxl-job [Appendix 12.35] (GitHub 3.4k stars), which is simple and lightweight to deploy and sufficient for most scenarios. Dangdang's open source elastic-job [Appendix 12.36] (GitHub 3.2k stars) is also a good choice; it is more powerful than xxl-job but also more complex.

9. Service Security Selection

For the authentication and authorization of microservices, although there are standard protocols in the industry such as OAuth and OpenID Connect, each company implements them differently. Enterprises generally have many special customization requirements, and the community has not yet produced a general-purpose, production-grade, out-of-the-box product. There are some open source authorization server products, such as Apereo CAS [Appendix 12.37] (GitHub 3.6k stars), JBoss's open source keycloak [Appendix 12.38] (GitHub 1.9k stars) and spring cloud security [Appendix 12.39]. Most of them are opinionated products (they embody one particular view and practice), they are complicated by supporting too many protocols, and they lack sufficient flexibility. My personal recommendation is to build a lightweight custom authorization server in-house, based on the OAuth and OpenID Connect standards and with reference to some open source products (such as Mitre's open source OpenID-Connect-Java-Spring-Server [Appendix 12.40], GitHub 0.62k stars). WSO2 has proposed a reference scheme for microservice security [Appendix 12.45], which is worth consulting. The key steps of the scheme are as follows:


  1. Use an authorization server that supports the OAuth 2.0 and OpenID Connect standard protocols (my personal recommendation is a custom, self-developed one);

  2. Use the API gateway as the single entry point to achieve unified security governance;

  3. Before accessing a microservice, the client logs in through the authorization server to obtain an access token, and then sends the access token together with the request to the gateway;

  4. The gateway extracts the access token, validates it with the authorization server, and exchanges it for a JWT token;

  5. The gateway forwards the JWT token along with the request to the backend microservice;

  6. User session information can be stored in the JWT, which is passed to the backend microservice, and between microservices, for authentication and authorization;

  7. Each microservice contains a JWT client that can parse the JWT and obtain the user session information inside it;

  8. In the overall scheme, the access token is a by-reference token: it carries no user information and can be exposed directly on the public network. The JWT token is a by-value token: it can contain user information but is not exposed on the public network.
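Steps 3, 4 and 7 above can be sketched with only the JDK. This is an illustration under simplifying assumptions, not a production OAuth 2.0 or OpenID Connect implementation and not code from the WSO2 whitepaper: the class `TokenExchangeSketch` is hypothetical, it combines the authorization server, gateway and microservice roles in one object, it signs with a shared HMAC secret (HS256), and it omits expiry claims and JSON escaping.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical token exchange sketch; not a production OAuth 2.0 / OpenID Connect server. */
class TokenExchangeSketch {
    private final byte[] signingKey;
    // Authorization server state: opaque (by-reference) access token -> user id.
    private final Map<String, String> sessions = new HashMap<>();

    TokenExchangeSketch(String secret) {
        this.signingKey = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Step 3: login returns an opaque access token that carries no user information. */
    synchronized String login(String userId) {
        String accessToken = UUID.randomUUID().toString();
        sessions.put(accessToken, userId);
        return accessToken;
    }

    /** Step 4: the gateway validates the access token and exchanges it for a signed JWT. */
    synchronized String exchangeForJwt(String accessToken) {
        String userId = sessions.get(accessToken);
        if (userId == null) {
            throw new IllegalArgumentException("unknown or revoked access token");
        }
        String header = encode("{\"alg\":\"HS256\",\"typ\":\"JWT\"}");
        String payload = encode("{\"sub\":\"" + userId + "\"}"); // assumes userId needs no escaping
        return header + "." + payload + "." + sign(header + "." + payload);
    }

    /** Step 7: a microservice verifies the signature and returns the JSON claims. */
    String verifyAndReadClaims(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new SecurityException("malformed JWT");
        }
        byte[] expected = sign(parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = parts[2].getBytes(StandardCharsets.US_ASCII);
        if (!MessageDigest.isEqual(expected, actual)) {
            throw new SecurityException("bad JWT signature");
        }
        return new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
    }

    private static String encode(String json) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    private String sign(String signingInput) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(signingKey, "HmacSHA256"));
            byte[] signature = mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the JWT here is signed rather than encrypted, its claims are readable by anyone who holds it, which is consistent with keeping it off the public network. With a shared secret every verifying service can also mint tokens, so an asymmetric signature may be a better fit when that matters.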

10. Service Deployment Platform Selection

Containers have been embraced by the community as an ideal means of delivering microservices, enabling immutable release patterns. A lightweight container-based service deployment platform mainly includes modules such as container resource scheduling, publishing system, image management, resource management and IAM.

Cluster resource scheduling system: It hides container details, abstracts the whole cluster into a pool of container resources, supports requesting and releasing container resources on demand, and fails over automatically when a physical machine fails. Google's open source Kubernetes [Appendix 12.41], with Google's endorsement and strong community promotion, has essentially become the market leader. It has 31.8k stars on GitHub, and its community activity far exceeds that of competitors such as mesos [Appendix 12.42] (GitHub 3.5k stars) and swarm, so K8s is the recommended first choice for container resource scheduling. Of course, if your team has sufficient ability to customize and develop in-house and wants deep control over the underlying scheduling algorithm, you can also build a custom solution on Mesos.

Image governance: Lightweight governance functions wrapped around Docker Registry. VMware's open source harbor [Appendix 12.43] (GitHub 3.5k stars) is currently a relatively mature enterprise-grade product in the community. On top of Docker Registry it adds governance capabilities such as access control, auditing, image synchronization and a management UI, and it is worth considering.

Resource governance: Similar to the idea of a CMDB. In a container cloud environment, enterprises still need lightweight governance of information such as applications (app), organizations (org), and container quotas and counts. There is currently no production-grade open source product in this area, so enterprises generally need to build their own according to their scenarios.

Publishing platform: A user-facing release management console that supports orchestration of the release process. It integrates and interacts with the other subsystems to provide basic application release capabilities, as well as advanced release mechanisms such as blue-green, canary and grayscale release. There are currently very few production-grade open source products. Netflix's open source spinnaker [Appendix 12.44] (GitHub 4.2k stars) is one, but it is relatively complex and heavy (it adapts not only to various CI systems but also to various public clouds and container clouds, which makes the whole system extremely complicated). Enterprises are generally advised to build a lightweight custom solution according to their own scenarios.

IAM: Short for identity & access management; it provides identity authentication and access control for each component of the publishing platform. There are many open source IAM products in the community; the better-known ones include Apereo CAS (GitHub 3.6k stars) and JBoss's open source keycloak (GitHub 1.9k stars). However, these products are generally complex and heavy, and many companies, given the need to integrate flexibly with various internal systems, consider building a lightweight custom solution.

Since there is currently no end-to-end production-grade solution for the service deployment platform, enterprises generally need to do custom integration. Here is a release system with lightweight governance capabilities for reference:


The simplified publishing process is as follows:

  1. After the application passes CI, an image is built and the user pushes it to the image management center;

  2. The user applies for a release in the asset management center, fills in the application, release and quota information, and waits for approval;

  3. Once the release is approved, the developer releases the application through the release console;

  4. The release system queries the asset management center for the release specification;

  5. The release system sends the container cloud an instruction to start container instances;

  6. The container cloud pulls the image from the image management center and starts the containers;

  7. After the service in a container starts, it registers itself with the service registry and sends regular heartbeats;

  8. Through the release system, the user calls the service registry to adjust traffic allocation, implementing mechanisms such as blue-green, canary or grayscale release;

  9. The gateway and the internal microservice clients periodically synchronize the service routing table from the service registry and distribute traffic to the new service instances according to the load balancing strategy.
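The traffic allocation in steps 8 and 9 can be pictured as a weighted choice between service versions. The sketch below is one simple way to do it and is not taken from any of the products named above: `CanaryRouterSketch`, the percentage weights and the use of a hashed request key are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical weighted router for blue-green, canary or grayscale release. */
class CanaryRouterSketch {
    // version label -> share of traffic in percent, kept in insertion order
    private final Map<String, Integer> weights = new LinkedHashMap<>();

    /** Replaces the traffic split; the weights must be non-negative and sum to 100. */
    synchronized void setWeights(Map<String, Integer> newWeights) {
        int total = 0;
        for (int weight : newWeights.values()) {
            if (weight < 0) {
                throw new IllegalArgumentException("weights must not be negative");
            }
            total += weight;
        }
        if (total != 100) {
            throw new IllegalArgumentException("weights must sum to 100");
        }
        weights.clear();
        weights.putAll(newWeights);
    }

    /** Hashes a stable request key so the same user keeps hitting the same version. */
    synchronized String route(String requestKey) {
        int bucket = Math.floorMod(requestKey.hashCode(), 100);
        int upperBound = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            upperBound += entry.getValue();
            if (bucket < upperBound) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("no weights configured");
    }
}
```

Setting the split to 100/0 and later flipping it to 0/100 gives a blue-green switch, while a 90/10 split sends roughly a tenth of the key space to the canary version.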

In addition, the continuous delivery pipeline (CD pipeline) is an important part of releasing microservices. It is closely tied to the R&D process and generally requires enterprise customization. Pipeline governance might require, for example, that only images that pass testing in the test environment can be promoted to the UAT environment, and only images that pass testing in the UAT environment can be promoted to production. By setting quality gates on the pipeline, applications can be delivered to production with high quality.
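The promotion rule described above is easy to express as a small gate. The following sketch is only an illustration: `PromotionGateSketch` and its methods are hypothetical, and a real pipeline would record test results in its CI/CD system rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical promotion gate for a CD pipeline with test, UAT and production stages. */
class PromotionGateSketch {
    enum Stage { TEST, UAT, PRODUCTION }

    // image tag -> highest stage whose tests the image has passed
    private final Map<String, Stage> passed = new HashMap<>();

    /** Any image may enter TEST; a later stage requires a pass in the stage before it. */
    boolean canDeploy(String image, Stage target) {
        if (target == Stage.TEST) {
            return true;
        }
        Stage highest = passed.get(image);
        return highest != null && highest.ordinal() >= target.ordinal() - 1;
    }

    /** Records a test pass; an image cannot pass a stage it was never allowed to enter. */
    void markPassed(String image, Stage stage) {
        if (!canDeploy(image, stage)) {
            throw new IllegalStateException(image + " was never allowed into " + stage);
        }
        // Keep the highest stage passed so far.
        passed.merge(image, stage, (a, b) -> a.ordinal() >= b.ordinal() ? a : b);
    }
}
```

The gate is keyed by image tag, which reflects the idea that the same immutable image moves through the environments rather than being rebuilt for each one.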


11. Closing Remarks

Note that this article is limited in space and does not cover testing and CI, but they are also important links in building a microservice architecture, and there are many mature open source products to choose from.

Although technology selection is important, it is only a small part of building microservices. For the selected products to really take root inside the enterprise and form a complete microservice technology stack, a great deal of integration, customization, governance, operations and promotion work still lies ahead.

This article reflects only my personal experience, and the selection ideas are for reference only. Every enterprise has a different context (business scenario, team organization, technical architecture, etc.), and every architect has a different background, so everyone has to select according to their actual situation. There is no best technology stack, only a relatively suitable one. Good technology selection also comes from comparing notes and even head-to-head comparison (PK), so you are welcome to join the discussion and share your own Microservices 2.0 technology stack choices.

12. Appendix Links

  1. Spring Boot  https://github.com/spring-projects/spring-boot

  2. Alibaba Dubbo  https://github.com/alibaba/dubbo

  3. Google gRPC  https://github.com/grpc/grpc

  4. NetflixOSS Eureka  https://github.com/Netflix/eureka

  5. Hashicorp Consul   https://github.com/hashicorp/consul

  6. NetflixOSS Zuul  https://github.com/Netflix/zuul

  7. Kong  https://github.com/Kong/kong

  8. Spring Cloud Config  https://github.com/spring-cloud/spring-cloud-config

  9. CTrip Apollo  https://github.com/ctripcorp/apollo

  10. ElasticSearch  https://github.com/elastic/elasticsearch

  11. Yelp Elastalert  https://github.com/Yelp/elastalert

  12. Dianping CAT   https://github.com/dianping/cat

  13. Zipkin   https://github.com/openzipkin/zipkin

  14. Naver Pinpoint  https://github.com/naver/pinpoint

  15. OpenTSDB  https://github.com/OpenTSDB/opentsdb

  16. KairosDB   https://github.com/kairosdb/kairosdb

  17. Argus  https://github.com/salesforce/Argus

  18. InfluxDB  https://github.com/influxdata/influxdb

  19. Prometheus  https://github.com/prometheus/prometheus

  20. Grafana  https://github.com/grafana/grafana

  21. Sensu   https://github.com/sensu/sensu

  22. Etsy 411  https://github.com/etsy/411

  23. Zalando ZMon   https://github.com/zalando/zmon

  24. NetflixOSS Hystrix  https://github.com/Netflix/Hystrix

  25. Nginx  https://github.com/nginx/nginx

  26. Apache Kafka   https://github.com/apache/kafka

  27. Allegro Hermes  https://github.com/allegro/hermes

  28. Apache Rocketmq  https://github.com/apache/rocketmq

  29. Rabbitmq  https://github.com/rabbitmq/rabbitmq-server

  30. Sohutv CacheCloud   https://github.com/sohutv/cachecloud

  31. Twitter twemproxy  https://github.com/twitter/twemproxy

  32. CodisLab codis   https://github.com/CodisLabs/codis

  33. Dangdang Sharding-jdbc  https://github.com/shardingjdbc/sharding-jdbc

  34. MyCAT  https://github.com/MyCATApache/Mycat-Server

  35. Xxl-job  https://github.com/xuxueli/xxl-job

  36. Dangdang elastic-job  https://github.com/elasticjob/elastic-job-lite

  37. Apereo CAS   https://github.com/apereo/cas

  38. JBoss keycloak  https://github.com/keycloak/keycloak

  39. Spring cloud security  https://github.com/spring-cloud/spring-cloud-security

  40. OpenID-Connect-Java-Spring-Server  https://github.com/mitreid-connect/OpenID-Connect-Java-Spring-Server

  41. Google Kubernetes  https://github.com/kubernetes/kubernetes

  42. Apache Mesos   https://github.com/apache/mesos

  43. VMware Harbor  https://github.com/vmware/harbor

  44. Netflix Spinnaker  https://github.com/spinnaker/spinnaker

  45. Microservices in Practice – Key Architecture Concepts of an MSA  https://wso2.com/whitepapers/microservices-in-practice-key-architectural-concepts-of-an-msa/
