Alibaba Cloud MSE helps CAMS meet the service governance challenges behind rapid business growth

CAMS New Energy Technology Co., Ltd. was established on May 16, 2019. The current joint venture shareholders are Volkswagen (China) Investment Co., Ltd., China FAW Co., Ltd., FAW-Volkswagen Co., Ltd. [the capital increase and share expansion will be completed after obtaining the appropriate regulatory (including antitrust) approvals], Wanbang Digital Energy Co., Ltd., and Anhui Jianghuai Automobile Group Holding Co., Ltd. The company is headquartered in Changzhou, Jiangsu. Combining the strengths of car makers and charging companies, CAMS covers everything from R&D and manufacturing of charging infrastructure to intelligently connected software, from private charging users to semi-public, public, and business users, and from the power supply at the source of the industry to the service platform and the end-user experience, seamlessly connecting the front and back ends of each business format.

CAMS serves a new generation of Chinese consumers. It pays attention to the charging experience of private electric car owners and offers users a convenient, worry-free, intelligent, and efficient charging experience backed by high-end service quality. At the same time, CAMS is committed to providing full-scenario charging services for electric travel. Relying on strong R&D capabilities, advanced core technologies, and high-quality services, it has won many domestic awards in the field of new energy vehicle charging: in 2021, CAMS won the "Best Operation Service Innovation Award in China's Charging Pile Industry"; in March 2023, CAMS won the "High-Quality Charging Five-Star Station Award", becoming one of the first charging operators to receive a five-star rating (five stars is the highest level and the highest station standard); in June of the same year, CAMS won the 2023 Top Ten Influential Operator Brand Award in China's charging and battery-swapping industry. CAMS will continue to speed up charging network construction and to optimize and innovate the charging user journey, and will focus on the research and development of high-power charging equipment and the exploration of new energy services, so as to promote the green development of deeply integrated new energy and new energy vehicles.

Business stability is a big challenge

In 2023, CAMS continues to be committed to user-centered integrated innovation to facilitate smart electric travel. As of the end of July this year, the CAMS charging network covered 192 domestic cities, with 1,274 charging stations and 11,113 charging terminals built and more than 2.41 million users accumulated. Moving from lagging construction to "moderately leading", the charging pile industry will see great development in the next three years, with a market scale of hundreds of billions. Many cities across the country are continually upgrading and increasing the installation and utilization of charging piles. As new energy vehicles develop, the demands of charging users are growing rapidly. With this rapid business growth, CAMS faces unprecedented challenges to the stability, performance, and availability of its architecture.

CAMS develops its applications in the traditional SpringBoot way, and applications call each other through HTTP requests. The simplicity of the SpringBoot architecture is exactly what allowed CAMS's business and its number of microservices to expand rapidly. However, as the scale of microservices grew, stability and efficiency problems gradually surfaced at various stages of the application lifecycle, such as release and operation. The CAMS architects also realized that microservice governance capabilities were needed to manage the existing microservices properly and further improve business stability. At the same time, the business still demands rapid iteration. Upgrading the original SpringBoot framework to Spring Cloud and introducing various advanced service governance capabilities would cost the CAMS R&D engineers too much.

Upgrade the architecture without changing the code

Is there a way to gain microservice governance capabilities without changing the code? For example: implementing full-link grayscale release to avoid stability risks caused by changes; using rate limiting and degradation to keep the running system stable and address the risks caused by unpredictable traffic; and using authentication to address the security risks of calls between microservices. It is like asking how to improve an aircraft's performance by replacing its engine while it is flying at high speed. More importantly, the passengers on the plane should not notice anything.

We can abstract the problem further: how can any Java application gain service governance capabilities without code changes? Along the way, we need to account for practical factors such as stability, problem diagnosis efficiency, architectural sustainability, and performance.

Technology exploration always serves the business. Around the needs of CAMS, we further discussed whether a ServiceMesh solution could solve the problem of non-intrusive service governance.

  1. The mainstream distributed Sidecar mode has been popular in recent years, but problems have gradually surfaced in practice. Sidecar memory consumption is relatively controllable, at most on the order of MB. CPU utilization is another matter: as business throughput grows, the Sidecar's CPU consumption basically reaches the same level as that of the business itself, which amounts to using twice the cluster capacity to carry the same business scale. The industry has become aware of this problem and has gradually evolved other solutions that achieve non-intrusive traffic routing through a centralized approach.
    1. In addition, introducing the Envoy Sidecar would add unnecessary operation and maintenance costs for CAMS and make problem diagnosis significantly harder. The technical complexity of ServiceMesh is also a very high threshold for business R&D engineers.
  2. Since the ServiceMesh solution has a relatively high threshold for users, can Higress be used to meet the governance requirements of calls between services, exposing only the gateway's operation interface? A hosted Higress offers a new approach to non-intrusive service governance. While meeting users' service governance needs, it compares favorably with the Sidecar in resource utilization, operation and maintenance complexity, performance, and latency.

How to implement traffic forwarding and management between services

Now that the approach is settled and everyone has evaluated its stability, security, and cost, it is time to put the solution into practice. The first problem: services originally call each other's K8s Service by domain name. How do we forward that traffic to Higress, and then have Higress forward it to the real target Pod? Throughout, we also need to consider the stability of the solution.

  • The most direct approach is to modify the Service and Endpoints configuration in K8s and use CoreDNS to forward traffic to Higress.
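apiVersion: v1
kind: Service
metadata:
  name: provider
spec:
  type: ClusterIP
  clusterIP: None   # headless Service with no selector; DNS resolves to the Endpoints below
---
apiVersion: v1
kind: Endpoints
metadata:
  name: provider    # must match the Service name
subsets:            # Endpoints has no spec; subsets is a top-level field
  - addresses:
      - ip: ${higress-slb}   # address of the Higress load balancer
    ports:
      - port: 80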
  • For production-grade stability, CoreDNS can be replaced by the comparable commercial product privatelinkZone DNS. CNAME DNS records can then be configured to switch, in one batch, the *.camsnet.com domain names that services use to reach each other over to the cloud-native gateway.

At this point, traffic from the Order service is forwarded to the internal Higress gateway. Next, we need to configure Higress routing rules to forward the traffic to the real target service.

  • In the MSE cloud-native gateway (the commercial version of Higress), we synchronize the services from the container service to the gateway and configure the corresponding routing rules to forward traffic.
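
The two hops described above (a wildcard CNAME that points every service domain at the gateway, then a gateway route that picks the real target service) can be sketched as follows. This is a minimal illustration, not the gateway's actual configuration: the gateway hostname, the upstream addresses, and both function names are assumptions made up for the example.

```python
from fnmatch import fnmatch

# Hypothetical values for illustration only; none of them come from the article.
DNS_CNAME_RECORDS = {"*.camsnet.com": "higress-gateway.internal"}
GATEWAY_ROUTES = {
    "provider.camsnet.com": "provider.default.svc.cluster.local:8080",
    "order.camsnet.com": "order.default.svc.cluster.local:8080",
}


def resolve_cname(hostname, records):
    """Hop 1: return the CNAME target of the first wildcard record that matches."""
    for pattern, target in records.items():
        if fnmatch(hostname, pattern):
            return target
    return None


def route_by_host(host_header, routes):
    """Hop 2: the gateway selects an upstream from the request's Host header."""
    hostname = host_header.split(":", 1)[0].lower()  # drop the port, ignore case
    return routes.get(hostname)


if __name__ == "__main__":
    print(resolve_cname("provider.camsnet.com", DNS_CNAME_RECORDS))
    print(route_by_host("provider.camsnet.com:80", GATEWAY_ROUTES))
```

Because the caller keeps using the same domain name, nothing changes in the application code; only DNS and the gateway route table decide where the request ends up.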

Once traffic is forwarded through the MSE cloud-native gateway, more governance capabilities become available:

  • We can configure label routing directly to implement grayscale release, and then combine it with link tracing to achieve full-link grayscale.
  • We can configure JWT authentication rules on the route to secure calls between services.
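
To make the JWT check concrete, here is a minimal sketch of what validating a token on a route involves: verify the signature, then reject expired tokens. It assumes an HS256 shared secret so that it runs on the standard library alone; the article does not say which algorithm or key distribution the gateway uses, and the function names are illustrative.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # JWTs strip base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Issue an HS256 token (what the calling service would attach)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_jwt(token: str, secret: bytes, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:  # wrong part count, bad base64, or bad JSON
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if exp is not None and now >= exp:
        return None
    return claims
```

With the check performed at the gateway, a request carrying a missing, tampered, or expired token is rejected before it reaches the target service, so the services themselves need no authentication code.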

How to achieve observability and full-link tracking

By connecting to Application Monitoring in the Application Real-Time Monitoring Service (ARMS), CAMS gains application monitoring and diagnosis capabilities without modifying a line of code. It can quickly see the three most critical indicators of an application: response time, throughput, and error rate. When an indicator is abnormal, the call chain capability makes it possible to quickly trace the request across the entire microservice link.
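
As a rough illustration of the three indicators named above, the sketch below derives them from a list of request records collected over a time window. The record shape and function name are assumptions for the example and say nothing about how ARMS computes or stores its metrics.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    duration_ms: float  # time taken to serve the request
    ok: bool            # False if the request ended in an error


def summarize(records, window_seconds):
    """Compute average response time, throughput, and error rate for one window."""
    total = len(records)
    if total == 0 or window_seconds <= 0:
        return {"avg_response_ms": 0.0, "throughput_rps": 0.0, "error_rate": 0.0}
    errors = sum(1 for r in records if not r.ok)
    return {
        "avg_response_ms": sum(r.duration_ms for r in records) / total,
        "throughput_rps": total / window_seconds,
        "error_rate": errors / total,
    }
```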

Link tracing also provides the technical foundation for the application's full-link grayscale capability.

How to achieve full-link traffic label transparent transmission

We use the Tracing Baggage mechanism to carry the corresponding traffic-coloring identifier along the entire link, because most tracing frameworks support the baggage concept, including OpenTelemetry, SkyWalking, and Jaeger. The ARMS tracing capability conforms to this standard as well. We implemented a Higress WASM plug-in that, in the Higress outbound filter, reads the value of a specified transparent transmission key such as x-mse-tag from the Baggage at the location defined by the tracing protocol, and inserts that value into the HTTP header so that Higress can route on it. In this way, custom tags can be transmitted transparently along the full link.
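
The header manipulation described above can be sketched in a few lines. This is only an illustration of the logic, written in Python rather than as a WASM plug-in, and it assumes the baggage travels in a W3C-style `baggage` header of comma-separated `key=value` members; the actual header location depends on the tracing protocol in use, and the function names are hypothetical.

```python
from urllib.parse import unquote


def parse_baggage(header_value):
    """Parse a W3C-style baggage header into a dict, ignoring member properties."""
    entries = {}
    for member in header_value.split(","):
        key_value = member.split(";", 1)[0]  # drop ";property" suffixes
        if "=" not in key_value:
            continue
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


def inject_tag_header(headers, tag_key="x-mse-tag"):
    """Return a copy of the headers with the tag copied from baggage into its own header."""
    outbound = dict(headers)
    tag = parse_baggage(headers.get("baggage", "")).get(tag_key)
    if tag:
        outbound[tag_key] = tag
    return outbound


if __name__ == "__main__":
    print(inject_tag_header({"baggage": "userId=42, x-mse-tag=gray"}))
```

Requests without the tag in their baggage pass through unchanged, so untagged traffic keeps following the default routes.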

With full-link transparent transmission of custom tags in place, we can build a complete full-link grayscale capability. What is full-link grayscale?

Under a microservice architecture, some requirements involve simultaneous changes to multiple microservices on one call link. Usually each microservice has a grayscale environment or group to receive grayscale traffic. We want traffic that enters the upstream grayscale environment to also enter the downstream grayscale environment, so that a request always stays within the grayscale environment. Even if some microservices on the call link have no grayscale environment, requests from those applications should still return to the grayscale environment when they call downstream services. When a release involves multiple microservices on the link, we can then perform a full-link grayscale release smoothly, without worrying about grayscale traffic leaking into other environments.

Once the x-mse-tag label is transmitted transparently along the full link, we can configure label routing rules based on x-mse-tag on the Higress routes, so that traffic carrying a specific label stays within the nodes running the corresponding application version. This closed loop is the "traffic swim lane" that delivers the full-link grayscale capability.
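
The swim-lane behavior, including the fallback for services that have no grayscale version, comes down to a small selection rule. The sketch below assumes each instance carries a version label and that the stable version is labeled "base"; both the data shape and the label value are assumptions for illustration, not Higress configuration.

```python
BASE_VERSION = "base"  # hypothetical label for the stable (non-grayscale) version


def pick_instances(instances, tag):
    """Prefer instances whose version matches the tag; otherwise fall back to base."""
    if tag:
        tagged = [inst for inst in instances if inst["version"] == tag]
        if tagged:
            return tagged
    return [inst for inst in instances if inst["version"] == BASE_VERSION]


if __name__ == "__main__":
    provider = [
        {"addr": "10.0.0.1", "version": "base"},
        {"addr": "10.0.0.2", "version": "gray"},
    ]
    stock = [{"addr": "10.0.1.1", "version": "base"}]  # no grayscale version deployed
    print(pick_instances(provider, "gray"))  # stays in the gray lane
    print(pick_instances(stock, "gray"))     # falls back to base
```

Because the x-mse-tag value keeps travelling with the request, a call that fell back to a base instance can still re-enter the grayscale lane at the next hop that has a matching version.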

How to implement traffic protection capabilities

How can we achieve traffic protection without modifying the code? Let's first introduce traffic protection through two common examples: flow control and circuit breaking with degradation.

  • flow control

Traffic is highly random and unpredictable. One second may be calm, and the next may bring a flood of traffic (such as midnight on Double Eleven). Every system and service has an upper limit on the load it can carry. If a sudden burst exceeds that capacity, requests may go unprocessed, queued requests are handled slowly, CPU and load soar, and the system finally breaks down. Therefore, we need to limit such bursts so that the service handles as many requests as possible without being overwhelmed. This is flow control.
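
One common way to implement this kind of limit is a token bucket, sketched below. It is shown only to illustrate the idea of capping bursts; it is not a description of how Sentinel implements flow control, and the class and parameter names are chosen for this example. The clock is injectable so the behavior can be checked deterministically.

```python
import time


class TokenBucket:
    """Allow at most `capacity` requests in a burst, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last_refill = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A rejected request can be answered immediately with an error or a default response, which keeps the queue short and protects CPU and load during a burst.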

  • circuit breaker downgrade

Modern microservice architectures are distributed and consist of many services. Different services call each other and form a complex call chain, in which the problems above are amplified. If one link in a complex chain is unstable, the instability may cascade layer by layer and eventually make the entire chain unavailable. Therefore, we need to apply circuit breaking and degradation to unstable, weakly depended-on services, temporarily cutting off unstable calls so that local instability does not trigger an overall avalanche.
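
The cut-off-and-recover cycle can be sketched as a small state machine: after a number of consecutive failures the breaker opens and serves a fallback, and after a recovery period it lets one trial call through. This is a simplified illustration with assumed names and thresholds, not Sentinel's actual algorithm, which supports richer strategies.

```python
import time


class CircuitBreaker:
    """Open after consecutive failures; retry one trial call after a recovery period."""

    def __init__(self, failure_threshold, recovery_seconds, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                return fallback()  # open: skip the unstable call entirely
            # Recovery period elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, the caller gets the degraded response right away instead of waiting on a dependency that is known to be failing, which is what stops the failure from cascading upstream.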

By connecting to the traffic protection capability of MSE service governance (Sentinel Enterprise Edition), CAMS gained traffic protection seamlessly. Compared with the community version, Sentinel Enterprise Edition has certain advantages in usability and functionality.

More exploration and practice

Without changing the code, we can quickly gain complete and systematic microservice governance capabilities. So far, CAMS has implemented full-link grayscale, full-link tracing and observability, and traffic protection based on Higress, allowing its current architecture to face the challenges of a rapidly growing business more calmly.

For Higress, in turn, putting the CAMS solution into practice has injected fresh ideas into the development of the Higress ecosystem. We are continuing to improve the ease of use and stability of Higress, hoping to bring greater value to more companies.

Origin blog.csdn.net/alisystemsoftware/article/details/132538546