NetEase Youdao | Redis Cloud-Native Practice

Abstract
This article uses Redis as an example to describe the Youdao infrastructure team's practice of containerizing its infrastructure. It focuses on declarative management, the working principle of the Operator, container orchestration, master-slave mode, cluster mode, high-availability strategy, and cluster scaling.

Directory

  • Background
  • Challenges faced
  • Declarative management
  • Operator working principle
  • Container orchestration
  • Master-slave mode
    • Master-slave topology diagram
    • Reconciliation principle
  • Cluster mode
    • Cluster topology diagram
    • Reconciliation principle
  • High-availability strategy
    • High availability guaranteed by Kubernetes
    • High availability of the Redis cluster
  • Monitoring and observability
  • Cluster scaling
  • Summary and outlook

Background

Redis is a caching service commonly used in business systems, typically in scenarios such as traffic peaks, data analysis, and leaderboard ranking. As middleware, it decouples systems from one another and improves system scalability.

Deploying middleware on traditional physical machines requires operations staff to build it manually: startup is slow, later maintenance is difficult, and the approach cannot keep up with rapid business growth.

Compared with traditional IT, cloud native enables smooth business migration, rapid development, and stable operations, while significantly reducing technology costs and saving hardware resources.

Cloud-native middleware means relying on technologies such as containerization, service mesh, microservices, and serverless to build scalable infrastructure and continuously deliver the foundational software that production systems depend on, improving its usability and stability while keeping functionality unchanged.

Following this trend, the Youdao infrastructure team has begun putting cloud-native middleware into practice. Besides the Redis work introduced in this article, this also covers Elasticsearch, ZooKeeper, and others.

Challenges faced

Cloud-native technology can solve the current problems of slow Redis deployment and low resource utilization, but a containerized Redis cluster also faces several challenges:

• How to deploy the stateful Redis service in Kubernetes;
• How to keep the service available after a container crashes;
• How to ensure that Redis's in-memory data is not lost after a container restarts;
• How to migrate slots without affecting the business when nodes are scaled out horizontally;
• How to handle the cluster state after a pod's IP changes.

Declarative management

For a Redis cluster, our expectation is that it can serve traffic around the clock and repair itself on failure. This matches the declarative nature of the Kubernetes API very well.

Being "declarative" means we only need to submit a defined API object to "declare" the desired state; the resource objects in Kubernetes then complete the transition from the current state to the desired state without external intervention. This transition is the Reconcile process. For example, if we create a Deployment from a yaml file, Kubernetes will "automatically" create Pods for it according to the yaml configuration, mount the specified storage volumes, and handle a series of other complex requirements.
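As a minimal sketch of this declarative style (the name and image below are illustrative, not from the original article), a Deployment declares "two replicas of this container should always be running", and Kubernetes reconciles toward that state:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app             # illustrative name
spec:
  replicas: 2                # desired state: two Pods
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: app
          image: nginx:1.25  # illustrative image
```

If one Pod crashes, the Deployment controller notices that the current state (one replica) no longer matches the declared state (two replicas) and creates a replacement, with no external intervention.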

So, can our Redis cluster use a similar mechanism to complete this process? That is, we need to define such an object and define the Reconcile process for the service. The Kubernetes Operator pattern meets exactly this need: you can simply think of an Operator as a resource definition plus a resource controller. After fully sorting out the relationship between the cluster and the Operator, we designed the overall architecture as follows:
[Figure: overall architecture diagram]
The Operator cluster itself is deployed as a Deployment and uses etcd for leader election. Upward, it communicates with Kubernetes components such as the API Server and Controller Manager; downward, it continuously reconciles the state of the Redis clusters.
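As a sketch of the "resource definition" half of the Operator (the group and schema fields here are assumptions for illustration; the team's actual CRD may differ), a CustomResourceDefinition registering the RedisCluster kind might look like this:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: redisclusters.redis.io    # assumed group, matching the example below
spec:
  group: redis.io
  scope: Namespaced
  names:
    kind: RedisCluster
    plural: redisclusters
    singular: rediscluster
  versions:
    - name: v1beta1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer   # number of Redis replicas
```

Once the CRD is installed, the API Server accepts RedisCluster objects, stores them in etcd, and the controller half of the Operator watches them and reconciles.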

In sentinel mode, the Redis service uses one set of sentinel clusters, deployed as a StatefulSet with its configuration files persisted. The Redis servers are also deployed as a StatefulSet; a sentinel-mode instance is one master with multiple slaves.
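A StatefulSet for the Redis servers might persist data roughly as follows (a sketch with assumed names, image, and storage size; the real manifests are generated by the Operator):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-server           # assumed name
spec:
  serviceName: redis-server    # headless Service providing stable DNS
  replicas: 3                  # one master, two slaves
  selector:
    matchLabels:
      app: redis-server
  template:
    metadata:
      labels:
        app: redis-server
    spec:
      containers:
        - name: redis
          image: redis:6.2     # illustrative version
          command: ["redis-server", "/data/redis.conf"]
          volumeMounts:
            - name: data
              mountPath: /data # RDB/AOF files survive container restarts
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi       # assumed size
```

Because each Pod keeps its own PersistentVolumeClaim, a restarted container remounts the same volume. Together with Redis's own RDB/AOF persistence, this is one way to address the data-loss challenge listed above.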

In cluster mode, each shard is deployed as a StatefulSet and the agent (proxy) is deployed as a Deployment. Kubernetes itself is responsible for the native Pods, StatefulSets, Services, scheduling policies, and so on.
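Because pod IPs change on rescheduling (one of the challenges listed earlier), StatefulSets are typically paired with a headless Service so each pod gets a stable DNS name. A sketch (names are assumptions consistent with the StatefulSet serviceName):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-server    # must match the StatefulSet's serviceName
spec:
  clusterIP: None       # headless: no virtual IP, one DNS record per pod
  selector:
    app: redis-server
  ports:
    - name: redis
      port: 6379
```

Each pod then resolves as redis-server-0.redis-server.&lt;namespace&gt;.svc.cluster.local and so on, giving the Operator and clients a stable address even when the underlying pod IP changes.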

The Redis resource definitions can be stored in etcd; we only need to submit the yaml configuration of the custom resource in advance. The following creates a master-slave Redis cluster with three replicas:


```yaml
apiVersion: redis.io/v1beta1
kind: RedisCluster
metadata:
  name: my-release
spec:
  size: 3
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 1000m
      memory: 1Gi
  config:
    maxclients: "10000"
```


Origin blog.csdn.net/youdaotech/article/details/122081481