Mastering Kubernetes: Best Practices for High Availability

Building a reliable and highly available distributed system is a major undertaking. This section introduces best practices that enable Kubernetes-based systems to run reliably and to cope with various types of failure.

4.2.1 Create a high availability cluster

To create a highly available Kubernetes cluster, the master components must be redundant. That means etcd is deployed as a cluster (typically spanning 3 or 5 nodes) and the Kubernetes API server is replicated. If necessary, auxiliary cluster-management services such as Heapster's storage can also be made redundant. Figure 4.1 shows a typical reliable and highly available Kubernetes cluster: several load-balanced master nodes, each containing the full set of master components as well as an etcd component.

This is not the only way to configure a highly available cluster. For example, you might prefer to deploy a standalone etcd cluster so you can tune its machines to the workload, or give the etcd cluster more redundancy than the rest of the master components.


Figure 4.1 Reliable and highly available Kubernetes cluster

4.2.2 Ensure node reliability

Nodes or individual components may fail, but many failures are temporary. Some basic guarantees can ensure that the Docker daemon and the kubelet restart automatically in the event of a failure.

If you are running CoreOS, a modern Debian-based OS (Ubuntu 16.04 or later), or any other operating system that uses systemd as its init mechanism, it is easy to enable Docker and the kubelet as services that start automatically. The code is shown below.

systemctl enable docker
systemctl enable kubelet
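In addition to enabling the services, a systemd drop-in can make sure the kubelet is restarted automatically after a crash. The following is only a minimal sketch; the drop-in path and values are assumptions to adapt to your distribution, and the Docker service can be handled the same way.

# Hypothetical drop-in file: /etc/systemd/system/kubelet.service.d/10-restart.conf
[Service]
Restart=always
RestartSec=5

# Reload systemd and restart the service so the drop-in takes effect:
systemctl daemon-reload
systemctl restart kubelet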

For other operating systems, the Kubernetes project uses Monit in its high-availability examples, but you can choose any process monitor that fits your needs.

4.2.3 Protecting the cluster status

The Kubernetes cluster state is stored in the etcd cluster, which is designed to be highly reliable and distributed across multiple nodes. Taking advantage of these capabilities is essential for a reliable and highly available Kubernetes cluster.

1. etcd cluster

An etcd cluster should have at least 3 nodes. If you need greater reliability and redundancy, you can use 5, 7, or any larger odd number of nodes. The number of nodes must be odd so that a clear majority remains in the event of a network split.
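For example, a 3-node cluster has a quorum of 2 and tolerates the loss of 1 node, while a 5-node cluster has a quorum of 3 and tolerates 2 losses. A 4-node cluster also needs 3 nodes for quorum and still tolerates only 1 loss, so the extra even node adds no fault tolerance (the quorum size is floor(n/2) + 1).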

In order to create a cluster, etcd nodes should be able to discover each other. There are several ways to do this.

2. Static discovery

With static discovery, you manage the IP address/hostname of each etcd node directly. This does not mean that you manage the etcd cluster outside the Kubernetes cluster, or that you are responsible for keeping it healthy: the etcd nodes run as Pods and are restarted automatically when needed.

Assume the etcd cluster contains the following 3 nodes.

etcd-1 10.0.0.1
etcd-2 10.0.0.2
etcd-3 10.0.0.3

Each node receives this initial cluster information as command-line arguments, as shown below.

--initial-cluster etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380
--initial-cluster-state new

Alternatively, the same information can be provided as environment variables, as shown below.

ETCD_INITIAL_CLUSTER="etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380"
ETCD_INITIAL_CLUSTER_STATE=new
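Putting the flags together, a minimal sketch of the full startup command on node etcd-1 with static discovery might look like the following. The binary path, ports, and data directory match the etcd.yaml manifest shown later in this section; adjust the names and addresses to your environment.

/usr/local/bin/etcd \
  --name etcd-1 \
  --initial-advertise-peer-urls http://10.0.0.1:2380 \
  --listen-peer-urls http://10.0.0.1:2380 \
  --advertise-client-urls http://10.0.0.1:4001 \
  --listen-client-urls http://127.0.0.1:4001 \
  --data-dir /var/etcd/data \
  --initial-cluster etcd-1=http://10.0.0.1:2380,etcd-2=http://10.0.0.2:2380,etcd-3=http://10.0.0.3:2380 \
  --initial-cluster-state new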

3. etcd discovery

With etcd discovery, you use an existing cluster to let the nodes of the new cluster discover each other. This of course requires the new cluster's nodes to be able to reach the existing cluster. If you are not worried about the dependency and the security implications, you can also use the public etcd discovery service at https://discovery.etcd.io.

First, create a discovery token. If necessary, you can specify the cluster size; the default is 3. The following command does this.

$ curl https://discovery.etcd.io/new?size=3
https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de

When using the discovery service, you need to pass the token as a command line parameter. The code is shown below.

--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de

You can also pass it as an environment variable, as shown below.

ETCD_DISCOVERY=https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de

It is worth noting that the discovery service only matters for the initial bootstrapping of the cluster. Once the cluster is up and running with its initial nodes, adding and removing nodes uses a separate protocol, so there is no permanent dependency on the public etcd discovery service.

4. DNS discovery

You can also set up discovery through DNS SRV records, with or without TLS. This is beyond the scope of this book; search for "etcd DNS discovery" for details.
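As a rough sketch of what such a setup might look like (the domain and record values below are placeholders for illustration, not from this book): etcd is pointed at a domain whose SRV records list the peers.

# SRV records published in the example.com zone (placeholder domain)
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 etcd-1.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 etcd-2.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 etcd-3.example.com.

# etcd flag that triggers DNS SRV discovery for that domain
--discovery-srv example.com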

5. etcd.yaml file

Depending on the discovery method chosen, the command used to start the etcd instance on each node differs slightly, as does the configuration in the etcd.yaml Pod manifest. The manifest should be copied to /etc/kubernetes/manifests on each etcd node.

The following code shows the different parts of the etcd.yaml manifest file.

apiVersion: v1
kind: Pod
metadata:
  name: etcd-server
spec:
  hostNetwork: true
  containers:
  - image: gcr.io/google_containers/etcd:2.0.9
    name: etcd-container

This initial part sets the Pod's name, specifies that it uses the host network, and defines a container named etcd-container. The most critical item is the Docker image used: in this case etcd:2.0.9, an etcd v2 release.

    command:
    - /usr/local/bin/etcd
    - --name
    - <name>
    - --initial-advertise-peer-urls
    - http://<node ip>:2380
    - --listen-peer-urls
    - http://<node ip>:2380
    - --advertise-client-urls
    - http://<node ip>:4001
    - --listen-client-urls
    - http://127.0.0.1:4001
    - --data-dir
    - /var/etcd/data
    - --discovery
    - <discovery token>

The command section lists the command-line arguments etcd needs to operate correctly. In this case, because the etcd discovery mechanism is used, the --discovery flag must be specified. <name>, <node ip>, and <discovery token> should be replaced with a unique name (the hostname is a good choice), the node's IP address, and the discovery token received earlier (the same token for all nodes).

    ports:
    - containerPort: 2380
      hostPort: 2380
      name: serverport
    - containerPort: 4001
      hostPort: 4001
      name: clientport

The ports section lists the server (2380) and client (4001) ports, which are mapped to the same ports on the host.

    volumeMounts:
    - mountPath: /var/etcd
      name: varetcd
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /usr/share/ssl
      name: usrsharessl
      readOnly: true
    - mountPath: /var/ssl
      name: varssl
      readOnly: true
    - mountPath: /usr/ssl
      name: usrssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
    - mountPath: /usr/local/openssl
      name: usrlocalopenssl
      readOnly: true
    - mountPath: /etc/openssl
      name: etcopenssl
      readOnly: true
    - mountPath: /etc/pki/tls
      name: etcpkitls
      readOnly: true
The volumeMounts section lists the varetcd mount at /var/etcd, where etcd writes its data, along with a number of read-only SSL and TLS mounts that etcd does not modify. The corresponding volumes are shown below.

  volumes:
  - hostPath:
      path: /var/etcd/data
    name: varetcd
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /usr/share/ssl
    name: usrsharessl
  - hostPath:
      path: /var/ssl
    name: varssl
  - hostPath:
      path: /usr/ssl
    name: usrssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
  - hostPath:
      path: /usr/local/openssl
    name: usrlocalopenssl
  - hostPath:
      path: /etc/openssl
    name: etcopenssl
  - hostPath:
      path: /etc/pki/tls
    name: etcpkitls

The volumes section provides a volume for each mount, mapped to the corresponding host path. This works, but you may want to map the varetcd volume to more robust network storage instead of relying solely on the redundancy of the etcd nodes.

6. Verify etcd cluster

Once the etcd cluster is up and running, you can use the etcdctl tool to check the cluster's status and health. Kubernetes lets you execute commands directly inside Pods or containers through the exec command (similar to docker exec).

The recommended commands are listed below; a sketch of running them via kubectl exec follows the list.

  • etcdctl member list
  • etcdctl cluster-health
  • etcdctl set test "yeah, it works!"
  • etcdctl get test (should return "yeah, it works!")
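For example, assuming the etcd static Pod defined above ends up named etcd-server-<node name> (the node name node-1 below is a placeholder for your cluster), the checks might be run like this.

kubectl exec etcd-server-node-1 -- etcdctl cluster-health
kubectl exec etcd-server-node-1 -- etcdctl member list
kubectl exec etcd-server-node-1 -- etcdctl set test "yeah, it works!"
kubectl exec etcd-server-node-1 -- etcdctl get test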

7. etcd v2 versus etcd v3

At the time of writing, Kubernetes 1.4 only supports etcd v2, but etcd v3 brings significant improvements and many welcome new features, including the following.

  • Switching from JSON over REST to protobuf-based gRPC, which doubled the performance of the internal client.
  • Leases instead of per-key TTLs, which improves performance.
  • Using gRPC to multiplex multiple watches over one connection, instead of keeping an open connection for each watch.

etcd v3 has been shown to run under Kubernetes, but it is not yet officially supported. That would be a big step forward, and the work is in progress; it may well be supported by the time you read this. If not, it is also possible to migrate an etcd v2 cluster to etcd v3.
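If you do need to migrate, etcd 3.x ships an offline migrate subcommand that copies the v2 keyspace into the v3 store. The following is only a sketch: run it with etcd stopped, using the data directory from the manifest above, and consult the etcd migration documentation for your version.

ETCDCTL_API=3 etcdctl migrate --data-dir /var/etcd/data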

4.2.4 Protect data

Protecting the cluster state and configuration is important, but protecting your own data is even more important. If the cluster state is corrupted, you can usually rebuild the cluster from scratch (although it will be unavailable during the rebuild). If your data is corrupted or lost, however, you are in deep trouble. The same rules apply and redundancy is key, but while the Kubernetes cluster state is highly dynamic, much of your data may be far less so. For example, a lot of historical data is often important and can be backed up and restored; live data might be lost, but the system as a whole can be restored from an earlier snapshot and suffer only temporary damage.
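Backups are the basic tool here. For the etcd v2 cluster state itself, for instance, a minimal sketch might look like the following; the backup destination is a placeholder, and your application data will need its own backup strategy.

etcdctl backup --data-dir /var/etcd/data --backup-dir /var/etcd/backup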

4.2.5 Run redundant API server

The API server is stateless and obtains all the data it needs from the etcd cluster. This means it is easy to run multiple API servers without any coordination between them. Once multiple API servers are running, you can put a load balancer in front of them to make the redundancy transparent to clients.
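As an illustration, a minimal HAProxy sketch that spreads client traffic across three API servers might look like the following. The addresses, port, and names are assumptions; any TCP or HTTPS load balancer will do.

frontend kube-apiserver
    bind *:443
    mode tcp
    default_backend apiservers

backend apiservers
    mode tcp
    balance roundrobin
    server master-1 10.0.0.1:443 check
    server master-2 10.0.0.2:443 check
    server master-3 10.0.0.3:443 check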

4.2.6 Run leadership elections with Kubernetes

Some master components, such as the scheduler and the controller manager, must not have multiple active instances at the same time: multiple schedulers trying to schedule the same Pod onto multiple nodes, or onto the same node several times, would cause chaos. The correct way to run these components in a highly scalable Kubernetes cluster is in leader-election mode. This means that although multiple instances are running, only one is active at a time; if it fails, another instance is elected leader and takes over.

Kubernetes supports this mode through the --leader-elect flag. The scheduler and controller manager can be deployed as Pods by copying their respective manifests to /etc/kubernetes/manifests.

The following code is a fragment from the scheduler manifest, showing the usage of the flag.

    command:
    - /bin/sh
    - -c
    - /usr/local/bin/kube-scheduler --master=127.0.0.1:8080 --v=2
      --leader-elect=true 1>>/var/log/kube-scheduler.log 2>&1

The following code is a fragment from the controller manager manifest, showing the usage of the flag.

  - command:
    - /bin/sh
    - -c
    - /usr/local/bin/kube-controller-manager --master=127.0.0.1:8080
      --cluster-name=e2e-test-bburns --cluster-cidr=10.245.0.0/16
      --allocate-node-cidrs=true --cloud-provider=gce
      --service-account-private-key-file=/srv/kubernetes/server.key
      --v=2 --leader-elect=true 1>>/var/log/kube-controller-manager.log 2>&1
    image: gcr.io/google_containers/kube-controller-manager:fda24638d51a48baa13c35337fcd4793

Note that these components cannot be restarted automatically by Kubernetes the way other Pods are, precisely because they are the components responsible for restarting failed Pods; they cannot restart themselves if they fail. There must be a ready-to-go replacement already running.

Leader election for applications

Leader election is also very useful for applications, but it is notoriously difficult to implement. Fortunately, Kubernetes offers a documented procedure to support leader election for your applications through Google's leader-elector container. The basic concept is to use Kubernetes Endpoints combined with ResourceVersion and Annotations. When this container is coupled to your application Pods as a sidecar, you get leader-election capabilities in a very streamlined way.

The following code runs leader-elector with 3 replicas and an election called election.

> kubectl run leader-elector --image=gcr.io/google_containers/leader-elector:0.4 --replicas=3 -- --election=election --http=0.0.0.0:4040

Shortly afterwards, 3 new Pods named leader-elector-xxx appear in the cluster, as shown below.

> kubectl get pods
NAME                             READY    STATUS    RESTARTS     AGE
echo-3580479493-n66n4            1/1      Running   12           22d
leader-elector-916043122-10wjj   1/1      Running   0            8m
leader-elector-916043122-6tmn4   1/1      Running   0            8m
leader-elector-916043122-vui6f   1/1      Running   0            8m

But which Pod is the leader? The following code queries the election endpoint to find out.

   > kubectl get endpoints election -o json
   {
       "kind": "Endpoints",
       "apiVersion": "v1",
       "metadata": {
           "name": "election",
           "namespace": "default",
           "selfLink": "/api/v1/namespaces/default/endpoints/election",
           "uid": "48ffc442-b451-11e6-9db1-c2777b74ca9d",
           "resourceVersion": "892261",
           "creationTimestamp": "2016-11-27T03:26:29Z",
           "annotations": {
               "control-plane.alpha.kubernetes.io/leader":
   "{\"holderIdentity\":\"leader-elector-916043122-10wjj\",\"leaseDura
   tionSeconds\":10,\"acquireTime\":\"2016-11-27T03:26:29Z\",\"renewTi
   me\":\"2016-11-27T03:38:02Z\",\"leaderTransitions\":0}"
           } 
       },
       "subsets": []
   }

If that is hard to read, look at metadata.annotations. To make detection easier, I recommend the jq program for slicing and dicing JSON. It is very useful for parsing the output of the Kubernetes API or kubectl, as shown below.

kubectl get endpoints election -o json | jq -r .metadata.annotations[] | jq .holderIdentity
"leader-elector-916043122-10wjj"

The following code shows how to delete the leader to prove the validity of the election.

kubectl delete pod leader-elector-916043122-10wjj
pod "leader-elector-916043122-10wjj" deleted

A new leader is then elected, as shown below.

kubectl get endpoints election -o json | jq -r .metadata.annotations[] | jq .holderIdentity
"leader-elector-916043122-6tmn4"

The leader can also be found over HTTP, because each leader-elector container exposes the leader's name through a local web server running on port 4040.

> kubectl proxy
> http http://localhost:8001/api/v1/proxy/namespaces/default/pods/leader-elector-916043122-vui6f:4040/ | jq .name
"leader-elector-916043122-6tmn4"

The local web server allows the leader-elector container to act as a sidecar alongside the main application container in the same Pod. Because the application container shares the local network with the leader-elector container, it can query http://localhost:4040 and get the name of the current leader. Only the application container that shares a Pod with the elected leader runs the application; the application containers in the other Pods stay dormant. If they receive requests, they forward them to the leader, or some load-balancing trick can automatically send all requests to the current leader.
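A minimal sketch of such a Pod might look like the following, pairing a hypothetical application container with the leader-elector sidecar (the application image and names are placeholders).

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: myapp:1.0                 # placeholder application image
    # the app can query http://localhost:4040 and compare the returned
    # name with its own Pod name to decide whether it is the leader
  - name: elector
    image: gcr.io/google_containers/leader-elector:0.4
    args:
    - --election=election
    - --http=0.0.0.0:4040
    ports:
    - containerPort: 4040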

4.2.7 Make the staging environment highly available

High availability is important. If you have gone to the trouble of setting up high availability, it means there is a business case for a highly available system. Therefore, you want to test your reliable and highly available cluster before deploying it to production (unless you are Netflix, which tests in production). Also, in theory, any change to the cluster may break high availability without disrupting other cluster functions.

The essential point is that you need to test reliability and high availability, and the best way to do that is to create a staging environment that replicates the production environment as closely as possible. This can be expensive; the following approaches help control the cost.

  • Ad hoc HA staging environment: Create a large HA cluster only for the duration of the HA tests.
  • Compressed time: Create meaningful event streams and scenarios in advance, provide input, and simulate scenarios quickly and continuously.
  • Combine HA testing with performance and stress testing: At the end of performance and stress testing, overload the system and see how reliability and high availability configurations handle the load.

4.2.8 Testing high availability

Testing high availability requires planning and a deep understanding of the system. The goal of each test is to reveal flaws in the system's design and/or implementation, and to provide good enough coverage that passing the tests gives confidence the system behaves as expected.

In the realm of reliability and high availability, this means finding ways to break the system and watching it put itself back together.

This requires the following methods.

  • A comprehensive list of possible failures (including reasonable combinations).
  • For every possible failure, it is necessary to know how the system should respond.
  • A method of inducing failure.
  • A way to observe the reaction of the system.

Each of the above points is important. Based on past experience, the best approach is to proceed incrementally and try to come up with a relatively small number of generic failure categories and generic responses, rather than an exhaustive, constantly changing list of low-level failures.

For example, a generic failure category is an unresponsive node; the generic response may be to reboot the node; and the way to induce the failure may be to stop the node's VM (if it is a VM). The expected outcome is that while the node is down the system still functions properly according to the standard acceptance tests, the node eventually comes back up, and the system returns to normal. You may also want to test many other things, such as whether the problem was logged, whether the relevant alerts reached the right people, and whether various statistics and reports were updated.
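As a sketch of such a test, assuming the nodes are GCE VMs (as in the controller manager example above); the node name and test script are placeholders.

#!/bin/bash
NODE=worker-1                            # placeholder node/VM name
gcloud compute instances stop "$NODE"    # induce the failure: stop the node's VM
./acceptance-tests.sh                    # the system should still pass standard acceptance tests
gcloud compute instances start "$NODE"   # recover the node
./acceptance-tests.sh                    # the system should return to normal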

Note that sometimes a failure cannot be resolved by a single response. For example, in the unresponsive-node case, if the cause is a hardware failure, rebooting will not help. In that case a second line of response comes into play: perhaps a new VM is provisioned, configured, and attached in place of the failed node. Here you cannot be too generic, and you may need to create tests for specific types of Pods/roles on the node (etcd, master, worker, database, monitoring).

If you have high quality requirements, be prepared to spend significantly more time and effort than on the production environment itself in order to set up a proper test environment and the tests that go with it.

The most important point is to be as non-intrusive as possible. Ideally, the production system should not have testing features that allow parts of it to be shut down or configured to run at reduced capacity for testing, because such features increase the attack surface of the system and could be triggered accidentally through configuration errors. Ideally, you can control the test environment without changing the code or configuration that will be deployed in production. With Kubernetes, it is usually easy to inject Pods and containers with custom test functionality that can interact with system components in the staging environment but is never deployed in production.

This section covered what it takes to build a reliable and highly available cluster (including etcd, the API server, the scheduler, and the controller manager), discussed best practices for protecting the cluster itself as well as your data, and paid special attention to the issues of staging environments and testing.

This article is excerpted from Section 4.2 of Chapter 4 of "Mastering Kubernetes".


  • A beginner's guide to Kubernetes, targeting Kubernetes 1.10, with rich practical cases
  • Help readers master the skills to design and deploy large clusters on various cloud platforms

This book combines theory and practice to comprehensively introduce Kubernetes, an ideal tool for container orchestration. Its 14 chapters cover understanding the Kubernetes architecture, creating Kubernetes clusters, monitoring, logging and troubleshooting, high availability and reliability, configuring Kubernetes security, limits and accounts, using critical Kubernetes resources, managing Kubernetes storage, running stateful applications with Kubernetes, rolling updates, scalability and quotas, advanced Kubernetes networking, running Kubernetes on cloud platforms and cluster federation, customizing the Kubernetes API and plugins, operating the Kubernetes package manager, and the future of Kubernetes. The book considers a wide range of environments and use cases so that readers understand how to create large-scale systems and deploy them on Kubernetes, and each chapter offers a wealth of practical case studies.

This book can serve as a practical reference manual for Kubernetes. It focuses on designing and managing Kubernetes clusters and describes in detail the capabilities and services Kubernetes provides for developers and operations engineers.
