Once you see through these patterns, Kubernetes becomes much more approachable

Author | Lingshen

Responsible Editor | Three-day Volley


Demystifying the kubernetes architecture


1. kubernetes architecture design

To understand the architecture of Kubernetes, you must first understand its architecture diagram.

In fact, Kubernetes is a distributed system: it consists of one master and multiple nodes. The master does not process the actual workloads itself; it is mainly responsible for coordinating and handing work out to the nodes, which are the ones that really carry the business. We can borrow the analogy of the nginx master process and its worker processes, or of a Jenkins master and its slaves: in both, the master does not do the processing itself but distributes jobs to the workers.

The master receives all kinds of API requests. Whether you go through kubectl or call the HTTP API directly, the master accepts the request, stores the resulting state in etcd, and then hands out the tasks for its node brothers to carry out. That is its consistent "landlord style."

2. The sequence diagram of kubernetes creating pod

Let's take a look at the sequence diagram of kubernetes creating pod:

Every interaction in Kubernetes ultimately stores its data in etcd.

When the api server handles a request, it records the desired state in etcd, and its part of the process ends.

The scheduler watches the apiserver for unscheduled pods, picks a node for each one, and writes the binding back through the api (and thus into etcd); with that, its mission is complete.

The kubelet on the chosen node also talks only to the apiserver: it watches for pods bound to its node and reports their status back through the api, which again lands in etcd.

In the end, the kubelet instructs the docker runtime to actually start the pod's containers.
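To make the flow concrete, here is a minimal Pod manifest (all names are illustrative). Submitting it with `kubectl apply -f pod.yaml` sends it to the apiserver, which records it in etcd; the scheduler then binds it to a node, and that node's kubelet has docker start the container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-nginx          # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
```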

The core concept of kubernetes

Systematically combing through the core concepts of kubernetes allows us to understand them more clearly and deeply.

1. master

One node in the kubernetes cluster serves as the master. The master runs kube-controller-manager, kube-apiserver, kube-scheduler, cloud-controller-manager, and other components. They mainly dispatch tasks and do not do the work themselves, like the old landlord who keeps the core secrets in his own hands.

2. Node

In a kubernetes cluster, every machine other than the master is a node. The nodes are the workers at the bottom of the hierarchy, where the actual workloads run.

3. kube-apiserver

Kubernetes API Server: the unified entrance to the cluster and the coordinator of all its components. It exposes its interface as an HTTP API; every add, delete, modify, and watch operation on every object resource is handed to the APIServer for processing and then persisted to etcd.

All commands that operate kubernetes are issued from here. It is like the office of the central chief: only when the apiserver has spoken does the rest of kubernetes work and operate normally.

4. kube-scheduler

From the word scheduler we can already guess its general purpose. When you ask to create a pod, the apiserver receives the request and writes it straight into etcd; because the system is distributed, the apiserver's part of the job is then done.

But the pod has not actually been created yet. Deciding which node the pod lands on, weighing the load and resource usage of the different nodes, applying a load-balancing algorithm, and the rest of the follow-up work all obey the scheduler. It plays the role of unified command, arranging resources across the cluster.

5. kube-controller-manager

What is the controller-manager mainly for? The name contains "controller", but it also contains "manager", and the emphasis is really on the manager part.

When we use docker, we naturally want to scale containers up and down dynamically: for a given container (say an nginx container), we want a certain number of backup replicas and a unified management entry point for them. The controller-manager plays exactly this role.

But the controller-manager is not just a replica-count manager; it bundles many other controllers as well. The controllers included in the Controller Manager are listed below:

Replication Controller
Node Controller
Namespace Controller
Service Controller
EndPoints Controller
Service Account Controller
Persistent Volume Controller
Daemon Set Controller
Deployment Controller
Job Controller
Pod Autoscaler Controller

Their shared job is to control replicas of resources, but the work is split into many modules so that each controller manages one particular kind of resource.

6. etcd

Needless to say much here: etcd is a key-value NoSQL database used to store all of the state in kubernetes.

7. kubelet

What does the kubelet mainly do? To put it plainly, it is the agent that runs on every node: it watches the apiserver for pods that have been scheduled onto its node, asks the container runtime to start their containers, and reports pod and node status back to the apiserver. (Do not confuse it with kubectl, the command-line client, similar in role to redis's redis-cli or zookeeper's zkCli: kubectl simply hands your commands over to the "central chief", the apiserver, for processing.)

8. kube-proxy

kube-proxy exists only on the nodes, and it mainly maintains network rules and performs Layer 4 load balancing. In essence, kube-proxy is similar to a reverse proxy; we can regard the kube-proxy running on each node as a transparent proxy and load balancer for Services.

kube-proxy watches the Service and Endpoint information in the apiserver, configures iptables rules accordingly, and lets iptables forward requests directly to the Pods.

9. pod

Pod is the smallest deployment unit. A Pod consists of one or more containers. The containers in the Pod share storage and network and run on the same Docker host.

Why design the pod at all? Wouldn't using docker directly be fine? This is one of the wiser points in the design of kubernetes. Docker is currently the hottest container virtualization technology, but what if it falls out of favor next year, or is replaced by something else? Would kubernetes be brought to its knees? The pod is the ingenious answer: it lets kubernetes switch quickly between different container technologies, keeping the cost of such a change low.
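As a sketch of what "containers in a Pod share the network" means (names are illustrative): both containers below live in one network namespace, so the sidecar can reach nginx on localhost without any Service in between.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-sidecar        # illustrative name
spec:
  containers:
  - name: app
    image: nginx:1.21
  - name: sidecar
    image: busybox:1.36
    # Same network namespace as the app container, so
    # nginx is reachable on localhost:80.
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 5; done"]
```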

Principles of Kubernetes core components


1. RC[controller]

ReplicationController

It ensures that the number of replicas of a container application always stays at the user-defined count: if a container exits abnormally, a new pod is automatically created to replace it, and if extra pods appear, they are automatically recycled as well.

In the new version of kubernetes, it is recommended to use ReplicaSet to replace ReplicationController.

2. RS[controller]

ReplicaSet

There is no essential difference between ReplicaSet and ReplicationController apart from the name, except that ReplicaSet supports set-based selectors. Although a ReplicaSet can be used on its own, it is generally recommended to let a Deployment manage ReplicaSets automatically, so that you need not worry about incompatibilities with other mechanisms (for example, ReplicaSet does not support rolling-update, but Deployment does).

Rolling update: the new-version pod containers are created first, and the old-version pod containers are deleted afterwards.
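A minimal ReplicaSet sketch (names are illustrative), using the set-based selector syntax that ReplicationController does not support:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: demo-rs             # illustrative name
spec:
  replicas: 3
  selector:
    matchExpressions:       # set-based selector; RC only supports equality
    - {key: tier, operator: In, values: [frontend]}
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
```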

3. Deployment

Deployment provides a declarative way to define Pods and ReplicaSets, replacing the older ReplicationController and making applications easier to manage.

Deployment supports not only rolling updates but also rollback: if you find the service unavailable after upgrading to a new version, you can roll back to the previous one.
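A minimal Deployment sketch (names are illustrative). Changing the image tag triggers a rolling update, and `kubectl rollout undo deployment/demo-deploy` rolls back to the previous revision:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deploy         # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.21   # bump this tag to trigger a rolling update
```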

4. HPA

HPA is the abbreviation of Horizontal Pod Autoscaler. Horizontal pod autoscaling applies only to Deployments and ReplicaSets. The v1 API supports scaling based only on the Pods' CPU utilization; later alpha versions add scaling based on memory and user-defined metrics.
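A minimal HPA sketch using the autoscaling/v1 API, which scales on CPU utilization only (the target Deployment name is illustrative):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-deploy       # illustrative target
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80   # scale out when average CPU exceeds 80%
```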

5. StatefulSet

StatefulSet exists to solve the problem of stateful services (whereas Deployments and ReplicaSets are designed for stateless services). Its application scenarios include:

1) Stable persistent storage: a Pod can access the same persistent data after being rescheduled, based on PVC.

2) A stable network identity: the PodName and HostName stay unchanged after a Pod is rescheduled, based on a Headless Service (that is, a Service without a Cluster IP).

3) Ordered deployment and ordered scale-up: Pods are ordered, and deployment or scale-up proceeds in the defined order (from 0 to N-1; all preceding Pods must be Running and Ready before the next Pod starts), based on init containers.

4) Ordered scale-down and ordered deletion (i.e., from N-1 down to 0).
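A minimal sketch of a StatefulSet with its Headless Service and per-Pod storage (all names, the image, and the storage size are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-headless
spec:
  clusterIP: None            # headless: no Cluster IP, stable per-Pod DNS names
  selector:
    app: demo-db
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-db
spec:
  serviceName: demo-headless
  replicas: 3                # created in order: demo-db-0, demo-db-1, demo-db-2
  selector:
    matchLabels:
      app: demo-db
  template:
    metadata:
      labels:
        app: demo-db
    spec:
      containers:
      - name: db
        image: postgres:14
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:      # each Pod gets its own PVC, reattached on reschedule
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```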

6. DaemonSet

A DaemonSet ensures that all (or some) Nodes run a copy of a given Pod. Note that if a Node carries a taint and the Pod does not define a toleration for that taint, the scheduler will not place the Pod on that Node.

When a Node joins the cluster, a Pod is added on it; when a Node is removed from the cluster, that Pod is recycled. Deleting a DaemonSet deletes all the Pods it created. Some typical uses of a DaemonSet:

1) Running a cluster storage daemon on every Node, for example glusterd or ceph.

2) Run the log collection Daemon on each Node, such as fluentd, logstash.

3) Running a monitoring daemon on every Node, for example Prometheus Node Exporter.

Two related controllers are worth mentioning here. A Job is responsible for batch tasks, i.e., tasks that execute only once; it ensures that one or more Pods of the batch task complete successfully. A CronJob manages time-based jobs: run once at a given time, or run periodically on a given schedule.
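A minimal DaemonSet sketch for the log-collection case (names are illustrative; the fluentd image tag is an assumption):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo-fluentd         # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: demo-fluentd
  template:
    metadata:
      labels:
        name: demo-fluentd
    spec:
      tolerations:           # tolerate the master taint so logs are collected there too
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.16
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:            # read the node's own log directory
          path: /var/log
```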

7. Volume

A data volume shares data among the containers in a Pod.
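A minimal sketch of two containers sharing an emptyDir volume (names are illustrative): whatever the writer puts in the volume, the reader sees.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-shared-volume   # illustrative name
spec:
  volumes:
  - name: shared
    emptyDir: {}             # scratch space that lives as long as the Pod
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
    volumeMounts:
    - name: shared
      mountPath: /data
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /data/out.log"]
    volumeMounts:
    - name: shared
      mountPath: /data
```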

8. Label

Labels are used to distinguish objects (such as Pods and Services) and exist as key/value pairs; each object can carry multiple labels, and objects are associated with one another through labels.

Any API object in Kubernetes is identified by Label. The essence of Label is a series of Key/Value key-value pairs, where the key and value are specified by the user.

Label can be attached to various resource objects, such as Node, Pod, Service, RC, etc. A resource object can define any number of Labels, and the same Label can also be added to any number of resource objects.

Label is the basis for the operation of Replication Controller and Service. The two associate Pods running on Node through Label.

By attaching one or more Labels to a resource object, we get multi-dimensional grouping of resources, which makes resource allocation, scheduling, and configuration flexible and convenient to manage. Some commonly used labels are as follows:

1) Version label:

"release":"stable","release":"canary"......

2) Environmental label:

"environment":"dev","environment":"qa","environment":"production"

3) Architecture label:

"tier":"frontend","tier":"backend","tier":"middleware"

4) Partition label:

"partition":"customerA","partition":"customerB"

5) Quality control label:

"track":"daily","track":"weekly"

A Label is like the tags we are all familiar with: defining a Label on a resource object is like sticking a tag on it, and you can then query and filter resource objects by their Labels through a Label Selector. In this way Kubernetes implements a simple, general, SQL-like object query mechanism.

The important usage scenarios of Label Selectors in Kubernetes are:

1) The kube-controller-manager process uses the Label Selector defined on an RC to filter the Pod replicas it should monitor, achieving a fully automatic control loop in which the replica count always meets the expected setting.

2) The kube-proxy process selects the corresponding Pods through a Service's Label Selector and automatically builds the request-forwarding routing table from each Service to its Pods, realizing the Service's intelligent load balancing.

3) By defining specific labels on some Nodes and using the nodeSelector scheduling strategy in the Pod definition file, the kube-scheduler process can perform "directed scheduling" of Pods.
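A minimal sketch of directed scheduling via nodeSelector (the label key/value and names are illustrative); the target node would first be labeled with something like `kubectl label nodes <node-name> disktype=ssd`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pinned-pod     # illustrative name
spec:
  nodeSelector:
    disktype: ssd           # only schedules onto nodes labeled disktype=ssd
  containers:
  - name: app
    image: nginx:1.21
```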

9. service

A Service is an abstraction: it exposes a designated port on a virtual IP (VIP) and proxies client requests to one of the backend Pods (that is, the endpoints).

A Service defines a logical set of Pods and a policy for accessing that set; it is an abstraction over real services. A Service provides a unified access entry, a service proxy, and a discovery mechanism, and it associates multiple Pods that share the same Label, so users need not understand how the backend Pods run. To understand how external systems access a Service:

First, we need to understand the three kinds of IP in Kubernetes:

  1. Node IP: The IP address of the Node node

  2. Pod IP: Pod's IP address

  3. Cluster IP: Service IP address

First, the Node IP is the address of a node's physical network interface in the Kubernetes cluster. All servers on this network can communicate with it directly, which also means that anything outside the Kubernetes cluster must go through a Node IP to reach a node or a TCP/IP service inside the cluster.

Secondly, Pod IP is the IP address of each Pod, which is allocated by Docker Engine according to the IP address segment of the docker0 bridge, which is usually a virtual Layer 2 network.

Finally, Cluster IP is a virtual IP, but more like a fake IP network.

The reasons are as follows:

1) A Cluster IP acts only on the Kubernetes Service object and is managed and assigned by Kubernetes itself.

2) A Cluster IP cannot be pinged: there is no "physical network object" behind it to respond.

3) A Cluster IP becomes a concrete communication endpoint only when combined with a Service Port; a Cluster IP on its own has no basis for communication, and it belongs to the closed space of the Kubernetes cluster.

4) Within the Kubernetes cluster, communication among the Node IP network, the Pod IP network, and the Cluster IP network uses special routing rules programmed by Kubernetes.
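A minimal Service sketch tying the three IPs together (names and port numbers are illustrative): clients inside the cluster use the Cluster IP on port 80, while external clients reach the same Pods through any Node IP on the nodePort.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-svc            # illustrative name
spec:
  type: NodePort            # exposes the Service on every Node IP at a high port
  selector:
    app: demo               # matches Pods carrying this label
  ports:
  - port: 80                # port on the Cluster IP
    targetPort: 80          # port on the Pod
    nodePort: 30080         # reachable externally via <NodeIP>:30080
```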


Origin blog.csdn.net/FL63Zv9Zou86950w/article/details/113361513