[Cloud Native] How does Kubernetes run containers?

Preface

In the previous article, Kubernetes Components and Architecture, we introduced the Kubernetes control plane components, Nodes, and add-ons. We also learned that the Pod is the smallest schedulable unit in Kubernetes, and that containers run inside Pods. With this article, Kubernetes finally connects to the containers (such as Docker containers) we already knew. We will learn more about Pods:

  • What is a Pod
  • Pod basic operations
  • Pod labels
  • Pod lifecycle
  • Container features
  • Pod resource limits
  • Init containers in a Pod
  • Assigning Pods with node affinity

1. What is a Pod

Reference from the official docs: kubernetes.io/zh-cn/docs/…

1.1 Introduction

A Pod is the smallest deployable unit that can be created and managed in Kubernetes. "Pod" literally means a pea pod: just as a pod holds many peas, a Pod manages a set of (one or more) containers that share storage and network, plus a specification for how to run those containers. The contents of a Pod are always co-located and co-scheduled, and run in a shared context.

1.2 How does a Pod manage multiple containers?

The containers in a Pod are automatically co-located on the same physical or virtual machine in the cluster and scheduled together. The containers can share resources and workload, communicate with each other, and coordinate when and how to terminate themselves.

For example, you may have a Pod with two containers that share storage through a shared volume:

  • One container provides web access to the files in the shared volume, i.e. it reads them (the WebServer in the figure below);
  • The other container updates those files from a remote source (the FilePuller in the figure below pulls data into the volume);
  • In other words, one Pod can run multiple containers that work together

(Figure: a WebServer container and a FilePuller container sharing one volume inside a Pod)
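A minimal sketch of such a Pod, assuming an emptyDir volume; the filepuller below is just a busybox loop standing in for a real sync tool (its image and command are illustrative, not from the original article):

apiVersion: v1
kind: Pod
metadata:
  name: web-with-puller
spec:
  volumes:
    # Scratch volume shared by both containers; empty when the Pod starts
    - name: shared-data
      emptyDir: {}
  containers:
    # Container 1: serves the shared files over HTTP
    - name: webserver
      image: nginx:1.19
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    # Container 2: stand-in for the FilePuller, periodically refreshing the files
    - name: filepuller
      image: busybox:1.28
      command: ["sh", "-c", "while true; do echo updated at $(date) > /data/index.html; sleep 60; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /data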

1.3 How to use Pods?

The Pod is the smallest schedulable unit in Kubernetes. We usually do not create Pods directly but through controllers such as Deployment or Job; if you need to track Pod state, consider the StatefulSet resource. In what follows, however, we will create Pods directly to demonstrate the basics.
The reason is that a bare Pod only provides its containers with shared resources and networking and nothing more. That is, the Pod wraps containers so they can share resources and a network; everything else (self-healing, scaling) comes from controllers.

There are two main usages of Pods in a Kubernetes cluster:

  • A Pod that runs a single container. In this case the Pod can be thought of as a wrapper around that container, and Kubernetes manages the Pod rather than the container directly
  • A Pod that runs multiple containers that work together. A Pod can encapsulate several tightly coupled containers that need to share resources, packaging those containers and their storage into a single manageable entity; this is exactly the picture above

Notes:

  • Grouping multiple co-located, co-managed containers into one Pod is a relatively advanced use case, i.e. running multiple containers that cooperate closely
  • A Pod can be regarded as a single instance of an application. To run multiple instances, you run multiple Pods, which is usually done through a controller

2. Pod basic operations

2.1 View Pods

# View Pods in a given namespace
kubectl get po|pod|pods -n <namespace>
# View Pods in all namespaces
kubectl get po|pod|pods -A
# View Pods with additional detail
kubectl get po|pod|pods -o wide
# Watch Pods for changes
kubectl get po|pod|pods -w
# View Pods in the default namespace
kubectl get po|pod|pods
# View detailed information about a given Pod
kubectl describe pod <pod-name>

2.2 Create Pods

A Pod can be created directly from the command line: kubectl run nginx --image=nginx:1.19 (here nginx is the Pod name).
The drawback of this approach is obvious: once the command has been executed, nothing of it is kept. It is better to record the definition as a template, i.e. create the Pod from a configuration file like the one below:

# nginx-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx:1.19
      ports:
        - containerPort: 80

Then create it with either of the following two commands:

kubectl create -f nginx-pod.yml

kubectl apply -f nginx-pod.yml

Notice:

  • create only creates the object when it does not exist; if it already exists, an error is reported!
  • apply creates the object if it does not exist and updates its configuration if it does; apply is the recommended choice

2.3 Delete pods

# Delete a Pod by its name
kubectl delete pod <pod-name>
# Delete via the configuration file, which also contains the Pod's name
kubectl delete -f pod.yml

2.4 Enter the container in the Pod

# Enter the first container in the Pod by default ("--" is a fixed separator; bash is the command to run)
kubectl exec -it nginx -- bash            # nginx is the Pod name
# Enter a specific container in the Pod
kubectl exec -it nginx -c <container-name> -- bash

2.5 View Pod logs

# Follow the logs of the Pod's default (first) container
kubectl logs -f nginx                     # nginx is the Pod name
# Follow the logs of a specific container in the Pod
kubectl logs -f nginx -c <container-name>

2.6 View a Pod's description

kubectl describe pod nginx   # nginx is the Pod name

3. Pod runs multiple containers

3.1 Create Pods

apiVersion: v1  
kind: Pod  
metadata:  
  name: mypod  
  labels:  
    app: mypod  
spec:  
  containers:
    # Container 1: nginx:1.19
    - name: nginx  
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
      ports:  
        - containerPort: 80  
    # Container 2: redis:5.0.10
    - name: redis  
      image: redis:5.0.10  
      imagePullPolicy: IfNotPresent  
      ports:  
        - containerPort: 6379

Then apply it:

kubectl apply -f my-pod.yml

3.2 View a specific container's logs

kubectl logs -f mypod -c nginx   # nginx is the container name

kubectl logs -f mypod -c redis   # redis is the container name

3.3 Enter a specific container

kubectl exec -it mypod -c nginx -- sh
kubectl exec -it mypod -c redis -- sh

4. Pod Labels

Labels are key-value pairs attached to Kubernetes objects (such as Pods), used to specify identifying attributes that are meaningful and relevant to users.
The role of labels: they act as queryable tags on Kubernetes objects, which you can select and filter by.
For example: add a region label to the servers in Hangzhou and the servers in Guangzhou; we can then filter Pods by region and distribute Pods sensibly.
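Labels can also be declared directly in an object's metadata rather than added afterwards. A minimal sketch (the region and env keys are arbitrary examples, not required names):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    # Example keys only; choose names meaningful to your team
    region: hangzhou
    env: prod
spec:
  containers:
    - name: nginx
      image: nginx:1.19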

4.1 Syntax

Labels consist of key-value pairs, with the following constraints on values:

  1. Must be 63 characters or less (may be empty)
  2. Unless empty, must begin and end with an alphanumeric character
  3. May contain dashes (-), underscores (_), dots (.), and alphanumerics in between

4.2 Basic label operations

# View labels
kubectl get pods --show-labels

# Dynamically add a key-value label to a given Pod
kubectl label pod mypod env=prod

# Overwrite the value of an existing label on a Pod
kubectl label --overwrite pod mypod env=test

# Delete a label; the trailing `-` means remove
kubectl label pod mypod env-

# Filter by labels:
# 1. Pods whose label matches key=value
kubectl get pod -l env=test
# 2. Pods that have the key at all
kubectl get pod -l env
# 3. Pods that do not have the key
kubectl get pod -l '!env'
# 4. Pods whose value is one of the listed values
kubectl get pod -l 'env in (test,prod)'
# 5. Pods whose value is none of the listed values
kubectl get pod -l 'env notin (test,prod)'

5. Pod life cycle

Pods follow a defined lifecycle:

  • A Pod starts in the Pending phase, during which at least one of its containers has not yet been created
  • Once at least one primary container starts successfully, the Pod enters the Running phase
  • After that, the Pod ends up Failed if any container terminated in failure, and Succeeded otherwise

5.1 Life cycle

Earlier, in Kubernetes Components and Architecture, we mentioned that Pods run on Nodes. A Pod is created, given a unique ID (UID), scheduled to a node, and keeps running on that node until it is terminated or deleted.
If the node dies, the Pods scheduled to that node are scheduled for deletion after a timeout period.

A Pod therefore has no self-healing ability of its own: it is removed when its node fails, and it also cannot survive node resource exhaustion or node maintenance. Kubernetes instead relies on controllers to provide self-healing around Pods. Note that this does not mean rescheduling: a given Pod is never rescheduled to another node; rather, a new Pod with the same attributes but a different UID is created and scheduled onto another node.

Note: if an object is declared to share the Pod's lifecycle, it exists as long as the Pod exists; when the Pod is deleted for any reason, or replaced by an identical Pod, that object is deleted and recreated as well.

5.2 Pod phases

| Phase | Description |
| --- | --- |
| Pending | The Pod has been accepted by Kubernetes, but one or more containers have not been created yet. This phase covers the time spent waiting for the Pod to be scheduled and downloading images over the network |
| Running | The Pod has been bound to a node, all containers have been created, and at least one container is running |
| Succeeded | All containers in the Pod have terminated successfully |
| Failed | All containers in the Pod have terminated, and at least one container exited in failure |
| Unknown | The state of the Pod cannot be obtained for some reason, usually a failure to communicate with the Pod's host |

Notes:

  1. When a Pod is deleted, its status shows Terminating, but this is not one of the Pod phases. The Pod is given a grace period in which to terminate gracefully, 30 seconds by default; the containers can also be force-killed:
    kubectl delete pod <pod_name> --force --grace-period=0
  2. If a node dies or loses contact with the rest of the cluster, all Pods on that node become Failed

6. Container Features

6.1 Container life cycle

Kubernetes tracks the state of each container in a Pod and can trigger events at specific points in a container's lifecycle through lifecycle callbacks; in effect, Kubernetes adds several hook functions to the container lifecycle.

Once the scheduler assigns a node to the Pod, the kubelet creates the Pod's containers through the container runtime. A container has three states:

  • Waiting
    A container that is in neither the Running nor the Terminated state is Waiting. In this state the container is still performing the operations it needs in order to start, such as pulling the image from a registry or applying Secret data.

  • Running
    The container is executing and no problems have occurred. If a postStart callback is configured, it has already run to completion before the container is reported Running (for example, confirming that a MySQL service is up before the Java application service starts).

  • Terminated
    The container ran to completion, either successfully or in failure. If a preStop callback is configured, it executes before termination, i.e. before the container enters the Terminated state.

6.2 Container lifecycle callbacks/events/hooks

Callbacks make a container aware of events in its lifecycle; when a lifecycle callback fires, the handler code registered for it runs. Two callbacks are exposed to containers:

  • PostStart:
    Executed immediately after the container is created, but with no guarantee that it runs before the container's entrypoint, and no parameters are passed to the handler. In other words, the relative order of the entrypoint and this callback is undefined.
    So it cannot reliably check that, say, a MySQL service is up before the Java application starts; that can be achieved with an init container instead, which we cover later.

  • PreStop:
    Called before the container is terminated due to an API request or a management event. The Pod's termination grace period keeps counting down regardless of whether PreStop has finished.

  • Example of using the container lifecycle callbacks:

apiVersion: v1  
kind: Pod  
metadata:  
  name: mypod  
  labels:  
    app: mypod  
spec:  
  containers:  
    - name: nginx  
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
      lifecycle:  
        postStart:  
          exec:  
            # Write to /start.txt
            command: ["/bin/sh", "-c", "echo postStart >> /start.txt"]  
        preStop:  
          exec:  
            # Write to /stop.txt, then sleep 5 seconds
            command: ["/bin/sh", "-c", "echo preStop >> /stop.txt && sleep 5"]  
      ports:  
        - containerPort: 80
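After applying the manifest above, one way to verify that the postStart hook ran is to read back the file it wrote (the file path is the one defined in the manifest; the file name mypod.yml is assumed here):

kubectl apply -f mypod.yml
# The postStart hook should have written /start.txt inside the container
kubectl exec -it mypod -c nginx -- cat /start.txt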

6.3 Container restart policy

The Pod spec contains a restartPolicy field that sets the container restart policy. Its possible values are Always (always restart), OnFailure (restart only when a container exits abnormally), and Never (never restart).

It lives in the Pod spec rather than on each container because, as mentioned earlier, the smallest unit Kubernetes operates on is the Pod, and all containers in a Pod share the same restart rule.

Note: the restart policy does not restart a failed container immediately. When a container in a Pod exits, the kubelet computes a restart delay using exponential back-off (10s, 20s, 40s, ...), capped at five minutes. Once a container has run for 10 minutes without problems, the kubelet resets that container's back-off timer.

apiVersion: v1  
kind: Pod  
metadata:  
  name: myapp  
  labels:  
    app: myapp  
spec:  
  containers:  
    - name: nginx  
      image: nginx:1.19
      imagePullPolicy: IfNotPresent  
  restartPolicy: Always # OnFailure Never

6.4 Customizing the container startup command

Like Docker, Kubernetes lets you use command and args to override a container's default startup command and pass arguments to it. Typically command replaces the startup command and args supplies the arguments.

apiVersion: v1  
kind: Pod  
metadata:  
  name: redis  
  labels:  
    app: redis  
spec:  
  containers:  
    - name: redis  
      image: redis:5.0.10  
      imagePullPolicy: IfNotPresent  
      command: ["redis-server"]  
      # Enable AOF persistence
      args: ["--appendonly", "yes"]
  restartPolicy: Always
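Once the Pod is running, a quick way to confirm that AOF is actually enabled (redis-cli ships inside the redis image):

kubectl exec -it redis -- redis-cli config get appendonly
# Expected output:
# 1) "appendonly"
# 2) "yes"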

6.5 Container probes

A probe is a periodic diagnostic that the kubelet performs on a container. To run a diagnostic, the kubelet either executes code inside the container or issues a network request. Probes can be understood as regular health checks on containers.

  • Probe types
    For a running container, the kubelet can optionally run the following three probes and react to their results:

    • livenessProbe indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container's fate is then determined by its restart policy.
    • readinessProbe indicates whether the container is ready to serve requests. If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of the Services matching the Pod.
    • startupProbe (v1.16+) indicates whether the application inside the container has started. While a startup probe is configured, all other probes are disabled until it succeeds; if the startup probe fails, the kubelet kills the container
  • Probe mechanisms
    A probe can check a container through one of the following four mechanisms:

    • exec
      runs a command inside the container; the diagnostic succeeds if the command exits with status 0

    • grpc
      performs a remote procedure call using gRPC; the target must implement the gRPC health checking protocol, and the diagnostic succeeds if the reported status is "SERVING"

    • httpGet
      performs an HTTP GET request against the container's IP address on the specified port and path; the diagnostic succeeds if the response status code is at least 200 and below 400

    • tcpSocket
      performs a TCP check against the specified port on the container's IP address; the diagnostic succeeds if the port is open

  • Probe results
    Each probe yields one of three results:

    • Success: the container passed the diagnostic
    • Failure: the container failed the diagnostic
    • Unknown: the diagnostic itself failed, so no action is taken
  • Probe parameters

# Seconds after container start before probing begins
initialDelaySeconds: 5
# Interval between probes
periodSeconds: 4
# Probe timeout; the default is 1 second
timeoutSeconds: 1
# Consecutive failures needed to count as failed; default 3, after which a failed liveness probe restarts the container
failureThreshold: 3
# Consecutive successes needed to count as successful; default 1
successThreshold: 1
  • Using probes

    • exec

      apiVersion: v1  
      kind: Pod  
      metadata:  
        name: nginx  
        labels:  
          app: nginx  
      spec:  
        containers:  
          - name: nginx  
            image: nginx:1.19  
            ports:  
              - containerPort: 80  
            args:  
              - /bin/sh  
              - -c  
              - sleep 7 && nginx -g "daemon off;"  
            imagePullPolicy: IfNotPresent  
            livenessProbe:  
              exec:  
                # Check whether Nginx started successfully (its pid file exists)
                command:  
                  - ls  
                  - /var/run/nginx.pid  
              initialDelaySeconds: 5  
              periodSeconds: 4  
              timeoutSeconds: 1  
              failureThreshold: 3  
              successThreshold: 1  
        restartPolicy: Always
      

Explanation:

With sleep 7, the first probe (at 5s) fails but the second (at 9s) succeeds, and a restart only happens after 3 consecutive failures. With sleep 30 there would be more than three probe failures, so Kubernetes kills the container and restarts it, and this cycle repeats.

  • tcpSocket
apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx  
  labels:  
    app: nginx  
spec:  
  containers:  
    - name: nginx  
      image: nginx:1.19  
      ports:  
        - containerPort: 80  
      args:  
        - /bin/sh  
        - -c  
        - sleep 7 && nginx -g "daemon off;"  
      imagePullPolicy: IfNotPresent  
      livenessProbe:  
        # Check whether Nginx's port 80 accepts connections
        tcpSocket:  
          port: 80  
        initialDelaySeconds: 5  
        periodSeconds: 4  
        timeoutSeconds: 1  
        failureThreshold: 3  
        successThreshold: 1  
  restartPolicy: Always
  • httpGet
apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx  
  labels:  
    app: nginx  
spec:  
  containers:  
    - name: nginx  
      image: nginx:1.19  
      ports:  
        - containerPort: 80  
      args:  
        - /bin/sh  
        - -c  
        - sleep 7 && nginx -g "daemon off;"  
      imagePullPolicy: IfNotPresent  
      livenessProbe:  
        # Request the home page /index.html and expect an HTTP 200 response
        httpGet:  
          port: 80  
          path: /index.html  
        initialDelaySeconds: 5  
        periodSeconds: 4  
        timeoutSeconds: 1  
        failureThreshold: 3  
        successThreshold: 1  
  restartPolicy: Always
  • gRPC

Official reference: Configure Liveness, Readiness and Startup Probes | Kubernetes
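As a minimal sketch, a gRPC liveness probe only needs a port; this assumes the application serves the standard gRPC health checking protocol on that port (50051 here is an arbitrary example, not from the original article):

livenessProbe:
  grpc:
    # The app must expose the gRPC health checking service on this port
    port: 50051
  initialDelaySeconds: 5
  periodSeconds: 4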

6.6 Resource constraints

If a container has no memory limit and no CPU limit, it can use the node's resources without bound; a container with runaway usage will then affect the other containers on the node.

Kubernetes resource limits for containers cover two main resources:

  • Memory: split into a memory request and a memory limit. The container is guaranteed the requested amount of memory but cannot use more than its limit
  • CPU: likewise split into a request and a limit. A container cannot use more CPU than its limit, and when the system has free CPU time, the container is guaranteed its requested amount

request (memory / cpu): the baseline resources the container is guaranteed
limit (memory / cpu): the maximum resources the container may use; exceeding the memory limit gets the container OOM-killed, while exceeding the CPU limit gets it throttled

6.6.1 metrics-server

Official website address: github.com/kubernetes-…

The Kubernetes Metrics Server is a scalable, efficient source of container resource metrics. Metrics Server monitors the load of each Node and Pod by collecting resource metrics from the kubelets and exposing them through the Metrics API in the Kubernetes apiserver, so that we can query them with kubectl top.

  • To check whether Metrics Server is running, you can run the following command:
# List all registered API services
kubectl get apiservices
  • If the resource metrics API is available, the output will contain a reference to metrics.k8s.io:
NAME
v1beta1.metrics.k8s.io

Install metrics-server using the manifest from GitHub:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

View a Pod's resource usage:

kubectl top pod <pod-name>
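kubectl top can also report node-level usage; a couple of typical invocations:

# CPU and memory usage of every node
kubectl top node
# Usage of the Pods in a specific namespace
kubectl top pod -n kube-system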

6.6.2 Specifying memory requests and limits

To specify a memory request and limit for a container, add a resources block with requests and limits to the container in the configuration file. A quick note on memory units: the base unit is the byte, and you can use the suffixes E, P, T, G, M, k or their power-of-two counterparts Ei, Pi, Ti, Gi, Mi, Ki.

apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx-memory  
  labels:  
    app: nginx-memory  
spec:  
  containers:  
    - name: nginx-memory  
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
      resources:  
        requests:  
          memory: 100M  
        limits:  
          memory: 200M  
  restartPolicy: Always

View the Pod's definition, including the container's memory requests and limits:

kubectl get pod <pod-name> --output=yaml

The purpose of memory requests and limits:
Configuring memory requests and limits for the containers in a cluster makes effective use of the memory available on the nodes, and keeping Pod memory requests low leaves more room for scheduling. The benefits of a limit greater than the request are:

  • The Pod can burst to make better use of available memory
  • The Pod's memory use during a burst is still capped at a reasonable amount

If no memory limit is specified, one of the following applies:

  • The container's memory use is unbounded and it can consume all available memory on its node, possibly triggering the OOM Killer
  • If the namespace the container runs in has a default memory limit, the container is automatically assigned that default

If only a memory limit is specified and no memory request, Kubernetes automatically sets the request equal to the limit; the same rule applies to CPU limits.
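The namespace default mentioned above comes from a LimitRange object. A minimal sketch, with arbitrary example values:

apiVersion: v1
kind: LimitRange
metadata:
  # Arbitrary name for this example
  name: mem-limit-range
  namespace: default
spec:
  limits:
    - type: Container
      # Applied to containers that do not declare their own values
      default:            # default memory limit
        memory: 512Mi
      defaultRequest:     # default memory request
        memory: 256Mi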

6.6.3 Specifying CPU requests and limits

CPU requests and limits work like memory requests and limits, with one key difference: a container that exceeds its CPU limit is not killed but throttled and keeps running. Usage is analogous to memory:

apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx-cpu
  labels:
    app: nginx-cpu
spec:
  containers:
    - name: nginx-cpu
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
      resources:  
        requests:  
          cpu: 100m
        limits:  
          cpu: 200m  
  restartPolicy: Always

A note on CPU units: fractional values are allowed; a container requesting 0.5 CPU is guaranteed half as much CPU as a container requesting 1 CPU. You can use the m suffix for milli: 100m CPU is the same as 0.1 CPU.

7. Init containers in a Pod

A Pod may manage many containers. Unlike application containers such as Nginx or Redis, init containers run before any application container starts, so an init container can include utilities and setup scripts that are not present in the application image. For example, for a Java application, an init container can verify that the MySQL service is up and run some SQL scripts beforehand.

7.1 Init container features

  • They always run to completion: if a Pod's init container fails, the kubelet keeps restarting it until it succeeds. However, if the Pod's restartPolicy is Never, a failed init container marks the whole Pod as failed
  • A Pod can have multiple init containers, and each must complete before the next one starts
  • Init containers do not support lifecycle, livenessProbe, readinessProbe, or startupProbe
  • Otherwise, init containers support all the fields and features of application containers, including resource limits, volumes, and security settings

7.2 Using init containers

Init containers are declared in the initContainers array of the Pod spec, a sibling of the containers array that describes the application containers:

apiVersion: v1  
kind: Pod  
metadata:  
  name: init  
  labels:  
    app: init  
spec:  
  containers:  
    - name: init  
      image: busybox:1.28  
      command: ["sh", "-c", "echo The app is running! && sleep 3600"]  
      imagePullPolicy: IfNotPresent  
  initContainers:  
    - name: init-service  
      image: busybox:1.28  
      command: ["sh", "-c", "echo init-service is running! && sleep 5"]  
  
    - name: init-db  
      image: busybox:1.28  
      command: [ "sh", "-c", "echo init-db is running! && sleep 5" ]  
  restartPolicy: Always

Watching the startup, each init container runs to completion before the application container starts:
(Figure: Pod startup details showing the init containers completing in order)
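One way to watch this from another terminal: the Pod's STATUS column moves through the init stages before Running (output roughly as below):

kubectl get pod init -w
# NAME   READY   STATUS     RESTARTS   AGE
# init   0/1     Init:0/2   0          2s
# init   0/1     Init:1/2   0          7s
# init   1/1     Running    0          13s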

8. Assigning Pods with node affinity

If a web service runs in one Pod and its log service in another, it is often best to place the two Pods on the same node to reduce cross-node communication. In other words, in certain scenarios we want to control which nodes particular Pods run on:

We can constrain a Pod so that it can only run on specific nodes, or so that it prefers specific nodes. Kubernetes offers four ways to influence where a Pod is scheduled:

  • nodeSelector, which matches node labels (the recommended simple approach)
  • Affinity and anti-affinity
  • nodeName
  • Pod topology spread constraints

8.1 Add labels to nodes

# Pick a node and add a label to it (k8s-node1 is the node name)
kubectl label nodes k8s-node1 disk=ssd

# View node labels
kubectl get nodes --show-labels

8.2 Assigning a Pod to matching nodes by node label [nodeSelector]

apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx  
  labels:  
    app: nginx  
spec:  
  containers:  
    - name: nginx  
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
  restartPolicy: Always  
  # Select nodes labeled disk=ssd
  # You could similarly label fast-CPU nodes cpu=fast and slower ones cpu=slow,
  # and then request cpu=fast when placing a Pod
  nodeSelector:  
    disk: ssd

8.3 Assigning a Pod to a specific node by name [nodeName]

apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx  
  labels:  
    app: nginx  
spec:  
  containers:  
    - name: nginx  
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
  restartPolicy: Always 
  # Assign the Pod to a node by node name
  # This approach is not recommended:
  # if no node with this name exists, the Pod stays Pending until one is created
  nodeName: k8s-node1

8.4 Assigning Pods with affinity and anti-affinity

Explanation:
nodeSelector is the simplest way to constrain Pods to nodes with given labels; affinity and anti-affinity expand the kinds of constraints you can express. The concrete benefits:

  • Affinity and anti-affinity are more expressive. nodeSelector and nodeName can only select on existing labels and names, while affinity and anti-affinity give finer control over the selection logic
  • Rules can be marked as soft requirements or preferences, so a Pod can still be scheduled even when no matching label or node name exists
  • Constraints can be written against the labels of other Pods already running on a node, making it possible to keep related Pods in the same network domain together

Affinity comes in two kinds:

  • Node affinity: similar in function to nodeSelector, but it supports expressions and soft rules, so it can express stronger logic
  • Pod affinity: evaluated against the Pods already on a node. If the Pods on node1 do not match the Pod being scheduled, it may be scheduled to node2 instead

Node affinity is conceptually similar to nodeSelector: it constrains which nodes a Pod can be scheduled to based on node labels. There are two variants:

  • requiredDuringSchedulingIgnoredDuringExecution: a hard rule, like nodeSelector, that must be satisfied before the Pod can be scheduled, but with a more expressive syntax
  • preferredDuringSchedulingIgnoredDuringExecution: a soft rule; the scheduler tries to find a matching node, but if none is found the Pod does not stay Pending and is scheduled anyway
    Note: in both names, IgnoredDuringExecution means that if a node's labels change after the Pod has been scheduled, the Pod keeps running there
apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx  
  labels:  
    app: nginx  
spec:  
  # Affinity settings
  affinity:
    # Node affinity
    nodeAffinity:
      # Labels that must be matched
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:  
          - matchExpressions:  
              - key: disk  
                operator: In  
                values:  
                  - ssd  
  containers:  
    - name: nginx  
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
  restartPolicy: Always

Note: the operators available in matchExpressions are In, NotIn, Exists, DoesNotExist, Gt, and Lt; NotIn and DoesNotExist express anti-affinity. Here you can already see that affinity and anti-affinity are more expressive than nodeSelector.
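A sketch of some of the other operators inside a matchExpressions list (the label keys are arbitrary examples):

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        # Anti-affinity flavour: avoid nodes labeled disk=hdd
        - key: disk
          operator: NotIn
          values:
            - hdd
        # Require the node to carry a gpu label, whatever its value
        - key: gpu
          operator: Exists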

8.5 Node affinity weight

With preferredDuringSchedulingIgnoredDuringExecution, each preference carries a weight between 1 and 100; the higher the weight, the higher the priority of nodes matching that preference.

apiVersion: v1  
kind: Pod  
metadata:  
  name: nginx  
  labels:  
    app: nginx  
spec:  
  # Affinity settings
  affinity:
    # Node affinity
    nodeAffinity:  
      preferredDuringSchedulingIgnoredDuringExecution:  
        - preference:  
            matchExpressions:  
              - key: cpu  
                operator: In  
                values:  
                  - fast  
          weight: 80  
        - preference:  
            matchExpressions:  
              - key: disk  
                operator: In  
                values:  
                  - ssd  
          weight: 50  
  containers:  
    - name: nginx  
      image: nginx:1.19  
      imagePullPolicy: IfNotPresent  
  restartPolicy: Always

Explanation: nodes labeled cpu=fast are preferred first, and if none exists, nodes labeled disk=ssd come next, because the former's weight (80) is higher than the latter's (50).

8.6 Affinity, anti-affinity and weight between Pods

Similar to node affinity, inter-Pod affinity and anti-affinity come in two types:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution
    For example, requiredDuringSchedulingIgnoredDuringExecution affinity can tell the scheduler to place the Pods of two services in the same cloud-provider availability zone because they communicate frequently, while preferredDuringSchedulingIgnoredDuringExecution anti-affinity can spread the Pods of one service across different availability zones.

To use affinity between Pods, use the spec.affinity.podAffinity field in the Pod specification:

apiVersion: v1  
kind: Pod  
metadata:  
  name: redis  
  labels:  
    app: redis  
spec:  
  containers:  
    - name: redis  
      image: redis:5.0.10  
      imagePullPolicy: IfNotPresent  
  restartPolicy: Always  
  affinity:  
    podAffinity:  
      requiredDuringSchedulingIgnoredDuringExecution:  
        - topologyKey: BeiJing  
          labelSelector:  
            matchExpressions:  
              - key: app  
                operator: In  
                values:  
                  - redis

Explanation: this introduces the topologyKey constraint: when scheduling this Pod, the target node must carry a label whose key is BeiJing (the key itself is just this article's example; in practice a key such as topology.kubernetes.io/zone is typical). By giving all Beijing nodes one key and all Hangzhou nodes another (HangZhou), we can combine this with affinity to schedule Pods onto nodes in the desired region, next to the matching Pods.
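Conversely, a minimal sketch of Pod anti-affinity that spreads redis Pods across zones, assuming nodes carry the standard topology.kubernetes.io/zone label:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # Do not schedule onto a zone that already runs a Pod labeled app=redis
      - topologyKey: topology.kubernetes.io/zone
        labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - redis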
