Kubernetes Pod concept and network communication mode

Background: Google developed Kubernetes (K8s) in the Go language, building on its experience with the internal Borg system; K8s uses the HTTP protocol in a client/server (C/S) architecture.

Kubernetes features

  • Lightweight: low resource consumption
  • Open source
  • Elastic scaling
  • Load balancing: IPVS

Knowledge points to master

  • Basic concepts: what a Pod is, the controller types, and the K8s network communication model

  • Kubernetes installation: build a K8S cluster

  • Resource manifests: master the manifest syntax, be able to write Pod definitions, and master the Pod life cycle

  • Pod controller: master the characteristics of various controllers and how to use them

  • Service discovery: master how a Service (SVC) works and how to build one

  • Storage: master the characteristics of the various storage types, and be able to choose a suitable storage solution for different environments (and form your own judgment)

  • Scheduler: master how the scheduler works, and be able to make Pods run on the desired nodes according to requirements

  • Security: the principles and flow of cluster authentication, authorization, and admission control

  • HELM: much like the yum command on Linux, Helm deploys applications onto the cluster; master how Helm works, customize Helm templates, and deploy some commonly used add-ons with Helm

  • Operation and maintenance: patch kubeadm so that certificates are valid for 10 years, and be able to build a highly available cluster

Service Classification

  • Stateful service: DBMS
  • Stateless service: LVS, APACHE

The etcd project positions itself as a reliable distributed key-value store. It holds key data for the entire distributed cluster and helps keep the cluster running normally.
etcd v2 writes its data into memory;
etcd v3 introduced persistence to a local volume.

ETCD is an important storage component in the K8s cluster.

kubelet, kube-proxy, and the container
The kubelet interacts with Docker, driving it to create the corresponding containers, and maintains the Pod life cycle.
kube-proxy performs load-balancing operations and manipulates firewall rules to implement Pod mapping.

In a highly available cluster, the data should best be kept in >= 3 replicas.

Main components:

  • API SERVER: the unified entry point for all services
  • ControllerManager: maintains the expected number of replicas
  • Scheduler: accepts tasks and selects suitable nodes to assign them to
  • ETCD: key-value database storing all the important information of the K8S cluster (persistence)
  • kubelet: interacts directly with the container engine to manage the container life cycle
  • kube-proxy: writes rules into iptables or IPVS to implement service mapping and access

Other plug-ins

  • COREDNS: provides domain-name-to-IP resolution for the SVCs in the cluster
  • DASHBOARD: provides a B/S-structured access interface for the K8s cluster
  • INGRESS CONTROLLER: the built-in mechanism can only do Layer 4 proxying; Ingress adds Layer 7 proxying
  • FEDERATION: provides unified management of multiple K8S clusters across cluster centers
  • Prometheus: provides monitoring capabilities for the K8s cluster
  • ELK: provides a unified platform for analyzing K8s cluster logs

Pod concept

A Pod encapsulates one or more containers, and the containers in the same Pod share the Pause container's network stack (meaning those containers do not have their own independent IP addresses; they share the Pod's IP).

Pause is the root container of the Pod; all containers in the Pod share the Pod's network stack and mounted storage.
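A minimal sketch of this sharing (image names, ports, and commands are illustrative assumptions, not from the original post): the two containers below share the Pause container's network namespace, so the sidecar reaches nginx simply via localhost.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
  - name: web
    image: nginx:1.25          # listens on port 80 inside the shared network namespace
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox:1.36
    # Reaches nginx over localhost: both containers share the pause
    # container's network stack, so no Pod IP or Service is needed here.
    command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 10; done"]
```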

 

Pod controller types

Controller concept
Kubernetes has many built-in controllers, which act like state machines to control the specific state and behavior of Pods.

Classification by life cycle

  • Autonomous Pods: once such a Pod exits, it will not be recreated

  • Controller-managed Pods: throughout the controller's life cycle, the desired number of Pod replicas is always maintained

ReplicationController & ReplicaSet & Deployment

The ReplicationController (RC) ensures that the number of replicas of a containerized application always stays at the user-defined count: if a container exits abnormally, a new Pod is automatically created to replace it, and abnormal extra containers are automatically recycled.
In newer versions of Kubernetes, ReplicaSet is recommended as the replacement for ReplicationController.

There is no essential difference between ReplicaSet and ReplicationController apart from the name, except that ReplicaSet supports set-based selectors (Pods are labeled when created; when you later want to delete or modify a group of them by label, RS supports this scheme and RC does not). An illustrative manifest follows.
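A sketch of such a set-based selector (names, labels, and image are illustrative assumptions): the matchExpressions form below selects Pods whose tier label takes any value in a set, which an RC's equality-only selector cannot express.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend-rs
spec:
  replicas: 3
  selector:
    matchExpressions:          # set-based selector; not available on an RC
    - key: tier
      operator: In
      values: [frontend, canary]
  template:
    metadata:
      labels:
        tier: frontend         # satisfies the selector above
    spec:
      containers:
      - name: app
        image: nginx:1.25
```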

Although a ReplicaSet can be used independently, it is generally recommended to let a Deployment manage ReplicaSets automatically, so there is no need to worry about incompatibility with other mechanisms (for example, ReplicaSet does not support rolling-update, but Deployment does), and rolling update and rollback operations become possible.

Deployment manages Pods through an RS (ReplicaSet).

★Deployment

Deployment provides a declarative way to manage Pods and ReplicaSets, replacing the older ReplicationController and making application management easier. Typical application scenarios include:

  • Define a Deployment to create Pods and a ReplicaSet
  • Roll out upgrades and roll back applications
  • Scale out and scale in
  • Pause and resume a deployment
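A minimal Deployment sketch covering these scenarios (names, image, and strategy values are illustrative assumptions): applying it creates a ReplicaSet, which keeps the declared replica count; the strategy block tunes the rolling update discussed below.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deploy
spec:
  replicas: 3                  # desired state; the controller reconciles toward it
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # at most one extra Pod during an upgrade
      maxUnavailable: 1        # at most one Pod below the desired count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25      # change this tag and re-apply to trigger a rolling update
```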

A declarative example: in MySQL, creating a database does not require telling the engine the concrete steps:
mysql> create database xx;

Declarative (Deployment): apply (preferred) over create
Imperative (RS): create (preferred) over apply


 

Schematic of the rolling update and rollback operations (the old version's replica count is reduced as its Pods are deleted)

HPA (HorizontalPodAutoscaler)

Horizontal Pod Autoscaling applies only to Deployments and ReplicaSets. In the v1 version it only supports scaling based on the Pods' CPU utilization; in the v1alpha version it also supports scaling based on memory and user-defined metrics
(this achieves horizontal automatic scaling; see the sketch below)
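A minimal HPA sketch under the v1 API, targeting the Deployment sketched earlier (names and thresholds are illustrative assumptions; a metrics source such as metrics-server must be installed, as the summary later in this post notes):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:              # which controller to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web-deploy
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60   # scale out when average CPU exceeds 60%
```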

StatefulSet

Docker is mainly aimed at stateless services, meaning services with no storage that must be kept up to date in real time, such as Apache or an LVS scheduler.

Stateful services, such as MySQL and MongoDB, need to update and store data in real time.

StatefulSet solves the problem of stateful services (whereas Deployment and ReplicaSet are designed for stateless services). Its application scenarios include:

  • Stable persistent storage: a Pod can still access the same persistent data after being rescheduled, based on PVCs
  • Stable network identity: after a Pod is rescheduled, its PodName and HostName remain unchanged, based on a Headless Service
    (that is, a Service without a ClusterIP)
  • Ordered deployment and ordered scaling: Pods are ordered, and deployment or scale-up proceeds strictly in the defined order (from 0 to N-1; all previous Pods must be Running and Ready before the next Pod runs), based on init containers
    (why not use start/stop/readiness hooks? They would require changing the image of the containers inside the Pod, whereas an init container needs no such change: it is simply added to run before the Pod's containers, without altering the original Pod structure)
  • Ordered shrinking and ordered deletion (i.e. from N-1 down to 0); a manifest sketch follows this list
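A StatefulSet sketch with its Headless Service (all names, images, and sizes are illustrative assumptions): clusterIP: None makes the Service headless, giving each Pod a stable DNS name such as web-0.nginx-headless, and volumeClaimTemplates gives each Pod its own PVC so the same data survives rescheduling.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None              # headless: no virtual IP, stable per-Pod DNS instead
  selector:
    app: nginx
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx-headless  # ties the ordered Pod identities web-0, web-1, ... to the Service
  replicas: 3                  # created in order 0 -> N-1, deleted in order N-1 -> 0
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:        # one PVC per Pod -> stable storage across rescheduling
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```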

DaemonSet

A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. When a Node joins the cluster, a Pod is added on it; when a Node is removed from the cluster, its Pod is recycled. Deleting a DaemonSet deletes all the Pods it created.

Some typical usages of DaemonSet:

  • Run cluster storage daemon, for example, run glusterd and ceph on each Node.
  • Run a log collection daemon on each Node, such as fluentd, logstash.
  • Run monitoring daemons on each Node, such as Prometheus Node Exporter. (Zabbix monitors every node)
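A DaemonSet sketch for the monitoring case above (image tag and labels are illustrative assumptions): one copy of the Pod runs on every schedulable Node, with no replicas field needed.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.7.0   # one instance per Node
        ports:
        - containerPort: 9100
```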

Job, CronJob

crontab has no error-correction capability for scripts.
A Job has error correction built in: if the script it runs does not exit with status code 0, the program is executed again.

A Job is responsible for batch tasks, i.e. tasks that execute only once; it ensures that one or more Pods of the batch task end successfully.
The Job completes once the specified number of Pods have finished successfully.

A CronJob manages time-based Jobs (Jobs are created cyclically at specific times), namely:

  • Run only once at a given point in time
  • Run periodically at given points in time, e.g. database backups, sending email; a sketch follows this list
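A CronJob sketch for the backup case (schedule and command are illustrative assumptions; on clusters older than v1.21 the apiVersion would be batch/v1beta1): each tick creates a Job, whose Pod is retried until it exits with status 0.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"             # standard cron syntax: 02:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure  # a non-zero exit code triggers a retry
          containers:
          - name: backup
            image: busybox:1.36
            command: ["sh", "-c", "echo backing up && exit 0"]
```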

To sum up:

  • Stateless services: deploy with an RS or a Deployment
  • Workloads that must run on every node: deploy with a DaemonSet
  • Batch tasks: deploy with a Job or a CronJob
  • Stateful services: deploy with a StatefulSet
  • An HPA is like an accessory to a controller: for example, deploy a Deployment first, then create an HPA to manage it, achieving automatic scale-out and scale-in, e.g. when CPU exceeds 60%, scale up to 10 replicas
    (K8s does not support this by default; a resource-metrics collection component is required to supply the HPA with performance metrics)


 

Service discovery

A Service collects Pods through label selection; a minimal sketch follows.
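A minimal sketch of that label selection (names and ports are illustrative assumptions): every Pod carrying the selected label becomes an endpoint of the Service.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web          # collects all Pods labeled app=web as endpoints
  ports:
  - port: 80          # the Service's own port
    targetPort: 80    # the container port on the selected Pods
```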

Network communication mode

Network communication mode description

Network communication mode (1)
The Kubernetes network model assumes that all Pods sit in a flat network space where they can communicate directly (every Pod can be reached directly via its IP by every other Pod). Inside GCE (Google Compute Engine) this is a ready-made network model, and Kubernetes assumes that this network already exists.
When building a Kubernetes cluster in a private cloud, you cannot assume that this network already exists; we need to implement this network assumption ourselves: first open up mutual access between Docker containers on different nodes, and then run Kubernetes.

Network communication mode (2)

  • Between multiple containers in the same Pod: lo; containers in the same Pod communicate through the lo interface of the Pause container's network stack
  • Communication between Pods: Overlay Network
  • Communication between a Pod and a Service: the iptables rules on each node

Network solution Kubernetes + Flannel

Flannel is a network planning service designed by the CoreOS team for Kubernetes. Simply put, it gives the Docker containers created by different node hosts in the cluster virtual IP addresses that are unique across the whole cluster. It also establishes an overlay network between these addresses, through which data packets are delivered intact to the target container.

backend is the access pool; the Destination in the figure is 192.168.66.12.
flanneld creates the flannel0 interface to collect datagrams forwarded by docker0.
[Figure: Flannel packet flow from docker0 through flannel0 between hosts]

What ETCD provides to Flannel:

  • Stores and manages the IP address segments that Flannel can allocate
    (that is, after Flannel starts, it records the allocated subnet in ETCD together with the machine holding it, preventing an already-allocated subnet from being reused by Flannel and causing IP conflicts)
  • Watches the actual address of each Pod in ETCD, and builds and maintains a Pod-node routing table in memory
    (that is, according to the Pod-node routing table, the physical host corresponding to the Pod subnet 10.1.15.0/24 is 192.168.66.11)

Network communication method in different situations

Communication within the same Pod: the containers share the same network namespace and the same Linux protocol stack (the Pause container's), and communicate via the localhost loopback interface.

Pod1 to Pod2

Pod1 and Pod2 are not on the same host: a Pod's address is on the same network segment as docker0, but the docker0 segment and the host NIC belong to two completely different IP segments, so communication between different Nodes can only go through the host's physical NIC. Flannel associates a Pod's IP with the IP of the Node where it runs; through this association Pods can reach each other (the Flannel figure above actually depicts this cross-host access).

Pod1 and Pod2 are on the same machine: the docker0 bridge forwards the request directly to Pod2; it does not need to go through Flannel.

Pod to Service: currently, for performance reasons, this is maintained and forwarded entirely by iptables or LVS (IPVS).

Pod to the external network: the Pod sends the request outward and the routing table forwards the packet to the host's NIC; after the host NIC completes routing, iptables performs MASQUERADE, changing the source IP to the host NIC's IP, and then sends the request to the external server.

MASQUERADE (address masquerading) is a special case of SNAT that performs SNAT automatically.

Access to a Pod from the external network: via a Service; a NodePort sketch follows.
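A sketch of external access via a NodePort Service (names and port numbers are illustrative assumptions): the same port is opened on every Node's IP and forwarded to the selected Pods.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-external
spec:
  type: NodePort      # exposes the Service on each Node's IP
  selector:
    app: web
  ports:
  - port: 80          # ClusterIP port inside the cluster
    targetPort: 80    # container port on the Pods
    nodePort: 30080   # reachable from outside at <NodeIP>:30080
```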

Component communication mode description

There is only one real network: the node network.
Virtual networks: the Pod network and the Service network.


Origin: blog.csdn.net/qq_39578545/article/details/106031623