Kubernetes high-availability cluster deployment practices

Foreword

Kubernetes (k8s), by virtue of its excellent architecture design, flexible scaling capabilities, and rich application orchestration model, has become the de facto standard in the field of container orchestration. More and more companies are embracing this trend, choosing k8s as the infrastructure for their containerized applications and gradually migrating their core services onto k8s.

Availability is critical for infrastructure. The major cloud computing vendors have all launched highly available, scalable managed k8s services, representative examples being Amazon EKS, Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), and Alibaba Cloud Container Service for Kubernetes.

While managed k8s services on the public clouds are flourishing, many companies still need to build clusters themselves. This demand has driven the creation of a number of excellent k8s cluster deployment solutions, whose characteristics are listed below.

Deployment solution               Features
Kubeadm                           1. The official deployment tool; it encapsulates the domain knowledge of k8s cluster lifecycle management.
                                  2. Intended to be a higher-level, composable building block.
Kubespray                         1. Supports deploying k8s on bare metal as well as on AWS, GCE, Azure, and other cloud platforms.
                                  2. Defines the k8s cluster deployment tasks as Ansible playbooks.
                                  3. Supports the most popular Linux distributions.
Kops                              1. Only supports deploying k8s on AWS, GCE, and a few other cloud platforms.
                                  2. Based on a state-synchronization model; supports dry-run and automatic idempotency.
                                  3. Can automatically generate Terraform configuration.
Rancher Kubernetes Engine (RKE)   1. A lightweight k8s installation tool from Rancher, the well-known open-source enterprise container management platform.
                                  2. Supports deploying k8s clusters on bare metal, virtual machines, and managed public clouds.

Among the solutions above, RKE offers a good balance of ease of use and flexibility. This article shows how to deploy a highly available k8s cluster with RKE; the RKE version used here is v0.2.2.

High-availability k8s cluster architecture

First we need to understand the characteristics of a highly available k8s cluster architecture. The figure below shows the high-availability cluster architecture recommended by the official documentation.

[Figure: k8s_arch - the officially recommended high-availability k8s cluster architecture]

Its core idea is to make every type of component on the k8s master nodes highly available, eliminating single points of failure.

  • kube-apiserver - Exposes the k8s API and is the entry point for accessing the entire cluster. Since the apiserver itself is stateless, multiple instances can be started and placed behind a load balancer to achieve high availability.
  • etcd - The data hub of the entire cluster, storing the configuration and state of the k8s cluster's objects. A reliable data storage layer can be built by starting an odd number of etcd instances for redundancy.
  • kube-scheduler - Selects a node for newly created pods to run on. Only one kube-scheduler instance can be active in a cluster at a time; high availability is achieved by starting several instances and letting them use leader election (a quick way to see the current leader is shown after this list).
  • kube-controller-manager - The management and control center of the cluster. Only one kube-controller-manager instance can be active in a cluster at a time; high availability is likewise achieved by starting several instances and using leader election.
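
For example, the current kube-scheduler leader can be read from the leader-election record; in k8s 1.13 this record is kept as an annotation on an Endpoints object in kube-system (the resource and annotation names below follow that convention and may change in newer versions):

# Show which kube-scheduler instance currently holds the leader lease.
kubectl -n kube-system get endpoints kube-scheduler \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'

# The same check works for kube-controller-manager.
kubectl -n kube-system get endpoints kube-controller-manager \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'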

In addition, pay attention to the following issues when building the cluster.

  • Reliability of the k8s processes on each node. kubelet, kube-scheduler, kube-controller-manager, and the other processes must be able to restart automatically after a failure.
  • Reserve resources on the worker nodes for non-pod processes, to prevent them from competing with pods for resources and starving the node.

Building a highly available k8s cluster with RKE

Node Planning

The first step in building the cluster is to divide the servers by node function. The following table shows the node planning in the author's environment.

IP              Role
192.168.0.10    Deployment node
192.168.0.11    k8s master - api-server, etcd, scheduler, controller-manager
192.168.0.12    k8s master - api-server, etcd, scheduler, controller-manager
192.168.0.13    k8s master - api-server, etcd, scheduler, controller-manager
192.168.0.14    k8s worker - kubelet, kube-proxy
192.168.0.15    k8s worker - kubelet, kube-proxy
192.168.0.16    k8s worker - kubelet, kube-proxy
192.168.0.17    k8s worker - kubelet, kube-proxy

Notes on the plan:

  1. A separate machine, 192.168.0.10, is chosen as the deployment node. If only a few machines are available, the deployment node can also be added to the k8s cluster.
  2. To ensure availability, three machines are chosen to run the k8s master components. If conditions permit, etcd can be deployed separately from the other master components, which gives more flexibility in controlling the number of instances of each. For example, if data reliability requirements are high but the request load is moderate, you could deploy etcd on 5 machines and the remaining master components on 3 machines.
  3. The remaining four machines serve as k8s worker nodes. Their number can be adjusted dynamically according to the actual situation: when pods stay in the Pending state because of insufficient resources, workers can be scaled out; when node utilization is low and the pods on a node can be rescheduled elsewhere, workers can be scaled in (the checks after this list help to decide).
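
As a rough guide, both scaling signals mentioned in point 3 can be checked with standard kubectl commands (kubectl top needs a metrics add-on such as metrics-server, which this article does not cover):

# Pods stuck in Pending, e.g. because no node has enough free resources.
kubectl get pods --all-namespaces --field-selector status.phase=Pending

# Current CPU/memory utilization of each node, as a hint for scaling workers in.
kubectl top nodes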

Preparing the Environment

After the node planning is complete, some preparation work is needed, mainly the following (a command-level sketch follows the list):

  1. Install RKE - The RKE binary needs to be installed on the deployment node (192.168.0.10); see download-the-rke-binary for the installation steps.
  2. Configure passwordless SSH login - Since RKE installs and deploys the k8s cluster over SSH tunnels, passwordless SSH login must be configured from the RKE node to every k8s node. If the RKE node itself is also to join the k8s cluster, passwordless SSH login to the local machine must be configured as well.
  3. Install docker - Since RKE starts the k8s components from the docker image rancher/hyperkube, docker must be installed on every node of the k8s cluster (the 7 machines 192.168.0.11 ~ 192.168.0.17).
  4. Disable swap - Starting with k8s 1.8, the system swap must be turned off; otherwise the kubelet will not start under the default configuration. Swap needs to be disabled on all k8s worker nodes.
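
A minimal command-level sketch of steps 2-4, assuming the admin user that appears later in cluster.yml and a distribution on which the docker convenience script works; adjust to your environment:

# On the deployment node (192.168.0.10): create a key and push it to every k8s node.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for ip in 192.168.0.11 192.168.0.12 192.168.0.13 192.168.0.14 192.168.0.15 192.168.0.16 192.168.0.17; do
    ssh-copy-id admin@$ip
done

# On every k8s node (192.168.0.11 ~ 192.168.0.17): install docker and disable swap.
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker admin            # the SSH user must be able to talk to the docker daemon
sudo swapoff -a                          # turn swap off immediately
sudo sed -i '/ swap / s/^/#/' /etc/fstab # keep it off after a reboot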

Configuring cluster.yml

After the environment preparation is complete, cluster.yml is used to describe the composition of the cluster and the way k8s is to be deployed.

Configuring the cluster composition

The nodes section of the configuration file cluster.yml describes the composition of the cluster. Following the node planning, the k8s master nodes are given the roles controlplane and etcd, and the k8s worker nodes are given the role worker.

nodes:
  - address: 192.168.0.11
    user: admin
    role:
      - controlplane
      - etcd
  ...
  - address: 192.168.0.17
    user: admin
    role:
      - worker
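
If the private key created during the preparation step is not at the default location, RKE's per-node ssh_key_path option can point to it; the fragment below is only a sketch for one worker node:

  - address: 192.168.0.14
    user: admin
    ssh_key_path: ~/.ssh/id_rsa   # optional, this is already the default
    role:
      - worker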

Setting resource reservations

Besides the pod processes, a k8s worker node runs many other important processes, including k8s management processes such as kubelet and dockerd, and system processes such as systemd. These processes are crucial to the stability of the whole cluster, so a certain amount of resources must be reserved specifically for them.

The worker settings in the author's environment are as follows:

  • Each node has 32 CPU cores, 64Gi of memory, and 100Gi of storage.
  • 1 CPU core, 2Gi of memory, and 1Gi of storage are reserved for the k8s management processes.
  • 1 CPU core, 1Gi of memory, and 1Gi of storage are reserved for the system processes.
  • The hard eviction threshold is set to 500Mi for memory and 10% for disk.

With these settings, the allocatable CPU on a node is 30 cores, the allocatable memory is 60.5Gi, and the allocatable disk is 88Gi. For incompressible resources, when the total memory used by pods exceeds 60.5Gi or their total disk usage exceeds 88Gi, pods with a lower QoS class are evicted first. For compressible resources, even if every process on the node uses as much CPU as it can, the pod processes together will not get more than 30 cores.
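
These allocatable figures follow from the usual formula, allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold, and can be verified on any worker after the deployment (the node address comes from the planning table above):

# CPU:    32    - 1   - 1   - 0     = 30 cores
# Memory: 64Gi  - 2Gi - 1Gi - 500Mi = 60.5Gi
# Disk:   100Gi - 1Gi - 1Gi - 10Gi  = 88Gi (10% of the 100Gi nodefs)
kubectl describe node 192.168.0.14 | grep -A 6 'Allocatable'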

The corresponding resource reservation settings in cluster.yml are shown below.

services:
  kubelet:
    extra_args:
      cgroups-per-qos: True
      cgroup-driver: cgroupfs
      kube-reserved: cpu=1,memory=2Gi,ephemeral-storage=1Gi
      kube-reserved-cgroup: /runtime.service
      system-reserved: cpu=1,memory=1Gi,ephemeral-storage=1Gi
      system-reserved-cgroup: /system.slice
      enforce-node-allocatable: pods,kube-reserved,system-reserved
      eviction-hard: memory.available<500Mi,nodefs.available<10%
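
Since RKE runs the kubelet itself as a docker container (named kubelet by default, an RKE convention assumed here rather than configured above), whether these flags took effect can be checked directly on a worker node:

# Inspect the kubelet container's command line for the reservation flags.
docker inspect --format '{{json .Args}}' kubelet | tr ',' '\n' | grep -E 'reserved|eviction'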

For more details on resource reservation, see the article Reserve Compute Resources for System Daemons.

Deploying the k8s cluster

Once the cluster.yml file is ready, the cluster can be deployed with the rke up command (a minimal invocation is sketched below). The figure that follows shows the architecture of a k8s cluster deployed with RKE.
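
A minimal invocation, run on the deployment node from the directory that holds cluster.yml (the --config flag only matters when the file lives elsewhere):

rke up --config ./cluster.yml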

[Figure: rke_arch - architecture of a k8s cluster deployed by RKE]

This architecture has the following characteristics:

  1. Every component of the cluster is started as a container with the restart policy set to always, so a component that exits unexpectedly after a failure is pulled up again automatically.
  2. kube-scheduler and kube-controller-manager on a master node talk directly to the API server on the same machine.
  3. nginx-proxy on each worker node holds the list of API server addresses and proxies the requests that kubelet and kube-proxy send to the API server.
  4. To give the cluster disaster-recovery capability, etcd-rolling-snapshots on the master nodes periodically saves etcd snapshots to the local directory /opt/rke/etcd-snapshots (the snapshot settings can be tuned in cluster.yml, as sketched after this list).
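
The snapshot interval and retention can be tuned through the etcd service settings in cluster.yml; the values below are illustrative rather than the ones used in the author's environment:

services:
  etcd:
    snapshot: true   # enable etcd-rolling-snapshots
    creation: 6h     # take a snapshot every 6 hours
    retention: 24h   # keep each snapshot for 24 hours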

Configuring the load balancer

After the cluster deployment is complete, k8s can be accessed through the API server. Since multiple kube-apiserver instances are started in this environment for high availability, a load balancer needs to be set up in front of them. Here nginx is deployed on 192.168.0.10 to provide the load-balancing function; the relevant part of nginx.conf is shown below.

...
stream {
    upstream apiserver {
        server 192.168.0.11:6443 weight=5 max_fails=3 fail_timeout=60s;
        server 192.168.0.12:6443 weight=5 max_fails=3 fail_timeout=60s;
        server 192.168.0.13:6443 weight=5 max_fails=3 fail_timeout=60s;
    }

    server {
        listen 6443;
        proxy_connect_timeout 1s;
        proxy_timeout 10s;
        proxy_pass apiserver;
    }
}
...
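
After editing nginx.conf, the configuration can be validated and applied with the standard nginx commands (this assumes nginx was built with the stream module, which the stock packages of most distributions include):

nginx -t          # check the configuration syntax
nginx -s reload   # apply it without dropping established connections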

At this point, accessing the API server through the port exposed by the load balancer fails with the error Unable to connect to the server: x509: certificate is valid for xxx, not 192.168.0.10. The domain name or IP address of the load balancer has to be added to the API server's PKI certificate, which is done through the authentication section of cluster.yml.

authentication:
  strategy: x509
  sans:
    - "192.168.0.10"

After modifying cluster.yml, run the command rke cert-rotate.
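
Whether the rotation worked can be confirmed by inspecting the certificate that the load balancer now hands out; 192.168.0.10 should appear among the subject alternative names:

openssl s_client -connect 192.168.0.10:6443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A 1 'Subject Alternative Name'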

Verification

After completing all the steps above, check the node status with the command kubectl get nodes. If every node is in the Ready state, the cluster has been deployed successfully.

NAME            STATUS    ROLES              AGE    VERSION
192.168.0.11    Ready     controlplane,etcd  1d     v1.13.5
192.168.0.12    Ready     controlplane,etcd  1d     v1.13.5
192.168.0.13    Ready     controlplane,etcd  1d     v1.13.5
192.168.0.14    Ready     worker             1d     v1.13.5
192.168.0.15    Ready     worker             1d     v1.13.5
192.168.0.16    Ready     worker             1d     v1.13.5
192.168.0.17    Ready     worker             1d     v1.13.5
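
Two additional smoke tests worth running at this point (the componentstatuses API is still available in k8s 1.13):

# Health of the scheduler, controller-manager, and etcd members.
kubectl get componentstatuses

# Core add-on pods should be Running and spread across the worker nodes.
kubectl -n kube-system get pods -o wide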

Summary

Rancher Kubernetes Engine (RKE) shields users from the intricate details of creating a k8s cluster, simplifying the deployment steps and lowering the barrier to building one. For companies that need to run self-built k8s clusters, it is a good choice.

Source: yq.aliyun.com/articles/704946