Installing the Latest Version of Kubernetes: Process and Notes

Copyright notice: All rights reserved by the author. Please include a link to this article when reposting! Contact: [email protected] https://blog.csdn.net/isea533/article/details/86769125

This article was written on 2019-02-06, the second day of the first lunar month in the Jihai (己亥) Year of the Pig.
The latest version at the time of writing is v1.13.3.

During a JD.com sale in June 2018 I bought a copy of "Kubernetes 权威指南" (The Definitive Guide to Kubernetes) but never had time to read it; the Spring Festival holiday was a good opportunity to study it. Since the book uses version 1.6.0 from 2017 and I wanted to use the latest version, I kept this record of the process.

Although I bought the book, the whole process drew on many references, mainly the official kubeadm documentation:

System firewall configuration

Disable the firewall at boot:
systemctl disable firewalld
Stop the firewall:
systemctl stop firewalld
Set SELinux to permissive mode so that containers can read the host filesystem:
setenforce 0
Configure SELinux to stay disabled:
vi /etc/sysconfig/selinux
Change SELINUX to disabled:
SELINUX=disabled
#SELINUX=enforcing
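
If you prefer not to edit the file by hand, a one-line sed achieves the same change (a minimal sketch, assuming the file still has the default SELINUX=enforcing line):

# Switch SELINUX from enforcing to disabled in place
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/sysconfig/selinux
# Verify the result
grep '^SELINUX=' /etc/sysconfig/selinux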

In the "Installing kubeadm, kubelet and kubectl" step of the first document above, the addresses in the script it provides are all under the https://packages.cloud.google.com domain, which is blocked and unreachable from mainland China. We can use the Kubernetes repository provided by the Alibaba open source mirror site instead.

1. Install kubelet, kubeadm and kubectl

Use the Alibaba open source mirror site:

https://opsx.alibaba.com/mirror

Find kubernetes in the list and click Help; the following information is displayed.

I am using the latest CentOS 7 in a VMware VM (minimal install); see my earlier post: VMware 虚拟机 最小化安装 CentOS 7 的 IP 配置 (IP configuration for a minimal CentOS 7 install in a VMware VM).

Debian / Ubuntu

apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF  
apt-get update
apt-get install -y kubelet kubeadm kubectl

CentOS / RHEL / Fedora

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet

Docker CE

You can also search the Alibaba mirror site for docker-ce; its Help section gives this address:

https://yq.aliyun.com/articles/110806

Important: this article uses Kubernetes v1.13.3. Because I installed the latest Docker with Docker's official script, kubeadm init produced the following warning:

[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06

Using version 18.06 therefore eliminates the warning; the document above also explains how to install a specific version.
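
For example, on CentOS you could pin Docker to a validated 18.06 release along these lines (a sketch; the exact package version string is an assumption, so list the available versions first):

# List the docker-ce versions available in the configured repo
yum list docker-ce --showduplicates | sort -r
# Install a specific 18.06 build (the version string here is an example; pick one from the list above)
yum install -y docker-ce-18.06.1.ce-3.el7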

2. Docker acceleration

Since pulling Docker images from within China is slow, you can use the accelerator service provided by DaoCloud:

https://www.daocloud.io/mirror

The "configure Docker accelerator" section near the bottom of that page has a command you can run directly (registration used to be required; it no longer is).
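
What that script essentially does is add a registry mirror to the Docker daemon configuration. A minimal sketch of the same idea (the mirror URL is a placeholder; substitute the one the accelerator page gives you):

# /etc/docker/daemon.json: add a registry mirror (the URL below is a placeholder)
cat <<EOF > /etc/docker/daemon.json
{
  "registry-mirrors": ["https://your-mirror-id.m.daocloud.io"]
}
EOF
# Restart Docker to pick up the new configuration
systemctl daemon-reload
systemctl restart docker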

3. Run kubeadm init

Running this command surfaces quite a few problems; they are all listed here.

When you run kubeadm init, it first requests https://dl.k8s.io/release/stable-1.txt to determine the latest stable version number; that address actually redirects to https://storage.googleapis.com/kubernetes-release/release/stable-1.txt , which returned v1.13.3 at the time of writing. Since the address is blocked and cannot be reached, we can avoid the lookup entirely by specifying the version directly:

kubeadm init --kubernetes-version=v1.13.3

Running it may produce the following errors.

3.1 ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables

Problem:

[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1

Solution:

echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
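
The echo commands above do not survive a reboot. A common way to make the settings persistent (a sketch; the drop-in file name is arbitrary) is a sysctl configuration file:

# Persist the bridge netfilter settings across reboots
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Apply all sysctl settings from the system directories
sysctl --system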

3.2 ERROR Swap

Problem:

[ERROR Swap]: running with swap on is not supported. Please disable swap

Solution: disable the swap partition:

#Disable swap for the current session
sudo swapoff -a 
#To also disable swap permanently, open the following file and comment out the swap line
sudo vi /etc/fstab
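
If you would rather not edit /etc/fstab by hand, a one-line sed can comment the entry out (a sketch, assuming the swap line has "swap" in its mount fields, as the CentOS default does):

# Comment out any fstab line that mounts a swap device
sed -i '/\sswap\s/ s/^/#/' /etc/fstab
# Confirm that no swap is active (the Swap row should show 0)
free -h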

3.3 Unable to pull images

Problem (abridged):

error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.1
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.2.24
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.2.6

Since gcr.io is blocked and the images cannot be downloaded, we can fetch them through another channel first and then rerun the command.

4. Pre-download the images

The error messages above tell us exactly which images we need. At the moment, the Docker Hub user mirrorgooglecontainers mirrors all of the latest k8s images, so we can pull from there and then retag.

docker pull mirrorgooglecontainers/kube-apiserver:v1.13.3
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.3
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.3
docker pull mirrorgooglecontainers/kube-proxy:v1.13.3
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker pull coredns/coredns:1.2.6

After the downloads finish, docker images shows the following:

REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE
mirrorgooglecontainers/kube-apiserver            v1.13.3             fe242e556a99        5 days ago          181MB
mirrorgooglecontainers/kube-controller-manager   v1.13.3             0482f6400933        5 days ago          146MB
mirrorgooglecontainers/kube-proxy                v1.13.3             98db19758ad4        5 days ago          80.3MB
mirrorgooglecontainers/kube-scheduler            v1.13.3             3a6f709e97a0        5 days ago          79.6MB
coredns/coredns                                  1.2.6               f59dcacceff4        3 months ago        40MB
mirrorgooglecontainers/etcd                      3.2.24              3cab8e1b9802        4 months ago        220MB
mirrorgooglecontainers/pause                     3.1                 da86e6ba6ca1        13 months ago       742kB

Retag each of the images above:

docker tag mirrorgooglecontainers/kube-apiserver:v1.13.3 k8s.gcr.io/kube-apiserver:v1.13.3
docker tag mirrorgooglecontainers/kube-controller-manager:v1.13.3 k8s.gcr.io/kube-controller-manager:v1.13.3
docker tag mirrorgooglecontainers/kube-scheduler:v1.13.3 k8s.gcr.io/kube-scheduler:v1.13.3
docker tag mirrorgooglecontainers/kube-proxy:v1.13.3 k8s.gcr.io/kube-proxy:v1.13.3
docker tag mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
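
The pull-and-retag pairs above follow a fixed pattern, so a short loop does the same job (a sketch tied to the v1.13.3 image list; coredns is handled separately because it lives under its own Docker Hub account):

# Pull each control-plane image from the Docker Hub mirror and retag it as k8s.gcr.io
images="kube-apiserver:v1.13.3 kube-controller-manager:v1.13.3 kube-scheduler:v1.13.3 kube-proxy:v1.13.3 pause:3.1 etcd:3.2.24"
for img in $images; do
  docker pull "mirrorgooglecontainers/$img"
  docker tag "mirrorgooglecontainers/$img" "k8s.gcr.io/$img"
done
# coredns is published under a separate account
docker pull coredns/coredns:1.2.6
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6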

With the images in place, rerun the earlier command:

kubeadm init --kubernetes-version=v1.13.3

5. Successful installation

The command above produced the following log (keep this log somewhere safe):

[root@k8s-master ~]# kubeadm init --kubernetes-version=v1.13.3
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.200.131 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.200.131 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.200.131]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 21.507393 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-master" as an annotation
[mark-control-plane] Marking the node k8s-master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 7j01ut.pbdh60q732m1kd4v
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.200.131:6443 --token 7j01ut.pbdh60q732m1kd4v --discovery-token-ca-cert-hash sha256:de1dc033ae5cc27607b0f271655dd884c0bf6efb458957133dd9f50681fa2723

6. Pay special attention to the follow-up actions prompted above (1)

The output asks you to run the following:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Only with this configuration in place can you run kubectl commands afterwards.
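
If you are working as root, an alternative mentioned in the kubeadm documentation is to point KUBECONFIG at admin.conf directly; note that this lasts only for the current shell session:

# For the root user: use admin.conf directly (valid only for this shell session)
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes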

If the system has been rebooted since, running kubectl may give you an error like this:

[root@k8s-master ~]# kubectl get pods
The connection to the server 192.168.200.131:6443 was refused - did you specify the right host or port?

I am not sure of the exact mechanism here, but I traced the root cause to swap: as in section 3.2 above, if swap was not permanently disabled it comes back after a reboot, and k8s then fails with the error above. For that reason I recommend disabling it outright:

#To permanently disable the swap partition, open the following file and comment out the swap line
sudo vi /etc/fstab

7. Pay special attention to the follow-up actions prompted above (2)

The output asks you to do the following:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Following Kubernetes 权威指南 (the book uses v1.6), I install the Weave Net add-on here; documentation:

https://www.weave.works/docs/net/latest/kubernetes/kube-addon/

Per the documentation, run the following command:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

The output is:

serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created
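
You can watch the add-on come up with a label selector (name=weave-net is the label the Weave manifest applies; expect 2/2 ready containers per pod once it settles):

kubectl get pods -n kube-system -l name=weave-net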

8. Pay special attention to the follow-up actions prompted above (3)

The output asks you to run the following as root on each worker (node) host:

kubeadm join 192.168.200.131:6443 --token 7j01ut.pbdh60q732m1kd4v --discovery-token-ca-cert-hash sha256:de1dc033ae5cc27607b0f271655dd884c0bf6efb458957133dd9f50681fa2723

Note: do not copy the command shown here; use the one from your own install output.
The token in this command has a limited lifetime (typically 1 day). If it has expired, see the following page for how to obtain a fresh token:
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#join-nodes
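
In short, a fresh token together with the full join command can be generated on the master:

# Generate a new bootstrap token and print the complete join command
kubeadm token create --print-join-command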

If you are building a cluster from multiple hosts, run the command above on the other hosts to join them to the cluster.

Since my setup is for experimentation, I use a single-node cluster instead.

9. Single-node cluster

See the official documentation: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

First check the current pod status:

[root@k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   coredns-86c58d9df4-9s65p             0/1     Error     0          59m
kube-system   coredns-86c58d9df4-dvg7b             0/1     Error     0          59m
kube-system   etcd-k8s-master                      1/1     Running   3          58m
kube-system   kube-apiserver-k8s-master            1/1     Running   3          58m
kube-system   kube-controller-manager-k8s-master   1/1     Running   3          58m
kube-system   kube-proxy-5p4d8                     1/1     Running   3          59m
kube-system   kube-scheduler-k8s-master            1/1     Running   3          58m
kube-system   weave-net-j87km                      1/2     Running   2          16m

Two pods show Error at this point (cause unclear). Next, run the single-node cluster command, which removes the master taint so that pods can be scheduled onto the control-plane node:

kubectl taint nodes --all node-role.kubernetes.io/master-

Check the status again:

[root@k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   coredns-86c58d9df4-9s65p             1/1     Running   1          60m
kube-system   coredns-86c58d9df4-dvg7b             1/1     Running   1          60m
kube-system   etcd-k8s-master                      1/1     Running   3          59m
kube-system   kube-apiserver-k8s-master            1/1     Running   3          59m
kube-system   kube-controller-manager-k8s-master   1/1     Running   3          59m
kube-system   kube-proxy-5p4d8                     1/1     Running   3          60m
kube-system   kube-scheduler-k8s-master            1/1     Running   3          59m
kube-system   weave-net-j87km                      2/2     Running   3          16m

Everything is Running now.
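
To confirm the taint was actually removed from the master (the node name k8s-master matches this setup; expect "Taints: <none>"):

kubectl describe node k8s-master | grep -i taints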

Log out and take a snapshot (I created three snapshots at different stages while working through this article).

10. VM backup download

To make things easier for myself and others, here is a backup of the virtual machine for direct use (in case you run into all sorts of inexplicable installation errors and want to skip the install entirely).

VMware version: 15.0.0 build-10134415
VM backup link: https://pan.baidu.com/s/1s3FZtcvONgFXAmz1AUU9_w
Extraction code: tbi2

System login user: root
System login password: jj
For the VM's IP details, see: VMware 虚拟机 最小化安装 CentOS 7 的 IP 配置 (IP configuration for a minimal CentOS 7 install in a VMware VM)

What should you do if you want to change the IP? (A consolidated script sketch follows this list.)

  1. First change the IP and restart the network service.
  2. Reset k8s: kubeadm reset
  3. Run the install again: kubeadm init --kubernetes-version=v1.13.3
  4. You may hit the problem from 3.1 again; configure it as described above.
  5. Delete the $HOME/.kube directory created in section 6: rm -rf $HOME/.kube
  6. Redo the steps in "6. Pay special attention to the follow-up actions prompted above (1)".
  7. Redo the steps in "7. Pay special attention to the follow-up actions prompted above (2)".
  8. For a single-node cluster, also rerun kubectl taint nodes --all node-role.kubernetes.io/master-
  9. Done; back to full health. Run kubectl get pods --all-namespaces to check the status.
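
Steps 2-8 condensed into one run-through (a sketch only, assuming root, the Weave add-on from section 7, and a single-node cluster; the sysctl fix from 3.1 is applied up front):

# Rebuild the cluster after an IP change (run as root)
kubeadm reset -f
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
kubeadm init --kubernetes-version=v1.13.3
# Refresh the kubectl config
rm -rf $HOME/.kube
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
# Reinstall the pod network and allow scheduling on the master
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get pods --all-namespaces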

11. Summary

This article was written while performing and verifying every step; by the time it was finished, my own setup was working without issues. The whole process took a while but went fairly smoothly. Note that this only sets up an experimental Kubernetes environment; everything has only just begun!

12. Additional common problems

Problems I ran into in my own later work are collected here.

12.1 Pod stuck in Pending

For example:

[root@k8s-master chapter01]# kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
mysql-4wvzz   0/1     Pending   0          4m1s

You can inspect the Pod's status with the following command:

[root@k8s-master chapter01]# kubectl describe pods mysql-4wvzz
Name:               mysql-4wvzz
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=mysql
Annotations:        <none>
Status:             Pending
IP:                 
Controlled By:      ReplicationController/mysql
Containers:
  mysql:
    Image:      mysql
    Port:       3306/TCP
    Host Port:  0/TCP
    Environment:
      MYSQL_ROOT_PASSWORD:  123456
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rksdn (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-rksdn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rksdn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  28s (x3 over 103s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

Note the warning here: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

The problem is that there is no available node, so the Pod cannot be scheduled. If you are running a single-node cluster, you probably forgot to run the following command:

kubectl taint nodes --all node-role.kubernetes.io/master-

For a multi-host cluster, check node status with kubectl get nodes and make sure at least one node is available.

After the problem is fixed, check the Pod status again; the Events section now reads:

Events:
  Type     Reason            Age                 From                 Message
  ----     ------            ----                ----                 -------
  Warning  FailedScheduling  51s (x9 over 6m6s)  default-scheduler    0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         10s                 default-scheduler    Successfully assigned default/mysql-4wvzz to k8s-master
  Normal   Pulling           9s                  kubelet, k8s-master  pulling image "mysql"
  Normal   Pulled            7s                  kubelet, k8s-master  Successfully pulled image "mysql"
  Normal   Created           7s                  kubelet, k8s-master  Created container
  Normal   Started           7s                  kubelet, k8s-master  Started container

The problem is now resolved.

12.2 Weave CNI not working; switching to flannel
