Getting Started with Kubernetes (4)

Copyright notice: reposting is welcome, but please credit the source! https://blog.csdn.net/miss1181248983/article/details/87775915

30.5 Kubernetes Cluster Setup


Prepare four virtual machines and, using Ansible playbooks, automatically build a highly available multi-master, multi-node cluster from binaries.

Ansible playbooks for installing a Kubernetes cluster: https://github.com/gjmzj/kubeasz

Official Kubernetes GitHub releases: https://github.com/kubernetes/kubernetes/releases


Environment preparation

  • Software and hardware requirements:
1. CPU and memory: master at least 1C2G (2C4G recommended); node at least 1C2G

2. Linux: kernel 3.10 or later; CentOS 7 or RHEL 7 recommended

3. Docker: version 1.9 at minimum, 1.12+ recommended

4. etcd: version 2.0 at minimum, 3.0+ recommended
  • Node planning:
deploy node (x1): the node that runs these Ansible playbooks

etcd nodes (x3): note that an etcd cluster must have an odd number of members (1, 3, 5, 7, ...)

master nodes (x2): increase according to the actual cluster size; an additional master VIP must also be planned

lb nodes (x2): two load-balancer nodes running haproxy and keepalived

node nodes (x3): the nodes that actually carry the workloads; raise machine specs and node count as needed
  • Machine plan:
IP               Hostname   Role(s)
192.168.30.128   master     deploy, master1, lb1, etcd
192.168.30.129   node1      etcd, node
192.168.30.130   node2      etcd, node
192.168.30.150   master2    master2, lb2(master)
192.168.30.160   vip        (keepalived VIP for the masters)
  • Preparation:

Run on all four machines:

# yum install -y epel-release

# yum update -y

# yum install -y python

Installation

  • Install and prepare Ansible on the deploy node:
[root@master ~]# yum install git python-pip -y

[root@master ~]# pip install pip --upgrade -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

[root@master ~]# pip install --no-cache-dir ansible -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
  • Set up passwordless SSH from the deploy node:
[root@master ~]# ssh-keygen

[root@master ~]# for ip in 128 129 130 150 ; do ssh-copy-id 192.168.30.$ip; done

[root@master ~]# for ip in 128 129 130 150 ; do ssh 192.168.30.$ip; done                # verify each login (each opens a shell; exit to continue)
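
A non-interactive variant (a sketch) that confirms each login and returns immediately instead of opening a shell on every host:

[root@master ~]# for ip in 128 129 130 150 ; do ssh 192.168.30.$ip hostname; done                # prints each remote hostname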
  • Lay out the Kubernetes playbooks on the deploy node:
[root@master ~]# git clone https://github.com/gjmzj/kubeasz.git

[root@master ~]# mkdir -p /etc/ansible

[root@master ~]# mv kubeasz/* /etc/ansible/
  • Download the binaries:

Address: a Baidu Cloud link; download the tar package for the version you need and unpack it into the /etc/ansible/bin directory. I use version 1.11 here.

[root@master ~]# tar zxf k8s.1-11-6.tar.gz

[root@master ~]# mv bin/* /etc/ansible/bin/
  • Configure cluster parameters:
[root@master ~]# cd /etc/ansible && cp example/hosts.m-masters.example hosts

[root@master ansible]# vim hosts                # edit to match your environment; a sketch follows
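
A minimal sketch of an inventory matching the machine plan above. The section names follow the kubeasz multi-master example, but variable names can change between kubeasz versions, so treat this as illustrative rather than definitive:

[deploy]
192.168.30.128

[etcd]
192.168.30.128 NODE_NAME=etcd1
192.168.30.129 NODE_NAME=etcd2
192.168.30.130 NODE_NAME=etcd3

[kube-master]
192.168.30.128
192.168.30.150

[lb]
192.168.30.128 LB_ROLE=backup
192.168.30.150 LB_ROLE=master

[kube-node]
192.168.30.129
192.168.30.130

[all:vars]
# 192.168.30.160 is the planned master VIP; port 8443 matches the cluster-info output later
MASTER_IP="192.168.30.160"
KUBE_APISERVER="https://192.168.30.160:8443"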

[root@master ansible]# ansible all -m ping              # verify that every node responds
192.168.30.150 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
192.168.30.129 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
192.168.30.130 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
192.168.30.128 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
  • Create certificates and prepare for installation:
[root@master ansible]# ansible-playbook 01.prepare.yml
  • Install the etcd cluster:
[root@master ansible]# ansible-playbook 02.etcd.yml

[root@master ansible]# bash             # start a fresh shell if the etcdctl command is reported as not found

[root@master ansible]# for ip in 128 129 130; do   
 ETCDCTL_API=3 etcdctl \
 --endpoints=https://192.168.30.$ip:2379 \
 --cacert=/etc/kubernetes/ssl/ca.pem \
 --cert=/etc/etcd/ssl/etcd.pem \
 --key=/etc/etcd/ssl/etcd-key.pem \
 endpoint health; done                  # verify cluster health

Expected result:

https://192.168.30.128:2379 is healthy: successfully committed proposal: took = 3.376468ms
https://192.168.30.129:2379 is healthy: successfully committed proposal: took = 2.518545ms
https://192.168.30.130:2379 is healthy: successfully committed proposal: took = 3.091893ms
  • Install Docker:
[root@master ansible]# ansible-playbook 03.docker.yml
  • Install the master nodes:
[root@master ansible]# ansible-playbook 04.kube-master.yml
  • Install the worker nodes:
[root@master ansible]# ansible-playbook 05.kube-node.yml

Check the nodes:

[root@master ansible]# kubectl get node
NAME            STATUS                     ROLES     AGE       VERSION
192.168.30.129   Ready                      node      28s       v1.11.6
192.168.30.130   Ready                      node      39s       v1.11.6
192.168.30.128   Ready,SchedulingDisabled   master    4m        v1.11.6
192.168.30.150   Ready,SchedulingDisabled   master    4m        v1.11.6
  • Deploy the cluster network:
[root@master ansible]# ansible-playbook 06.network.yml

Check that the flannel-related pods are up:

[root@master ansible]# kubectl get pod -n kube-system
NAME                          READY     STATUS    RESTARTS   AGE
kube-flannel-ds-amd64-5dfgb   1/1       Running   0          1m
kube-flannel-ds-amd64-6w22r   1/1       Running   0          1m
kube-flannel-ds-amd64-bn6xl   1/1       Running   0          1m
kube-flannel-ds-amd64-chg9q   1/1       Running   0          1m
  • Install the cluster add-ons (DNS, dashboard):
[root@master ansible]# ansible-playbook 07.cluster-addon.yml

Check the services in the kube-system namespace:

[root@master ansible]# kubectl get svc -n kube-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
kube-dns               ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP,9153/TCP   1m
kubernetes-dashboard   NodePort    10.68.235.201   <none>        443:38679/TCP            1m
metrics-server         ClusterIP   10.68.241.153   <none>        443/TCP                  1m
  • All-in-one installation:

The steps above install the cluster stage by stage; if you would rather not bother with that, you can install everything in one step:

# ansible-playbook 90.setup.yml
  • View cluster information:
[root@master ansible]# kubectl cluster-info
Kubernetes master is running at https://192.168.30.160:8443
CoreDNS is running at https://192.168.30.160:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://192.168.30.160:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
  • View node/pod resource usage:
[root@master ansible]# kubectl top node
NAME            CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
192.168.30.129   95m          2%        1136Mi          5%        
192.168.30.130   92m          2%        1569Mi          9%        
192.168.30.128   174m         4%        2868Mi          14%       
192.168.30.150   121m         3%        1599Mi          8%
[root@master ansible]# kubectl top pod --all-namespaces
NAMESPACE     NAME                                    CPU(cores)   MEMORY(bytes)   
kube-system   coredns-695f96dcd5-7rsl5                5m           13Mi            
kube-system   coredns-695f96dcd5-v5zkg                5m           13Mi            
kube-system   kube-flannel-ds-amd64-5dfgb             4m           14Mi            
kube-system   kube-flannel-ds-amd64-6w22r             3m           12Mi            
kube-system   kube-flannel-ds-amd64-bn6xl             4m           13Mi            
kube-system   kube-flannel-ds-amd64-chg9q             4m           15Mi            
kube-system   kubernetes-dashboard-68bf55748d-qmgrh   1m           12Mi            
kube-system   metrics-server-75df6ff86f-qf56f         2m           14Mi 

Testing DNS

  • Create an nginx service:
[root@master ansible]# kubectl run nginx --image=nginx --expose --port=80               # --expose creates a Service for port 80
service/nginx created
deployment.apps/nginx created

[root@master ansible]# kubectl get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1      <none>        443/TCP   35m
nginx        ClusterIP   10.68.185.11   <none>        80/TCP    2s

[root@master ansible]# kubectl get pod
NAME                     READY     STATUS    RESTARTS   AGE
nginx-6f858d4d45-tfnhv   1/1       Running   0          27s
  • Create a busybox pod for testing:
[root@master ansible]# kubectl run busybox --rm -it --image=busybox sh            # --rm deletes the container on exit

/ # nslookup nginx.default.svc.cluster.local
Server:		10.68.0.2
Address:	10.68.0.2:53


*** Can't find nginx.default.svc.cluster.local: No answer

Check on master2:

[root@master2 ~]# kubectl get svc --all-namespaces
NAMESPACE     NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes             ClusterIP   10.68.0.1       <none>        443/TCP                  52m
default       nginx                  ClusterIP   10.68.185.11    <none>        80/TCP                   17m
kube-system   kube-dns               ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP,9153/TCP   38m
kube-system   kubernetes-dashboard   NodePort    10.68.235.201   <none>        443:38679/TCP            38m
kube-system   metrics-server         ClusterIP   10.68.241.153   <none>        443/TCP                  38m

Here the cluster IP of the kube-dns service is exactly the Server address that nslookup reported above, so the pod is resolving through the cluster DNS. (The "No answer" line is a known quirk of nslookup in recent busybox images rather than, by itself, a sign that DNS is broken.)
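
To confirm that the service answers as well as resolves, a quick check from inside the busybox pod (a sketch; the short name nginx resolves through the pod's default DNS search domains):

/ # wget -qO- nginx | head -n 4                # fetch the start of the nginx welcome page via the service name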


Adding a node

  • Set up passwordless SSH from the deploy node to the new node:
# ssh-copy-id <new node ip>
  • Edit hosts:
# vim /etc/ansible/hosts
[new-node]
<new node ip>
  • Run the installation playbook:
# ansible-playbook /etc/ansible/20.addnode.yml
  • Verify:
# kubectl get node

# kubectl get pod -n kube-system -o wide
  • Follow-up:

Edit /etc/ansible/hosts and move the new node's IP from [new-node] into [kube-node].

Adding a master node works much like adding a worker node, so it is omitted here.


Upgrading the cluster

  • Back up etcd:
# ETCDCTL_API=3 etcdctl snapshot save backup.db

View information about the backup file:

# ETCDCTL_API=3 etcdctl --write-out=table snapshot status backup.db
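
On a TLS-secured cluster like this one, etcdctl may also need the endpoint and certificate flags used for the health check earlier; a sketch reusing those same paths:

# ETCDCTL_API=3 etcdctl \
 --endpoints=https://192.168.30.128:2379 \
 --cacert=/etc/kubernetes/ssl/ca.pem \
 --cert=/etc/etcd/ssl/etcd.pem \
 --key=/etc/etcd/ssl/etcd-key.pem \
 snapshot save backup.db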
  • Pull the latest code:
# cd kubeasz/               # the root directory of the kubeasz project

# git pull origin master
  • Download the Kubernetes binaries for the target version:

Baidu Cloud address: https://pan.baidu.com/s/1c4RFaA#list/path=%2FK8S

Unpack the archive and replace the binaries under /etc/ansible/bin/.
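
For example (a sketch; the archive name is hypothetical and depends on the version you download):

# tar zxf k8s.1-13-0.tar.gz

# mv bin/* /etc/ansible/bin/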

  • Upgrading Docker (skipped):

Unless there is a particular need, upgrading Docker is not recommended.

  • If the workload can tolerate an interruption:
# ansible-playbook -t upgrade_k8s,restart_dockerd 22.upgrade.yml
  • If the workload cannot be interrupted:
# ansible-playbook -t upgrade_k8s 22.upgrade.yml

Then, on every node, one at a time:

# kubectl cordon <node-name> && kubectl drain <node-name> --ignore-daemonsets              # migrate the workload pods off this node

# systemctl restart docker

# kubectl uncordon <node-name>              # let pods be scheduled back onto the node
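
A sketch that strings the whole per-node sequence together from the deploy node (node IPs from the plan above; the drain flags assume Kubernetes 1.11, and drain implies cordon, so no separate cordon step is needed):

# for node in 192.168.30.129 192.168.30.130; do
 kubectl drain $node --ignore-daemonsets --delete-local-data
 ssh $node systemctl restart docker
 kubectl uncordon $node
 done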

30.6 Cluster Backup and Restore


Kubernetes itself is highly available and load-balanced; when something goes badly wrong, the cluster can be restored quite easily as long as the data survives.


Backup and restore principles

  • Backup:

Backing up means saving the data of the running etcd cluster to a file on disk.

  • Restore:

Restoring means loading the etcd backup file back into the etcd cluster, and then rebuilding the whole cluster from that data.

  • In addition:

If the cluster was created with the kubeasz project, the CA certificate files and the Ansible hosts file need to be backed up along with the etcd data.


Procedure

  • Check the pods:
[root@master ~]# kubectl get pod
NAME                     READY     STATUS    RESTARTS   AGE
nginx-6f858d4d45-tfnhv   1/1       Running   0          18h
  • Create a backup directory:
[root@master ~]# mkdir -p /backup/k8s
  • Back up the etcd data:
[root@master ~]# ETCDCTL_API=3 etcdctl snapshot save /backup/k8s/snapshot.db
Snapshot saved at /backup/k8s/snapshot.db
  • Back up the CA certificates:
[root@master ~]# cp /etc/kubernetes/ssl/ca* /backup/k8s/

[root@master ~]# cp -r /backup/ /backup1/           # do not skip this step; keep a second copy
  • Simulate a cluster crash:
[root@master ~]# ansible-playbook /etc/ansible/99.clean.yml
  • Restore the CA certificates:
[root@master ~]# cp -r /backup1/k8s/ /backup/

[root@master ~]# mkdir -p /etc/kubernetes/ssl

[root@master ~]# cp /backup/k8s/ca* /etc/kubernetes/ssl/
  • Rebuild the cluster:
[root@master ~]# cd /etc/ansible/

[root@master ansible]# ansible-playbook 01.prepare.yml

[root@master ansible]# ansible-playbook 02.etcd.yml

[root@master ansible]# ansible-playbook 03.docker.yml

[root@master ansible]# ansible-playbook 04.kube-master.yml

[root@master ansible]# ansible-playbook 05.kube-node.yml
  • Check cluster status:
[root@master ansible]# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1    <none>        443/TCP   1h

[root@master ansible]# kubectl cluster-info 
Kubernetes master is running at https://192.168.30.150:8443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

[root@master ansible]# for ip in 128 129 130; do ETCDCTL_API=3 etcdctl  --endpoints=https://192.168.30.$ip:2379  --cacert=/etc/kubernetes/ssl/ca.pem  --cert=/etc/etcd/ssl/etcd.pem  --key=/etc/etcd/ssl/etcd-key.pem endpoint health; done
https://192.168.30.128:2379 is healthy: successfully committed proposal: took = 2.147777ms
https://192.168.30.129:2379 is healthy: successfully committed proposal: took = 3.092867ms
https://192.168.30.130:2379 is healthy: successfully committed proposal: took = 2.992882ms

[root@master ansible]# kubectl get componentstatus
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok                   
controller-manager   Healthy   ok                   
etcd-0               Healthy   {"health": "true"}   
etcd-1               Healthy   {"health": "true"}   
etcd-2               Healthy   {"health": "true"} 
  • Restore the etcd data:

Stop the service:

[root@master ansible]# ansible etcd -m service -a 'name=etcd state=stopped'

Clear the data files:

[root@master ansible]# ansible etcd -m file -a 'name=/var/lib/etcd/member state=absent'

Log in to each etcd node and, referring to that node's /etc/systemd/system/etcd.service unit file, run:

[root@master k8s]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd1 --initial-cluster etcd1=https://192.168.30.128:2380,etcd2=https://192.168.30.129:2380,etcd3=https://192.168.30.130:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.30.128:2380

[root@master k8s]# cp -r etcd1.etcd/member/ /var/lib/etcd/

[root@master k8s]# systemctl restart etcd
[root@master k8s]# rsync -av /backup/ 192.168.30.129:/backup/

[root@node1 ~]# cd /backup/k8s/                     # run on node1

[root@node1 k8s]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd2 --initial-cluster etcd1=https://192.168.30.128:2380,etcd2=https://192.168.30.129:2380,etcd3=https://192.168.30.130:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.30.129:2380

[root@node1 k8s]# cp -r etcd2.etcd/member/ /var/lib/etcd/

[root@node1 k8s]# systemctl restart etcd
[root@master k8s]# rsync -av /backup/ 192.168.30.130:/backup/

[root@node2 ~]# cd /backup/k8s/                     # run on node2

[root@node2 k8s]# ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --name etcd3 --initial-cluster etcd1=https://192.168.30.128:2380,etcd2=https://192.168.30.129:2380,etcd3=https://192.168.30.130:2380 --initial-cluster-token etcd-cluster-0 --initial-advertise-peer-urls https://192.168.30.130:2380

[root@node2 k8s]# cp -r etcd3.etcd/member/ /var/lib/etcd/

[root@node2 k8s]# systemctl restart etcd
  • Check cluster status again:
[root@master k8s]# kubectl get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1      <none>        443/TCP   21h
nginx        ClusterIP   10.68.185.11   <none>        80/TCP    21h

[root@master k8s]# kubectl get pod
NAME                     READY     STATUS    RESTARTS   AGE
nginx-6f858d4d45-tfnhv   1/1       Running   0          21h

[root@master k8s]# kubectl get svc --all-namespaces
NAMESPACE     NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes             ClusterIP   10.68.0.1       <none>        443/TCP                  21h
default       nginx                  ClusterIP   10.68.185.11    <none>        80/TCP                   21h
kube-system   kube-dns               ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP,9153/TCP   21h
kube-system   kubernetes-dashboard   NodePort    10.68.235.201   <none>        443:38679/TCP            21h
kube-system   metrics-server         ClusterIP   10.68.241.153   <none>        443/TCP                  21h

The services, pods, DNS, and so on have all been restored.

  • Rebuild the network:
[root@master k8s]# ansible-playbook /etc/ansible/tools/change_k8s_network.yml

One-click backup and restore

If you would rather not restore everything by hand, you can do it this way instead.

  • One-click backup:
# ansible-playbook /etc/ansible/23.backup.yml
  • Check the files:
# tree /etc/ansible/roles/cluster-backup/files/                 # as follows
/etc/ansible/roles/cluster-backup/files/
├── ca                              # cluster CA backup
│   ├── ca-config.json
│   ├── ca.csr
│   ├── ca-csr.json
│   ├── ca-key.pem
│   └── ca.pem
├── hosts                           # ansible hosts backup
│   ├── hosts                       # most recent backup
│   └── hosts-201902201420
├── readme.md
└── snapshot                        # etcd data backup
    ├── snapshot-201902201420.db
    └── snapshot.db                 # most recent backup
  • Simulate a failure:
# ansible-playbook /etc/ansible/99.clean.yml
  • Edit the configuration:

Edit /etc/ansible/roles/cluster-restore/defaults/main.yml to specify which etcd snapshot to restore; if left unchanged, the most recent backup is used.
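
For instance (a sketch; the variable name inside defaults/main.yml is an assumption here and may differ between kubeasz versions, so check your own checkout):

# vim /etc/ansible/roles/cluster-restore/defaults/main.yml
db_to_restore: "snapshot-201902201420.db"          # assumed variable; selects a specific snapshot instead of the default snapshot.db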

  • One-click restore:
# ansible-playbook /etc/ansible/24.restore.yml

# ansible-playbook /etc/ansible/tools/change_k8s_network.yml

Further reading:

Installing Kubernetes 1.11 with kubeadm
