Boge Loves Ops (博哥爱运维) — Tutorial & Videos



Video tutorial download:

Link: https://pan.baidu.com/s/1rAMDFPwda4Pl3wO2DsGh1w

Extraction code: txpy

Level 1: A First Look at K8s

First, a simplified diagram of the K8s architecture:

[figure: simplified K8s architecture]

Next, a more detailed K8s architecture diagram:

[figure: detailed K8s architecture]

As the diagrams above show, a K8s cluster is made up of two major parts:

  • the K8s control plane
  • the (worker) nodes

Let's look at what each part does and what runs inside it.

Control plane components

The control plane is responsible for controlling the cluster and keeping it running. To recap, it consists of the following components:

  • etcd, a distributed persistent store – etcd holds the state of the entire K8s cluster;
  • the API server – the apiserver is the single entry point for all resource operations, and provides authentication, authorization, access control, API registration and discovery;
  • the scheduler – the scheduler handles resource scheduling, placing Pods onto the appropriate machines according to the configured scheduling policies;
  • the controller manager – the controller manager maintains cluster state: fault detection, automatic scaling, rolling updates, and so on.

These components store and manage the cluster state, but they are not the containers that run your applications.
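
As a quick sanity check that the control plane is alive, you can ask the API server directly (a minimal sketch; on a binary install like the one built in Level 3 these components also run as systemd services, and the unit names below are an assumption about that layout):

# Component health as seen by the API server (deprecated in newer releases, still available on v1.20)
kubectl get componentstatuses
# Where the API server and core cluster endpoints live
kubectl cluster-info
# On a master host of a binary install, the services themselves can be checked (unit names assumed)
systemctl status kube-apiserver kube-controller-manager kube-scheduler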

Components running on the worker nodes

Actually running containers is the job of the components on every worker node (a quick way to check them follows the list):

  • kubelet – the node agent; it manages the container life cycle on its node and also handles volumes (CSI) and networking (CNI);
  • kube-proxy – the service proxy; it provides in-cluster service discovery and load balancing for Services;
  • the container runtime (Docker, rkt or another) – the container runtime manages images and actually runs the Pods and containers (CRI).
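
A minimal sketch of that check (the service unit names assume the binary install from Level 3; with containerd as the runtime, swap the docker command for crictl info):

# Are the nodes registered and Ready?
kubectl get node -o wide
# On a worker host: are kubelet and kube-proxy running?
systemctl status kubelet kube-proxy
# Is the container runtime answering?
docker info | head -n 20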

Add-on components

Besides the control plane and the components running on the nodes, a few add-ons are needed to provide all of the functionality discussed so far:

  • the K8s DNS server – CoreDNS provides DNS for the whole cluster
  • the Dashboard (optional) – a GUI; for experienced operators the kubectl command line is more than enough
  • an Ingress controller – the Ingress Controller provides the entry point for external traffic into your services
  • cluster monitoring – metrics-server collects K8s resource metrics; Prometheus provides resource monitoring
  • a CNI (Container Network Interface) plugin – calico or flannel (if you have no need for network policies, flannel works out of the box; otherwise use calico — but note that if your network uses jumbo frames, calico's default MTU of 1440 should be adjusted to your environment for best performance; see the quick check below)
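
Where that MTU lives depends on how calico was installed; the sketch below assumes a manifest/ConfigMap-based calico install (kubeasz exposes it as an installer variable instead), so treat the object names as assumptions:

# MTU calico is configured with (assumes a calico-config ConfigMap in kube-system)
kubectl -n kube-system get configmap calico-config -o yaml | grep -i mtu
# On a node, the IPIP tunnel interface should use a matching MTU
ip link show tunl0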

In a nutshell

The API server does little more than store resources in etcd and notify clients of changes. The scheduler only assigns Pods to nodes (the kubelet then starts the containers). The controllers inside the controller manager stay active at all times, driving the actual state of the system toward the desired state recorded in the API server.

The chain of events when a Deployment is submitted to the API server

You prepare a YAML file containing the Deployment manifest and submit it to Kubernetes with kubectl. kubectl sends the manifest to the Kubernetes API server in an HTTP POST request. The API server validates the Deployment definition, stores it in etcd, and returns a response to kubectl, as shown below:

[figure: event chain of submitting a Deployment to the API server]
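
To make that chain concrete, here is a minimal sketch (the resource name demo-nginx is purely illustrative): submit a tiny Deployment, then read the object back — everything kubectl prints comes out of etcd via the API server.

# Submit a minimal Deployment; kubectl POSTs the manifest to the API server, which persists it in etcd
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-nginx
  template:
    metadata:
      labels:
        app: demo-nginx
    spec:
      containers:
      - name: nginx
        image: nginx
EOF

# Read it back through the API server
kubectl get deployment demo-nginx -o yaml | head -n 20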

Level 2: Installation packages and preparing the system environment

Download links for the relevant software packages and the system image:

# VMware Workstation15
https://www.52pojie.cn/forum.php?mod=viewthread&tid=1027984&highlight=vmware%2B15.5.0

# CentOS-7.9-2009-x86_64-Minimal
https://mirrors.aliyun.com/centos/7.9.2009/isos/x86_64/CentOS-7-x86_64-Minimal-2009.iso

Installing CentOS 7 itself is not complicated. If you want to learn K8s, it is worth building solid Linux fundamentals first; there are plenty of installation guides online, so I won't repeat them here.

Let me ramble for a moment first... Looking at the K8s video courses on the market today, many spend more than half of the total time just on installation, leaving very little for real production practice. I'm not saying that approach is fundamentally wrong; I'm simply sharing what has worked for me in learning quickly and applying K8s in production, in the hope that you take fewer detours and get productive with K8s at work sooner.

For installation, my production experience is to use an open-source, binary-package-based installer. As the saying goes, to do a good job you must first sharpen your tools: we first stand up a production-grade K8s cluster with a mature tool, then learn how each K8s component works while practicing on it — twice the result for half the effort. Besides, every cloud provider now offers managed K8s: you don't even have to build the cluster yourself — buy the machines and it comes deployed, ready to use. It's like wanting to drive a car: you don't have to master every component, how it all works, and how to repair it before buying one — if you did, nobody would bother driving. In real life most people get their license, buy a car, start driving, and pick up maintenance knowledge along the way.

With that out of the way, let's get to the tool installation steps...

Why learn K8s at all? K8s is a container orchestration platform that addresses essentially all of the pain points of running Docker containers at scale. If you still need a reason: over the next few years K8s will be part of the technology stack of practically every internet company — without it you risk being left behind, or missing out on the salary you want.

Level 3: Highly available, production-grade K8s cluster installed from binaries

These are the exact versions of the OS and components used in this installation:

  • CentOS Linux release 7.9.2009 (Core)
  • k8s: v1.20.2
  • docker: 19.03.14
  • etcd: v3.4.13
  • coredns: v1.7.1
  • cni-plugins: v0.8.7
  • calico: v3.15.3

Below is the IP plan for the virtual machine cluster used in this installation. Follow these values as-is for your first run; once you have been through the whole process once, you can change the IPs to match your own needs and things will go much more smoothly.

IP                     hostname           role
10.0.1.201【100.50】   node-1【master】   master / worker node
10.0.1.202             node-2             master / worker node
10.0.1.203【100.60】   node-3【node】     worker node
10.0.1.204             node-4             worker node

Clearly the gear from the previous levels isn't quite enough yet, so let's stock up on equipment and ammunition in this level and keep fighting toward our final victory over K8s!

We will use the open-source project https://github.com/easzlab/kubeasz to install the cluster from binaries. Supported systems are CentOS/RedHat 7, Debian 9/10 and Ubuntu 16.04/18.04.

Deployment network architecture diagram

[figure: deployment network architecture]

Installation checklist:

  1. Set up passwordless SSH from the deploy machine to every k8s node
  2. Install Python 2 and pip on the deploy machine, then install ansible
  3. Apply some customizations to the cluster configuration and start the deployment

For this open-source project I wrote a shell script that wraps it — honestly, just to save myself some typing. I'll use this script to walk through the whole installation:

Copy the script below into a file named k8s_install_new.sh, ready to run.

If copying from this page is awkward, you can download the script directly from my GitHub repository:
https://github.com/bogeit/LearnK8s/blob/main/k8s_install_new.sh

#!/bin/bash
# author: boge
# description: use ansible to deploy a K8S cluster from binaries, the simple way

# 传参检测
[ $# -ne 6 ] && echo -e "Usage: $0 rootpasswd netnum nethosts cri cni k8s-cluster-name\nExample: bash $0 bogedevops 10.0.1 201\ 202\ 203\ 204 [containerd|docker] [calico|flannel] test\n" && exit 11 

# 变量定义
export release=3.0.0
export k8s_ver=v1.20.2  # other options: v1.19.7, v1.18.15, v1.17.17
rootpasswd=$1
netnum=$2
nethosts=$3
cri=$4
cni=$5
clustername=$6
if ls -1v ./kubeasz*.tar.gz &>/dev/null;then software_packet="$(ls -1v ./kubeasz*.tar.gz )";else software_packet="";fi
pwd="/etc/kubeasz"


# deploy机器升级软件库
if cat /etc/redhat-release &>/dev/null;then
    yum update -y
else
    apt-get update && apt-get upgrade -y && apt-get dist-upgrade -y
    [ $? -ne 0 ] && apt-get -yf install
fi

# deploy机器检测python环境
python2 -V &>/dev/null
if [ $? -ne 0 ];then
    if cat /etc/redhat-release &>/dev/null;then
        yum install -y gcc openssl-devel bzip2-devel wget
        wget https://www.python.org/ftp/python/2.7.16/Python-2.7.16.tgz
        tar xzf Python-2.7.16.tgz
        cd Python-2.7.16
        ./configure --enable-optimizations
        make install
        ln -s -f /usr/local/bin/python2.7 /usr/bin/python   # a source build installs under /usr/local by default
        cd -
    else
        apt-get install -y python2.7 && ln -s -f /usr/bin/python2.7 /usr/bin/python
    fi
fi

# deploy机器设置pip安装加速源
if [[ $clustername != 'aws' ]]; then
mkdir ~/.pip
cat > ~/.pip/pip.conf <<CB
[global]
index-url = https://mirrors.aliyun.com/pypi/simple
[install]
trusted-host=mirrors.aliyun.com

CB
fi


# deploy机器安装相应软件包
# get-pip.py 详见 https://bootstrap.pypa.io
if cat /etc/redhat-release &>/dev/null;then
    yum install git python-pip sshpass -y
    [ -f ./get-pip.py ] && python ./get-pip.py || {
    wget https://bootstrap.pypa.io/pip/2.7/get-pip.py && python get-pip.py
    }
else
    apt-get install git python-pip sshpass -y
    [ -f ./get-pip.py ] && python ./get-pip.py || {
    wget https://bootstrap.pypa.io/pip/2.7/get-pip.py && python get-pip.py
    }
fi
python -m pip install --upgrade "pip < 21.0"

pip -V
pip install --no-cache-dir ansible netaddr


# 在deploy机器做其他node的ssh免密操作
for host in `echo "${nethosts}"`
do
    echo "============ ${netnum}.${host} ===========";

    if [[ ${USER} == 'root' ]];then
        [ ! -f /${USER}/.ssh/id_rsa ] &&\
        ssh-keygen -t rsa -P '' -f /${USER}/.ssh/id_rsa
    else
        [ ! -f /home/${USER}/.ssh/id_rsa ] &&\
        ssh-keygen -t rsa -P '' -f /home/${USER}/.ssh/id_rsa
    fi
    sshpass -p ${rootpasswd} ssh-copy-id -o StrictHostKeyChecking=no ${USER}@${netnum}.${host}

    if cat /etc/redhat-release &>/dev/null;then
        ssh -o StrictHostKeyChecking=no ${USER}@${netnum}.${host} "yum update -y"
    else
        ssh -o StrictHostKeyChecking=no ${USER}@${netnum}.${host} "apt-get update && apt-get upgrade -y && apt-get dist-upgrade -y"
        [ $? -ne 0 ] && ssh -o StrictHostKeyChecking=no ${USER}@${netnum}.${host} "apt-get -yf install"
    fi
done


# deploy机器下载k8s二进制安装脚本

if [[ ${software_packet} == '' ]];then
    curl -C- -fLO --retry 3 https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown
    sed -ri "s+^(K8S_BIN_VER=).*$+\1${k8s_ver}+g" ezdown
    chmod +x ./ezdown
    # 使用工具脚本下载
    ./ezdown -D && ./ezdown -P
else
    tar xvf ${software_packet} -C /etc/
    chmod +x ${pwd}/{ezctl,ezdown}
fi

# 初始化一个名为my的k8s集群配置

CLUSTER_NAME="$clustername"
${pwd}/ezctl new ${CLUSTER_NAME}
if [[ $? -ne 0 ]];then
    echo "cluster name [${CLUSTER_NAME}] was exist in ${pwd}/clusters/${CLUSTER_NAME}."
    exit 1
fi

if [[ ${software_packet} != '' ]];then
    # 设置参数,启用离线安装
    sed -i 's/^INSTALL_SOURCE.*$/INSTALL_SOURCE: "offline"/g' ${pwd}/clusters/${CLUSTER_NAME}/config.yml
fi


# to check ansible service
ansible all -m ping

#---------------------------------------------------------------------------------------------------




#修改二进制安装脚本配置 config.yml

sed -ri "s+^(CLUSTER_NAME:).*$+\1 \"${CLUSTER_NAME}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml

## k8s上日志及容器数据存独立磁盘步骤(参考阿里云的)

[ ! -d /var/lib/container ] && mkdir -p /var/lib/container/{kubelet,docker}

## cat /etc/fstab     
# UUID=105fa8ff-bacd-491f-a6d0-f99865afc3d6 /                       ext4    defaults        1 1
# /dev/vdb /var/lib/container/ ext4 defaults 0 0
# /var/lib/container/kubelet /var/lib/kubelet none defaults,bind 0 0
# /var/lib/container/docker /var/lib/docker none defaults,bind 0 0

## tree -L 1 /var/lib/container
# /var/lib/container
# ├── docker
# ├── kubelet
# └── lost+found

# docker data dir
DOCKER_STORAGE_DIR="/var/lib/container/docker"
sed -ri "s+^(STORAGE_DIR:).*$+STORAGE_DIR: \"${DOCKER_STORAGE_DIR}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml
# containerd data dir
CONTAINERD_STORAGE_DIR="/var/lib/container/containerd"
sed -ri "s+^(STORAGE_DIR:).*$+STORAGE_DIR: \"${CONTAINERD_STORAGE_DIR}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml
# kubelet logs dir
KUBELET_ROOT_DIR="/var/lib/container/kubelet"
sed -ri "s+^(KUBELET_ROOT_DIR:).*$+KUBELET_ROOT_DIR: \"${KUBELET_ROOT_DIR}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml
if [[ $clustername != 'aws' ]]; then
    # docker aliyun repo
    REG_MIRRORS="https://pqbap4ya.mirror.aliyuncs.com"
    sed -ri "s+^REG_MIRRORS:.*$+REG_MIRRORS: \'[\"${REG_MIRRORS}\"]\'+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml
fi
# [docker]信任的HTTP仓库
sed -ri "s+127.0.0.1/8+${netnum}.0/24+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml
# disable dashboard auto install
sed -ri "s+^(dashboard_install:).*$+\1 \"no\"+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml


# 融合配置准备
CLUSTER_WEBSITE="${CLUSTER_NAME}k8s.gtapp.xyz"
lb_num=$(grep -wn '^MASTER_CERT_HOSTS:' ${pwd}/clusters/${CLUSTER_NAME}/config.yml |awk -F: '{print $1}')
lb_num1=$(expr ${lb_num} + 1)
lb_num2=$(expr ${lb_num} + 2)
sed -ri "${lb_num1}s+.*$+  - "${CLUSEER_WEBSITE}"+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml
sed -ri "${lb_num2}s+(.*)$+#\1+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml

# node节点最大pod 数
MAX_PODS="120"
sed -ri "s+^(MAX_PODS:).*$+\1 ${MAX_PODS}+g" ${pwd}/clusters/${CLUSTER_NAME}/config.yml



# 修改二进制安装脚本配置 hosts
# clean old ip
sed -ri '/192.168.1.1/d' ${pwd}/clusters/${CLUSTER_NAME}/hosts
sed -ri '/192.168.1.2/d' ${pwd}/clusters/${CLUSTER_NAME}/hosts
sed -ri '/192.168.1.3/d' ${pwd}/clusters/${CLUSTER_NAME}/hosts
sed -ri '/192.168.1.4/d' ${pwd}/clusters/${CLUSTER_NAME}/hosts

# 输入准备创建ETCD集群的主机位
echo "enter etcd hosts here (example: 203 202 201) ↓"
read -p "" ipnums
for ipnum in `echo ${ipnums}`
do
    echo $netnum.$ipnum
    sed -i "/\[etcd/a $netnum.$ipnum"  ${pwd}/clusters/${CLUSTER_NAME}/hosts
done

# 输入准备创建KUBE-MASTER集群的主机位
echo "enter kube-master hosts here (example: 202 201) ↓"
read -p "" ipnums
for ipnum in `echo ${ipnums}`
do
    echo $netnum.$ipnum
    sed -i "/\[kube_master/a $netnum.$ipnum"  ${pwd}/clusters/${CLUSTER_NAME}/hosts
done

# 输入准备创建KUBE-NODE集群的主机位
echo "enter kube-node hosts here (example: 204 203) ↓"
read -p "" ipnums
for ipnum in `echo ${ipnums}`
do
    echo $netnum.$ipnum
    sed -i "/\[kube_node/a $netnum.$ipnum"  ${pwd}/clusters/${CLUSTER_NAME}/hosts
done

# 配置容器运行时CNI
case ${cni} in
    flannel)
    sed -ri "s+^CLUSTER_NETWORK=.*$+CLUSTER_NETWORK=\"${cni}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/hosts
    ;;
    calico)
    sed -ri "s+^CLUSTER_NETWORK=.*$+CLUSTER_NETWORK=\"${cni}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/hosts
    ;;
    *)
    echo "cni need be flannel or calico."
    exit 11
esac

# 配置K8S的ETCD数据备份的定时任务
if cat /etc/redhat-release &>/dev/null;then
    if ! grep -w '94.backup.yml' /var/spool/cron/root &>/dev/null;then echo "00 00 * * * `which ansible-playbook` ${pwd}/playbooks/94.backup.yml &> /dev/null" >> /var/spool/cron/root;else echo exists ;fi
    chown root.root /var/spool/cron/root
    chmod 600 /var/spool/cron/root
else
    if ! grep -w '94.backup.yml' /var/spool/cron/crontabs/root &>/dev/null;then echo "00 00 * * * `which ansible-playbook` ${pwd}/playbooks/94.backup.yml &> /dev/null" >> /var/spool/cron/crontabs/root;else echo exists ;fi
    chown root.crontab /var/spool/cron/crontabs/root
    chmod 600 /var/spool/cron/crontabs/root
fi
rm -f /var/run/cron.reboot
service crond restart || service cron restart




#---------------------------------------------------------------------------------------------------
# 准备开始安装了
rm -rf ${pwd}/{dockerfiles,docs,.gitignore,pics} &&\
find ${pwd}/ -name '*.md'|xargs rm -f
read -p "Enter to continue deploy k8s to all nodes >>>" YesNobbb

# now start deploy k8s cluster 
cd ${pwd}/

# to prepare CA/certs & kubeconfig & other system settings 
${pwd}/ezctl setup ${CLUSTER_NAME} 01
sleep 1
# to setup the etcd cluster
${pwd}/ezctl setup ${CLUSTER_NAME} 02
sleep 1
# to setup the container runtime(docker or containerd)
case ${cri} in
    containerd)
    sed -ri "s+^CONTAINER_RUNTIME=.*$+CONTAINER_RUNTIME=\"${cri}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/hosts
    ${pwd}/ezctl setup ${CLUSTER_NAME} 03
    ;;
    docker)
    sed -ri "s+^CONTAINER_RUNTIME=.*$+CONTAINER_RUNTIME=\"${cri}\"+g" ${pwd}/clusters/${CLUSTER_NAME}/hosts
    ${pwd}/ezctl setup ${CLUSTER_NAME} 03
    ;;
    *)
    echo "cri need be containerd or docker."
    exit 11
esac
sleep 1
# to setup the master nodes
${pwd}/ezctl setup ${CLUSTER_NAME} 04
sleep 1
# to setup the worker nodes
${pwd}/ezctl setup ${CLUSTER_NAME} 05
sleep 1
# to setup the network plugin(flannel、calico...)
${pwd}/ezctl setup ${CLUSTER_NAME} 06
sleep 1
# to setup other useful plugins(metrics-server、coredns...)
${pwd}/ezctl setup ${CLUSTER_NAME} 07
sleep 1
# [可选]对集群所有节点进行操作系统层面的安全加固  https://github.com/dev-sec/ansible-os-hardening
#ansible-playbook roles/os-harden/os-harden.yml
#sleep 1
cd `dirname ${software_packet:-/tmp}`


k8s_bin_path='/opt/kube/bin'


echo "-------------------------  k8s version list  ---------------------------"
${k8s_bin_path}/kubectl version
echo
echo "-------------------------  All Healthy status check  -------------------"
${k8s_bin_path}/kubectl get componentstatus
echo
echo "-------------------------  k8s cluster info list  ----------------------"
${k8s_bin_path}/kubectl cluster-info
echo
echo "-------------------------  k8s all nodes list  -------------------------"
${k8s_bin_path}/kubectl get node -o wide
echo
echo "-------------------------  k8s all-namespaces's pods list   ------------"
${k8s_bin_path}/kubectl get pod --all-namespaces
echo
echo "-------------------------  k8s all-namespaces's service network   ------"
${k8s_bin_path}/kubectl get svc --all-namespaces
echo
echo "-------------------------  k8s welcome for you   -----------------------"
echo

# set up 'k' as a handy alias for kubectl
echo "alias k=kubectl && complete -F __start_kubectl k" >> ~/.bashrc

# get dashboard url
${k8s_bin_path}/kubectl cluster-info|grep dashboard|awk '{print $NF}'|tee -a /root/k8s_results

# get login token
${k8s_bin_path}/kubectl -n kube-system describe secret $(${k8s_bin_path}/kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')|grep 'token:'|awk '{print $NF}'|tee -a /root/k8s_results
echo
echo "you can look again dashboard and token info at  >>> /root/k8s_results <<<"
#echo ">>>>>>>>>>>>>>>>> You can excute command [ source ~/.bashrc ] <<<<<<<<<<<<<<<<<<<<"
echo ">>>>>>>>>>>>>>>>> You need to excute command [ reboot ] to restart all nodes <<<<<<<<<<<<<<<<<<<<"
rm -f $0
[ -f ${software_packet} ] && rm -f ${software_packet}
#rm -f ${pwd}/roles/deploy/templates/${USER_NAME}-csr.json.j2
#sed -ri "s+${USER_NAME}+admin+g" ${pwd}/roles/prepare/tasks/main.yml

Run the script as follows to kick off the installation:


# 开始在线安装(这里选择容器运行时是docker,CNI为calico,K8S集群名称为test)
bash k8s_install_new.sh bogedevops 10.0.1 201\ 202\ 203\ 204 docker calico test

# Note: the online install downloads files and images from GitHub and Docker Hub, which can be very slow from some networks. I have also prepared a complete offline package; download it from the link below, put it in the same directory as the script above, and then run the same install command
# 此离线安装包里面的k8s版本为v1.20.2
https://cloud.189.cn/t/3YBV7jzQZnAb (访问码:0xde)

# 脚本基本是自动化的,除了下面几处提示按要求复制粘贴下,再回车即可

# 输入准备创建ETCD集群的主机位,复制  203 202 201 粘贴并回车
echo "enter etcd hosts here (example: 203 202 201) ↓"

# 输入准备创建KUBE-MASTER集群的主机位,复制  202 201 粘贴并回车
echo "enter kube-master hosts here (example: 202 201) ↓"

# 输入准备创建KUBE-NODE集群的主机位,复制  204 203 粘贴并回车
echo "enter kube-node hosts here (example: 204 203) ↓"

# 这里会提示你是否继续安装,没问题的话直接回车即可
Enter to continue deploy k8s to all nodes >>>

# 安装完成后重新加载下环境变量以实现kubectl命令补齐
. ~/.bashrc 

Level 4: Docker, K8s's favorite little brother

Know your enemy and know yourself, and you will never lose a battle — that's as true in tech as it is in games. Docker is only one of several container engines, but it entered the field early and won K8s's favor, so nowadays when people talk about container technology they think of Docker; Docker has practically become a synonym for containers. So how do we defeat this Docker? Let's take a careful look at its attributes and skills.


Docker was created by dotCloud and implemented in Go, the language released by Google. It builds on Linux kernel features — namespaces, cgroups, and union filesystems such as AUFS — to package and isolate processes, making it an operating-system-level virtualization technology.

The figure below compares Docker with traditional virtualization. A traditional virtual machine virtualizes a full set of hardware, runs a complete guest operating system on top of it, and only then runs the required application processes. A containerized application process runs directly on the host's kernel: the container has no kernel of its own and no hardware virtualization, which makes containers much more lightweight than traditional VMs.

[figure: containers vs. traditional virtual machines]
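
You can see "same kernel, isolated view" for yourself in a few seconds (a minimal sketch; it assumes Docker is installed and the alpine image can be pulled):

# The kernel version is identical inside and outside the container — containers share the host kernel
uname -r
docker run --rm alpine uname -r
# ...but the PID namespace is isolated: the container only sees its own processes
docker run --rm alpine ps aux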

Why use Docker?

  • more efficient use of system resources
  • much faster startup
  • consistent runtime environments
  • continuous delivery and deployment
  • easier migration
  • easier maintenance and scaling

Comparison with traditional virtual machines

Feature             Container                  Virtual machine
Startup             seconds                    minutes
Disk usage          typically MB               typically GB
Performance         near-native                weaker than native
Density per host    thousands of containers    usually a few dozen

Docker's three pillars are:

  • images (Image)
  • containers (Container)
  • registries/repositories (Repository)

Docker's killer move is:

  • the Dockerfile

Below are real production examples that walk through Docker's entire life cycle, so you can take it down in a single strike.

python

FROM python:3.7-slim-stretch
MAINTAINER boge <[email protected]>

WORKDIR /app

COPY requirements.txt .

RUN  sed -i 's/deb.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
  && sed -i 's/security.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
  && apt-get update -y \
  && apt-get install -y curl wget gcc libsm6 libxext6 libglib2.0-0 libxrender1 git vim \
  && apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple -r requirements.txt \
    && rm requirements.txt

COPY . .

EXPOSE 5000
HEALTHCHECK CMD curl --fail http://localhost:5000 || exit 1

ENTRYPOINT ["gunicorn", "app:app", "-c", "gunicorn_config.py"]

golang

# stage 1: build src code to binary
FROM golang:1.13-alpine3.10 as builder
MAINTAINER boge <[email protected]>

ENV GOPROXY https://goproxy.cn

# ENV GO111MODULE on

COPY *.go /app/

RUN cd /app && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags "-s -w" -o hellogo .

# stage 2: use alpine as base image
FROM alpine:3.10

RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories && \
    apk update && \
    apk --no-cache add tzdata ca-certificates && \
    cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    # apk del tzdata && \
    rm -rf /var/cache/apk/*


COPY --from=builder /app/hellogo /hellogo

CMD ["/hellogo"] 

nodejs

FROM node:12.6.0-alpine
MAINTAINER boge <[email protected]>

WORKDIR /app
COPY package.json .

RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories && \
    apk update && \
    yarn  config set registry https://registry.npm.taobao.org && \
    yarn install

COPY . .

RUN yarn build

EXPOSE 6868

ENTRYPOINT ["yarn", "start"]

java

FROM maven:3.6.3-adoptopenjdk-8 as target

ENV MAVEN_HOME /usr/share/maven
ENV PATH $MAVEN_HOME/bin:$PATH
COPY settings.xml /usr/share/maven/conf/
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline  # use docker cache
COPY src/ /build/src/
RUN mvn clean package -Dmaven.test.skip=true


FROM java:8
WORKDIR /app
RUN  rm /etc/localtime && cp /usr/share/zoneinfo/Asia/Shanghai  /etc/localtime
COPY --from=target /build/target/*.jar  /app/app.jar
EXPOSE 8080
ENTRYPOINT ["java","-Xmx768m","-Xms256m","-Djava.security.egd=file:/dev/./urandom","-jar","/app/app.jar"]

A walk through Docker's full life cycle

# 登陆docker镜像仓库
docker login "仓库地址" -u "仓库用户名" -p "仓库密码"
# 从仓库下载镜像
docker pull "仓库地址"/"仓库命名空间"/"镜像名称":latest || true
# 基于Dockerfile构建本地镜像
docker build --network host --build-arg PYPI_IP="xx.xx.xx.xx" --cache-from "仓库地址"/"仓库命名空间"/"镜像名称":latest --tag "仓库地址"/"仓库命名空间"/"镜像名称":"镜像版本号" --tag "仓库地址"/"仓库命名空间"/"镜像名称":latest .
# 将构建好的本地镜像推到远端镜像仓库里面
docker push "仓库地址"/"仓库命名空间"/"镜像名称":"镜像版本号"
docker push "仓库地址"/"仓库命名空间"/"镜像名称":latest
# 基于redis的镜像运行一个docker实例
docker run --name myredis --net host -d redis:6.0.2-alpine3.11 redis-server --requirepass boGe666

Now for some hands-on practice — time to face the Level 4 mini-boss and earn our experience points.

I've uploaded the flask and golang projects above to a network drive:

https://cloud.189.cn/t/M36fYrIrEbui (access code: hy47)

After downloading and unpacking you will get two directories, python and golang.

# 解压
unzip docker-file.zip

# 先打包python项目的镜像并运行测试
cd python
docker build -t python/flask:v0.0.1 .
docker run -d -p 80:5000 python/flask:v0.0.1

# 再打包golang项目的镜像并运行测试
docker build -t boge/golang:v0.0.1 .
docker run -d -p80:3000 boge/golang:v0.0.1

Level 5: The assault on K8s, part 1

In Level 3 we deployed a K8s cluster from binaries; now it's time to get properly acquainted with K8s.

K8s API objects (the full roster of monsters)

  • Namespace – namespaces provide resource isolation within a single cluster
  • Pod – the smallest deployable unit in K8s
  • ReplicaSet – enables smooth rolling updates and rollbacks of pods; we rarely operate it directly
  • Deployment – used to release stateless applications
  • Health Check – readiness/liveness/maxSurge/maxUnavailable, for detecting service health
  • Service, Endpoint – load-balance traffic across the pods selected by the same labels
  • Labels – labels, the key mechanism services use to select one another
  • Ingress – the traffic entry point into K8s
  • DaemonSet – used to release daemon-style applications, such as the CNI plugin we deployed
  • HPA – Horizontal Pod Autoscaling, automatic horizontal scaling
  • Volume – storage volumes
  • PV, PVC, StorageClass – persistent volumes, persistent volume claims, and dynamic PV provisioning
  • StatefulSet – used to release stateful applications
  • Job, CronJob – one-off tasks and scheduled tasks
  • ConfigMap, Secret – service configuration and confidential configuration
  • kube-proxy – provides the traffic forwarding behind Services; we don't operate it directly
  • RBAC: ServiceAccount, Role, RoleBinding, ClusterRole, ClusterRoleBinding – role-based access control
  • Events – the K8s event stream, useful for monitoring; we don't operate it directly

Feeling a bit dizzy after that pile of concepts? Don't worry — every one of these mobs shows up in the levels ahead, and I'll show you how to beat each of them. Let's go!

OK, this is a long level. In this lesson we'll take on the first two mobs: Namespace and Pod.

Namespace

A namespace (ns for short) provides resource isolation within the cluster. On K8s, most resources are namespaced; a few, such as PVs and ClusterRoles, are cluster-scoped and not bound to any namespace — more on that later.

# 查看目前集群上有哪些ns
# kubectl get ns
NAME              STATUS        AGE
default           Active        5d3h
kube-node-lease   Active        5d3h
kube-public       Active        5d3h
kube-system       Active        5d3h

# 通过kubectl 接上 -n namespaceName 来查看对应ns上面的资源信息
# kubectl -n kube-system get pod
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-7fdc86d8ff-2mcm9   1/1     Running   1          29h
calico-node-dlt57                          1/1     Running   1          29h
calico-node-tvzqj                          1/1     Running   1          29h
calico-node-vh6sk                          1/1     Running   1          29h
calico-node-wpsfh                          1/1     Running   1          29h
coredns-d9b6857b5-tt7j2                    1/1     Running   1          29h
metrics-server-869ffc99cd-n2dc4            1/1     Running   2          29h
nfs-provisioner-01-77549d5487-dbmv5        1/1     Running   2          29h

# kubectl -n kube-system top pod  #显示pod资源使用情况

# 我们通过不接-n 的情况下,都是在默认命令空间default下进行操作,在生产中,通过测试一些资源就在这里进行
[root@node-1 ~]# kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
nginx-867c95f465-njv78   1/1     Running   0          12m
[root@node-1 ~]# kubectl -n default get pod
NAME                     READY   STATUS    RESTARTS   AGE
nginx-867c95f465-njv78   1/1     Running   0          12m

# 创建也很简单
[root@node-1 ~]# kubectl create ns test
namespace/test created
[root@node-1 ~]# kubectl get ns|grep test
test  

# 删除ns
# kubectl delete ns test 
namespace "test" deleted

A production tip: dealing with a namespace stuck in the Terminating state

# kubectl get ns
NAME              STATUS        AGE
default           Active        5d4h
ingress-nginx     Active        30h
kube-node-lease   Active        5d4h
kube-public       Active        5d4h
kube-system       Active        5d4h
kubevirt          Terminating   2d2h   # <------ here

1. Open a new window and run:  kubectl proxy
> This starts a proxy that accepts HTTP connections from your local machine and forwards them to the API server, handling authentication for you

2. In another terminal, save the shell script below as `1.sh` and run it; the $1 argument is the name of the namespace that refuses to delete
#------------------------------------------------------------------------------------
#!/bin/bash

set -eo pipefail

die() { echo "$*" 1>&2 ; exit 1; }

need() {
        which "$1" &>/dev/null || die "Binary '$1' is missing but required"
}

# checking pre-reqs

need "jq"
need "curl"
need "kubectl"

PROJECT="$1"
shift

test -n "$PROJECT" || die "Missing arguments: kill-ns <namespace>"

kubectl proxy &>/dev/null &
PROXY_PID=$!
killproxy () {
        kill $PROXY_PID
}
trap killproxy EXIT

sleep 1 # give the proxy a second

kubectl get namespace "$PROJECT" -o json | jq 'del(.spec.finalizers[] | select("kubernetes"))' | curl -s -k -H "Content-Type: application/json" -X PUT -o /dev/null --data-binary @- http://localhost:8001/api/v1/namespaces/$PROJECT/finalize && echo "Killed namespace: $PROJECT"
#------------------------------------------------------------------------------------

3. Run the script to delete the namespace
# bash 1.sh kubevirt
Killed namespace: kubevirt
1.sh: line 23: kill: (9098) - No such process

4. Check the result
# kubectl get ns    
NAME              STATUS   AGE
default           Active   5d4h
ingress-nginx     Active   30h
kube-node-lease   Active   5d4h
kube-public       Active   5d4h
kube-system       Active   5d4h

Pod

kubectl is the essential CLI tool for managing K8s, and every operator has to master it — but how do you remember all of its subcommands? Let's use creating a pod as an example.

Make good use of the -h help flag

# 在新版本的K8s中,明确了相关命令就是用来创建对应资源的,不再像老版本那样混合使用,这个不是重点,创建pod,我们用kubectl run -h,来查看命令帮助,是不是豁然开朗
# kubectl run -h
Create and run a particular image in a pod.

Examples:
  # Start a nginx pod.
  kubectl run nginx --image=nginx
  ......

# 我们就用示例给出的第一个示例,来创建一个nginx的pod
# kubectl run nginx --image=nginx
pod/nginx created

# 等待镜像下载完成后,pod就会正常running了(这里介绍两个实用参数 -w代表持久监听当前namespace下的指定资源的变化;-o wide代表列出更为详细的信息,比如这里pod运行的node节点显示)
# 注: READY下面的含义是后面数字1代表这个pod里面期望的容器数量,前面的数字1代表服务正常运行就绪的容器数量
# kubectl  get pod -w -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
nginx                   1/1     Running   0          2m35s   172.20.139.67   10.0.1.203   <none>           <none>

# 我们来请求下这个pod的IP
# curl 172.20.139.67
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
......

# 我们进到这个pod服务内,修改下页面信息看看,这里会学到exec子命令,-it代表保持tty连接,不会一连上pod就断开了
# ************************************************************
# kubectl -it exec nginx -- sh
# echo 'hello, world!' > /usr/share/nginx/html/index.html
# exit

# curl 172.20.139.67
hello, world!


# 我们来详细分析的这个pod启动的整个流程,这里会用到kubectl的子命令 describe,它是用来描述后面所接资源的详细信息的,划重点,这个命令对于我们生产中排查K8s的问题尤其重要
# kubectl  describe pod nginx   # 这里显示内容较多,目前我只把当前关键的信息列出来

Name:         nginx
Namespace:    default
Priority:     0
Node:         10.0.1.203/10.0.1.203
Start Time:   Tue, 24 Nov 2020 14:23:56 +0800
Labels:       run=nginx
Annotations:  <none>
Status:       Running
IP:           172.20.139.67
IPs:
  IP:  172.20.139.67
Containers:
  nginx:
    Container ID:   docker://2578019be269d7b1ad02ab4dd0a8b883e79fc491ae9c5db6164120f3e1dde8c7
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:c3a1592d2b6d275bef4087573355827b200b00ffc2d9849890a4f3aa2128c4ae
    Port:           <none>
    Host Port:      <none>
    State:          Running
......中间内容省略
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  5m41s  default-scheduler  Successfully assigned default/nginx to 10.0.1.203
  Normal  Pulling    5m40s  kubelet            Pulling image "nginx"
  Normal  Pulled     5m25s  kubelet            Successfully pulled image "nginx"
  Normal  Created    5m25s  kubelet            Created container nginx
  Normal  Started    5m25s  kubelet            Started container nginx
  
 # Now let's focus on the Events chain at the end
1. kubectl sends the request to create the pod to the API Server
2. The API Server validates the request and persists the Pod object in etcd (for a bare pod like this, no controller in the Controller Manager is involved)
3. The Scheduler runs its scheduling pass — the first event above clearly shows the pod being scheduled onto node 10.0.1.203. The kubelet there then pulls the container image and creates the nginx container, until the service is finally up. When a service fails to start, detailed error messages also appear here

That said, creating bare pods like this is not recommended in production. Let me demonstrate why:

# 我们删除掉这个nginx的pod
# kubectl delete pod nginx
pod "nginx" deleted

# kubectl get pod
The pod is gone. Now imagine this pod had been one of our running services and the node hosting it died: the service would simply be gone — it would not be rescheduled onto another node, and we would lose the single most valuable thing K8s gives us, keeping the service in its desired state.
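
To see the contrast, put the pod under a Deployment and delete it again (a quick sketch; the name selfheal-demo is purely illustrative) — the ReplicaSet notices the missing pod and immediately creates a replacement:

# kubectl create deployment labels the pods with app=<name>, which we use to select them
kubectl create deployment selfheal-demo --image=nginx
kubectl get pod -l app=selfheal-demo
# Delete the pod; a new one appears within seconds
kubectl delete pod -l app=selfheal-demo
kubectl get pod -l app=selfheal-demo -w
# Clean up
kubectl delete deployment selfheal-demo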

A small tip: list the available tags of an image, which makes choosing an image version much easier.

This little script is lifted from the binary-install project used above. It is very handy for checking Docker image tags, so I'm sharing it here.

# cat /opt/kube/bin/docker-tag        
#!/bin/bash
#

MTAG=$2
CONTAIN=$3

function usage() {
cat << HELP

docker-tag  --  list all tags for a Docker image on a remote registry

EXAMPLE:
    - list all tags for nginx:
       docker-tag tags nginx

    - list all nginx tags containing alpine:
       docker-tag tags nginx alpine

HELP
}

if [ $# -lt 1 ]; then
        usage
        exit 2
fi

function tags() {
    TAGS=$(curl -ksL https://registry.hub.docker.com/v1/repositories/${MTAG}/tags | sed -e 's/[][]//g' -e 's/"//g' -e 's/ //g' | tr '}' '\n'  | awk -F: '{print $3}')
    if [ "${CONTAIN}" != "" ]; then
        echo -e $(echo "${TAGS}" | grep "${CONTAIN}") | tr ' ' '\n'
    else
        echo "${TAGS}"
    fi
}


case $1 in
    tags)
        tags
        ;;
    *)
        usage
        ;;
esac

Sample output:

# docker-tag tags nginx        
latest
1
1-alpine
1-alpine-perl
1-perl
1.10
1.10-alpine

Pod mob fight (homework)

# Exercise: create a redis pod, exec into it, connect to redis-server with the redis-cli client, set key a to the value 666, and finally delete the redis pod
root@redis:/data# redis-cli 
127.0.0.1:6379> get a
(nil)
127.0.0.1:6379> set a 666
OK
127.0.0.1:6379> get a
"666"

Level 5: The assault on K8s, part 2 – Deployment

Deployment

In this lesson, follow along as we take on the Deployment monster.

K8s manages the pod life cycle through a variety of controllers. To cover different business scenarios it provides Deployment, ReplicaSet, DaemonSet, StatefulSet, Job, CronJob and other controller types. Here we start with Deployment — the controller we use by far the most in production, suited to releasing stateless applications.

Let's start by running a Deployment example:

# 创建一个deployment,引用nginx的服务镜像,这里的副本数量默认是1,nginx容器镜像用的是latest
# 在K8s新版本开始,对服务api进行了比较大的梳理,明确了各个api的具体职责,而不像以前旧版本那样混为一谈
# kubectl create deployment nginx --image=nginx
deployment.apps/nginx created

# 查看创建结果
# kubectl  get deployments.apps 
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     1            0           6s

# kubectl  get rs   # <-- 看下自动关联创建的副本集replicaset
NAME              DESIRED   CURRENT   READY   AGE
nginx-f89759699   1         1         0       10s

# kubectl get pod   # <-- 查看生成的pod,注意镜像下载需要一定时间,耐心等待,注意观察pod名称的f89759699,是不是和上面rs的一样,对了,因为这里的pod就是由上面的rs创建出来,为什么要设置这么一个环节呢,后面会以实例来演示
NAME                    READY   STATUS              RESTARTS   AGE
nginx-f89759699-26fzd   0/1     ContainerCreating   0          13s

# kubectl get pod
NAME                    READY   STATUS    RESTARTS   AGE
nginx-f89759699-26fzd   1/1     Running   0          98s


# 扩容pod的数量
# kubectl scale deployment nginx --replicas=2
deployment.apps/nginx scaled

# 查看扩容后的pod结果
# kubectl get pod
NAME                    READY   STATUS              RESTARTS   AGE
nginx-f89759699-26fzd   1/1     Running             0          112s
nginx-f89759699-9s4dw   0/1     ContainerCreating   0          2s

# 具体看下pod是不是分散运行在不同的node上呢
# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-f89759699-26fzd   1/1     Running   0          45m   172.20.0.16   10.0.1.202   <none>           <none>
nginx-f89759699-9s4dw   1/1     Running   0          43m   172.20.1.14   10.0.1.201   <none>           <none>


# 接下来替换下这个deployment里面nginx的镜像版本,来讲解下为什么需要rs副本集呢,这个很重要哦
# 我们先看看目前nginx是哪个版本,随便输入一个错误的uri,页面就会打印出nginx的版本号了
curl 10.68.86.85/1
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.19.4</center>
</body>
</html>

# 根据输出可以看到版本号是nginx/1.19.4,这里利用上面提到的命令docker-tag来看下nginx有哪些其他的版本,然后我在里面挑选了1.9.9这个tag号
# 注意命令最后面的 `--record` 参数,这个在生产中作为资源创建更新用来回滚的重要标记,强烈建议在生产中操作时都加上这个参数
# kubectl set image deployment/nginx  nginx=nginx:1.9.9 --record 
deployment.apps/nginx image updated

# 观察下pod的信息,可以看到旧nginx的2个pod逐渐被新的pod一个一个的替换掉
# kubectl  get pod -w
NAME                    READY   STATUS              RESTARTS   AGE
nginx-89fc8d79d-4z876   1/1     Running             0          41s
nginx-89fc8d79d-jd78f   0/1     ContainerCreating   0          3s
nginx-f89759699-9cx7l   1/1     Running             0          4h53m

# 我们再看下nginx的rs,可以看到现在有两个了
# kubectl get rs
NAME              DESIRED   CURRENT   READY   AGE
nginx-89fc8d79d   2         2         2       9m6s
nginx-f89759699   0         0         0       6h15m

# 看下现在nginx的描述信息,我们来详细分析下这个过程
# kubectl  describe deployment nginx
Name:                   nginx
Namespace:              default
CreationTimestamp:      Tue, 24 Nov 2020 09:40:54 +0800
Labels:                 app=nginx
......
RollingUpdateStrategy:  25% max unavailable, 25% max surge  # 注意这里,这个就是用来控制rs新旧版本迭代更新的一个频率,滚动更新的副本总数最大值(以2的基数为例):2+2*25%=2.5 -- > 3,可用副本数最大值(默认值两个都是25%):2-2*25%=1.5 --> 2
......
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  21m   deployment-controller  Scaled up replica set nginx-89fc8d79d to 1  # 启动1个新版本的pod
  Normal  ScalingReplicaSet  20m   deployment-controller  Scaled down replica set nginx-f89759699 to 1 # 上面完成就释放掉一个旧版本的
  Normal  ScalingReplicaSet  20m   deployment-controller  Scaled up replica set nginx-89fc8d79d to 2 # 然后再启动1个新版本的pod
  Normal  ScalingReplicaSet  20m   deployment-controller  Scaled down replica set nginx-f89759699 to 0 # 释放掉最后1个旧的pod


# 回滚
# 还记得我们上面提到的 --record  参数嘛,这里它就会发挥很重要的作用了
# 这里还以nginx服务为例,先看下当前nginx的版本号

# curl  10.68.18.121/1         
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.9.9</center>
</body>
</html>

# 升级nginx的版本
#  kubectl set image deployments/nginx nginx=nginx:1.19.5 --record 

# 已经升级完成
# curl  10.68.18.121/1         
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>

# 这里假设是我们在发版服务的新版本,结果线上反馈版本有问题,需要马上回滚,看看在K8s上怎么操作吧
# 首先通过命令查看当前历史版本情况,只有接了`--record`参数的命令操作才会有详细的记录,这就是为什么在生产中操作一定得加上的原因了
# kubectl rollout history deployment nginx 
deployment.apps/nginx 
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl set image deployments/nginx nginx=nginx:1.9.9 --record=true
3         kubectl set image deployments/nginx nginx=nginx:1.19.5 --record=true

# 根据历史发布版本前面的阿拉伯数字序号来选择回滚版本,这里我们回到上个版本号,也就是选择2 ,执行命令如下:
# kubectl rollout undo deployment nginx --to-revision=2
deployment.apps/nginx rolled back

# 等一会pod更新完成后,看下结果已经回滚完成了,怎么样,在K8s操作就是这么简单:
# curl  10.68.18.121/1                                 
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.9.9</center>
</body>
</html>

# 可以看到现在最新版本号是4了,具体版本看操作的命令显示是1.9.9 ,并且先前回滚过的版本号2已经没有了,因为它已经变成4了
# kubectl rollout history deployment nginx             
deployment.apps/nginx 
REVISION  CHANGE-CAUSE
1         <none>
3         kubectl set image deployments/nginx nginx=nginx:1.19.5 --record=true
4         kubectl set image deployments/nginx nginx=nginx:1.9.9 --record=true

Deployment is important, so let's review the whole rollout process once more to deepen our understanding.

[figure: Deployment rollout across nodes 10.0.1.201 and 10.0.1.202]

  1. kubectl sends the deployment request to the API Server
  2. The API Server notifies the Controller Manager, which creates the deployment's ReplicaSet and pods (and handles scaling)
  3. The Scheduler runs its scheduling pass and places the two replica Pods onto 10.0.1.201 and 10.0.1.202
  4. The kubelet on 10.0.1.201 and 10.0.1.202 creates and runs the Pods on its own node
  5. The deployment's nginx image is upgraded

A couple of additions:

The configuration of these applications and the current service state are all stored in etcd; when you run commands such as kubectl get pod, the API Server reads this data from etcd.

calico assigns each pod an IP, but note that the IP is not fixed — it changes whenever the pod is recreated.

Appendix: Node management

Prevent new pods from being scheduled onto a node:

kubectl cordon <node-name>

Evict all pods from a node with kubectl drain <node-name>. This deletes every Pod on the node (DaemonSet pods excepted) and restarts them on other nodes; it is normally used when a node needs maintenance, and it automatically calls kubectl cordon first. Once maintenance is finished and kubelet is running again, kubectl uncordon <node-name> returns the node to the cluster.
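
A typical maintenance sequence looks like this (a sketch; 10.0.1.203 is just one of the example nodes from this cluster):

# Stop scheduling new pods onto the node
kubectl cordon 10.0.1.203
# Evict the existing pods (DaemonSet pods are skipped; add --delete-emptydir-data, or --delete-local-data on older kubectl versions, if pods use emptyDir volumes)
kubectl drain 10.0.1.203 --ignore-daemonsets
# ...perform the maintenance, then let the node take pods again
kubectl uncordon 10.0.1.203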

Above we created the deployment from the command line, but in production we usually write a YAML manifest first and create the service with kubectl apply -f xxx.yaml. Let's recreate the same deployment, this time from a YAML file.

Note that YAML is indentation-sensitive, much like Python syntax; a single formatting mistake means the resource cannot be created. Here is a practical trick for generating well-formed YAML:

# Look familiar? It's the same command we used to create the deployment, with `--dry-run -o yaml` appended: --dry-run means nothing is actually created in K8s, and -o yaml prints the would-be result as YAML — an instant, well-formed manifest

# kubectl create deployment nginx --image=nginx --dry-run -o yaml       
apiVersion: apps/v1     # <---  apiVersion 是当前配置格式的版本
kind: Deployment     #<--- kind 是要创建的资源类型,这里是 Deployment
metadata:        #<--- metadata 是该资源的元数据,name 是必需的元数据项
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:        #<---    spec 部分是该 Deployment 的规格说明
  replicas: 1        #<---  replicas 指明副本数量,默认为 1
  selector:
    matchLabels:
      app: nginx
  strategy: {}
  template:        #<---   template 定义 Pod 的模板,这是配置文件的重要部分
    metadata:        #<---     metadata 定义 Pod 的元数据,至少要定义一个 label。label 的 key 和 value 可以任意指定
      creationTimestamp: null
      labels:
        app: nginx
    spec:           #<---  spec 描述 Pod 的规格,此部分定义 Pod 中每一个容器的属性,name 和 image 是必需的
      containers:
      - image: nginx
        name: nginx
        resources: {}
status: {}

Let's use this YAML file to create the nginx deployment. First, delete the one we created from the command line:

# 在K8s上命令行删除一个资源直接用delete参数
# kubectl delete deployment nginx
deployment.apps "nginx" deleted

# 可以看到关联的rs副本集也被自动清空了
# kubectl  get rs
No resources found in default namespace.

# 相关的pod也没了
# kubectl get pod 
No resources found in default namespace.

Generate the nginx.yaml file:

# kubectl create deployment nginx --image=nginx --dry-run -o yaml > nginx.yaml
Note: running the command above prints a warning: "--dry-run is deprecated and can be replaced with --dry-run=client." It doesn't affect the generated YAML, but if it bothers you, replace --dry-run with --dry-run=client as suggested.
# Next, vim nginx.yaml and change replicas: 1 to replicas: 2

# 开始创建,我们后面这类基于yaml文件来创建资源的命令统一都用apply了
# kubectl  apply -f nginx.yaml 
deployment.apps/nginx created

# 查看创建的资源,这个有个小技巧,同时查看多个资源可以用,分隔,这样一条命令就可以查看多个资源了
# kubectl get deployment,rs,pod
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   2/2     2            2           116s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-f89759699   2         2         2       116s

NAME                        READY   STATUS    RESTARTS   AGE
pod/nginx-f89759699-bzwd2   1/1     Running   0          116s
pod/nginx-f89759699-qlc8q   1/1     Running   0          116s

# 删除通过kubectl apply -f nginx.yaml创建的资源
kubectl delete -f nginx.yaml

A summary of the two ways of creating resources:

The imperative (command-line) way:
1. Simple, direct and fast; easy to pick up.
2. Suitable for temporary tests or experiments.

The declarative (config-file) way:
1. The config file describes the "what" — the final state the application should reach.
2. The config file is a template for the resource, so deployments are repeatable.
3. Deployments can be managed like code.
4. Suitable for formal, cross-environment, large-scale deployments.
5. It does require familiarity with the config file syntax, so the bar is a little higher.

Deployment mob fight (homework)

Try creating a redis Deployment both ways — from the command line and from a YAML config — and take the chance to revisit the Pod homework above as well.

Level 5: The assault on K8s, part 3 – health checks for service pods

Hi everyone, this is Boge. In this lesson we look at how to health-check our business services on K8s.

Health Check

Let's go a step further and talk about K8s's health-check features. Powerful self-healing is one of the defining traits of this container orchestration engine; by default, self-healing means automatically restarting containers that fail. On top of that, we can use liveness and readiness probes to define finer-grained health criteria, which lets us achieve:

  • zero-downtime deployments
  • never deploying broken service images
  • safer rolling upgrades

Let's first look at K8s's default health-check mechanism before moving on to the probes themselves.

Every container runs a process specified by the Dockerfile's CMD or ENTRYPOINT. If that process exits with a non-zero status code, the container is considered to have failed, and K8s restarts it according to restartPolicy to achieve self-healing.

Now let's get hands-on and simulate a container failure:

# 先来生成一个pod的yaml配置文件,并对其进行相应修改
# kubectl run  busybox --image=busybox --dry-run=client -o yaml > testHealthz.yaml
# vim testHealthz.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: busybox
  name: busybox
spec:
  containers:
  - image: busybox
    name: busybox
    resources: {}
    args:
    - /bin/sh
    - -c
    - sleep 10; exit 1       # 并添加pod运行指定脚本命令,模拟容器启动10秒后发生故障,退出状态码为1
  dnsPolicy: ClusterFirst
  restartPolicy: OnFailure # 将默认的Always修改为OnFailure
status: {}
Restart policy   Meaning
Always           kubelet restarts the container whenever it stops, regardless of exit code
OnFailure        kubelet restarts the container only when it terminates with a non-zero exit code
Never            kubelet never restarts the container, whatever its state

Apply the config to create the pod:

# kubectl apply -f testHealthz.yaml 
pod/busybox created

# 观察几分钟,利用-w 参数来持续监听pod的状态变化
# kubectl  get pod -w
NAME                     READY   STATUS              RESTARTS   AGE
busybox                  0/1     ContainerCreating   0          4s
busybox                  1/1     Running             0          6s
busybox                  0/1     Error               0          16s
busybox                  1/1     Running             1          22s
busybox                  0/1     Error               1          34s
busybox                  0/1     CrashLoopBackOff    1          47s
busybox                  1/1     Running             2          63s
busybox                  0/1     Error               2          73s
busybox                  0/1     CrashLoopBackOff    2          86s
busybox                  1/1     Running             3          109s
busybox                  0/1     Error               3          2m
busybox                  0/1     CrashLoopBackOff    3          2m15s
busybox                  1/1     Running             4          3m2s
busybox                  0/1     Error               4          3m12s
busybox                  0/1     CrashLoopBackOff    4          3m23s
busybox                  1/1     Running             5          4m52s
busybox                  0/1     Error               5          5m2s
busybox                  0/1     CrashLoopBackOff    5          5m14s

As you can see, the test pod was restarted 5 times and the service still never became healthy, so it stays in CrashLoopBackOff, waiting for an operator to investigate.
Note: kubelet restarts the container with an exponentially increasing back-off delay (10s, 20s, 40s, ...), capped at 5 minutes.
Here we broke the service on purpose. For real business services, how do we harness this restart-to-recover mechanism? The answer is the liveness probe.
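
When a pod is stuck in CrashLoopBackOff, these are usually the first commands to reach for (a sketch using the busybox pod from above):

# The Events at the bottom usually explain why the container keeps dying
kubectl describe pod busybox
# Logs of the current container, and of the previous (crashed) instance
kubectl logs busybox
kubectl logs busybox --previous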

Liveness

A liveness probe lets us define our own criteria for whether a container is healthy; if the probe fails, K8s restarts the container. Let's try it out — prepare the following YAML and save it as liveness.yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness
spec:
  restartPolicy: OnFailure
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10   # 容器启动 10 秒之后开始检测
      periodSeconds: 5          # 每隔 5 秒再检测一次

The startup command first creates the file /tmp/healthy and deletes it 30 seconds later. In our setup, the container is considered healthy as long as /tmp/healthy exists; once it is gone, the container is considered faulty.

The livenessProbe section defines how the liveness probe is executed:

The probe runs cat against /tmp/healthy. If the command succeeds (returns 0), K8s counts the liveness check as passed; a non-zero return value means the check failed.

initialDelaySeconds: 10 tells K8s to start probing 10 seconds after the container starts; set it according to how long your application needs to come up. If an application normally takes 30 seconds to start, initialDelaySeconds should be greater than 30.

periodSeconds: 5 runs the probe every 5 seconds. If 3 consecutive liveness probes fail, K8s kills and restarts the container.
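
exec is only one probe type; for web services an httpGet (or tcpSocket) probe is more common. Here is a minimal sketch using nginx, which answers 200 on / port 80, so the probe keeps passing (for your own application the path would be its health endpoint, e.g. /healthz — an assumption about your service):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http       # illustrative name
spec:
  containers:
  - name: web
    image: nginx
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
EOF

kubectl describe pod liveness-http | tail -n 10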

Now create the Pod:

# kubectl apply -f liveness.yaml 
pod/liveness created

As the config shows, for the first 30 seconds /tmp/healthy exists, cat returns 0 and the liveness probe succeeds; during this period the Events section of kubectl describe pod liveness shows only normal entries:

# kubectl describe pod liveness
......
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  53s              default-scheduler  Successfully assigned default/liveness to 10.0.1.203
  Normal   Pulling    52s              kubelet            Pulling image "busybox"
  Normal   Pulled     43s              kubelet            Successfully pulled image "busybox"
  Normal   Created    43s              kubelet            Created container liveness
  Normal   Started    42s              kubelet            Started container liveness

About 35 seconds in, the events show that /tmp/healthy no longer exists and the liveness probe fails. A few tens of seconds and several failed probes later, the container is killed and restarted:

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  3m53s                default-scheduler  Successfully assigned default/liveness to 10.0.1.203
  Normal   Pulling    73s (x3 over 3m52s)  kubelet            Pulling image "busybox"
  Normal   Pulled     62s (x3 over 3m43s)  kubelet            Successfully pulled image "busybox"
  Normal   Created    62s (x3 over 3m43s)  kubelet            Created container liveness
  Normal   Started    62s (x3 over 3m42s)  kubelet            Started container liveness
  Warning  Unhealthy  18s (x9 over 3m8s)   kubelet            Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    18s (x3 over 2m58s)  kubelet            Container liveness failed liveness probe, will be restarted

Besides liveness probes, the Kubernetes health-check mechanism also includes readiness probes.

Readiness

Readiness probes tell K8s when a pod may be added to a Service's load-balancing pool and start serving traffic. This matters most when releasing a new version in production: if the new version contains a bug, the readiness probe detects it and no traffic is routed to the pod, so the failure stays contained inside the cluster. In production I recommend always configuring a readiness probe; a liveness probe is optional, because sometimes it is useful to keep the failed instance around so you can read its logs, locate the problem, and hand it to the developers to fix.

Readiness probes use exactly the same configuration syntax as liveness probes. Here is an example:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness
spec:
  restartPolicy: OnFailure
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    readinessProbe:    # 这里将livenessProbe换成readinessProbe即可,其它配置都一样
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10   # 容器启动 10 秒之后开始检测
      periodSeconds: 5          # 每隔 5 秒再检测一次

Save the config above as readiness.yaml and apply it to create the pod:

# kubectl apply -f readiness.yaml 
pod/liveness created

# 观察,在刚开始创建时,文件并没有被删除,所以检测一切正常
# kubectl  get pod
NAME                     READY   STATUS    RESTARTS   AGE
liveness                 1/1     Running   0          50s

# 然后35秒后,文件被删除,这个时候READY状态就会发生变化,K8s会断开Service到pod的流量
# kubectl  describe pod liveness 
......
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  56s               default-scheduler  Successfully assigned default/liveness to 10.0.1.203
  Normal   Pulling    56s               kubelet            Pulling image "busybox"
  Normal   Pulled     40s               kubelet            Successfully pulled image "busybox"
  Normal   Created    40s               kubelet            Created container liveness
  Normal   Started    40s               kubelet            Started container liveness
  Warning  Unhealthy  5s (x2 over 10s)  kubelet            Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory

# 可以看到pod的流量被断开,这时候即使服务出错,对外界来说也是感知不到的,这时候我们运维人员就可以进行故障排查了
# kubectl  get pod
NAME                     READY   STATUS    RESTARTS   AGE
liveness                 0/1     Running   0          61s

Let's compare liveness and readiness probes:

Liveness and readiness are two independent health-check mechanisms. If neither is configured, Kubernetes falls back to the same default behaviour for both: judging health by whether the container's main process exits with a zero status code.

They are configured in exactly the same way and support the same parameters. The difference is what happens on failure: a failed liveness probe restarts the container, while a failed readiness probe marks the container as not ready and stops Service traffic from being forwarded to it.

The two probes are executed independently and have no dependency on each other, so they can be used separately or together: liveness to decide when a container needs a restart to self-heal, readiness to decide when a container is ready to serve traffic.
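
Because they are independent, a production pod often carries both: liveness to restart a hung process, readiness to gate Service traffic. A minimal sketch combining the two on an nginx container (the path and port are just nginx defaults, not a recommendation for your own app):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: probes-demo          # illustrative name
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:           # failure here restarts the container
      httpGet: {path: /, port: 80}
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:          # failure here removes the pod from Service endpoints
      httpGet: {path: /, port: 80}
      initialDelaySeconds: 5
      periodSeconds: 5
EOF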

Health checks in production rolling updates

For ops engineers, getting new application code into production and keeping it stable is a critical, highly repetitive task. In the traditional model we push code out to servers with batch tools such as SaltStack or Ansible; on K8s this workflow becomes much simpler. In the advanced chapters later I'll cover CI/CD automation — roughly, developers push code to the repository and that triggers the whole pipeline, with hardly any manual ops involvement in between. In such a highly automated flow, how do we make sure a release lands safely? This is where the readiness probe in Health Check plays a key role. We touched on it above; let's go through it once more with a full example to make it stick.

Prepare a deployment manifest:

# cat myapp-v1.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytest
spec:
  replicas: 10     # 这里准备10个数量的pod
  selector:
    matchLabels:
      app: mytest
  template:
    metadata:
      labels:
        app: mytest
    spec:
      containers:
      - name: mytest
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 10; touch /tmp/healthy; sleep 30000		# 生成 /tmp/healthy文件,用于健康检查
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5

Apply the config:

# kubectl apply -f myapp-v1.yaml --record         
deployment.apps/mytest created

# 等待一会,可以看到所有pod已正常运行
# kubectl  get pod
NAME                     READY   STATUS    RESTARTS   AGE
mytest-d9f48585b-2lmh2   1/1     Running   0          3m22s
mytest-d9f48585b-5lh9l   1/1     Running   0          3m22s
mytest-d9f48585b-cwb8l   1/1     Running   0          3m22s
mytest-d9f48585b-f6tzc   1/1     Running   0          3m22s
mytest-d9f48585b-hb665   1/1     Running   0          3m22s
mytest-d9f48585b-hmqrw   1/1     Running   0          3m22s
mytest-d9f48585b-jm8bm   1/1     Running   0          3m22s
mytest-d9f48585b-kxm2m   1/1     Running   0          3m22s
mytest-d9f48585b-lqpr9   1/1     Running   0          3m22s
mytest-d9f48585b-pk75z   1/1     Running   0          3m22s

Next we'll update this service, deliberately shipping a broken version so we can watch what happens. Prepare a new config, myapp-v2.yaml:

# cat myapp-v2.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytest
spec:
  strategy:
    rollingUpdate:
      maxSurge: 35%   # 滚动更新的副本总数最大值(以10的基数为例):10 + 10 * 35% = 13.5 --> 14
      maxUnavailable: 35%  # 可用副本数最大值(默认值两个都是25%): 10 - 10 * 35% = 6.5  --> 7
  replicas: 10
  selector:
    matchLabels:
      app: mytest
  template:
    metadata:
      labels:
        app: mytest
    spec:
      containers:
      - name: mytest
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 30000   # 可见这里并没有生成/tmp/healthy这个文件,所以下面的检测必然失败
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5

Because this v2 never creates /tmp/healthy, it obviously can't pass the readiness probe. Details below:

# kubectl apply -f myapp-v2.yaml --record 
deployment.apps/mytest configured

# kubectl get deployment mytest 
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
mytest   7/10    7            7           4m58s
# READY 现在正在运行的只有7个pod
# UP-TO-DATE 表示当前已经完成更新的副本数:即 7 个新副本
# AVAILABLE 表示当前处于 READY 状态的副本数

# kubectl get pod
NAME                      READY   STATUS    RESTARTS   AGE
mytest-7657789bc7-5hfkc   0/1     Running   0          3m2s
mytest-7657789bc7-6c5lg   0/1     Running   0          3m2s
mytest-7657789bc7-c96t6   0/1     Running   0          3m2s
mytest-7657789bc7-nbz2q   0/1     Running   0          3m2s
mytest-7657789bc7-pt86c   0/1     Running   0          3m2s
mytest-7657789bc7-q57gb   0/1     Running   0          3m2s
mytest-7657789bc7-x77cg   0/1     Running   0          3m2s
mytest-d9f48585b-2bnph    1/1     Running   0          5m4s
mytest-d9f48585b-965t4    1/1     Running   0          5m4s
mytest-d9f48585b-cvq7l    1/1     Running   0          5m4s
mytest-d9f48585b-hvpnq    1/1     Running   0          5m4s
mytest-d9f48585b-k89zs    1/1     Running   0          5m4s
mytest-d9f48585b-wkb4b    1/1     Running   0          5m4s
mytest-d9f48585b-wrkzf    1/1     Running   0          5m4s
# 上面可以看到,由于 Readiness 检测一直没通过,所以新版本的pod都是Not ready状态的,这样就保证了错误的业务代码不会被外界请求到

# kubectl describe deployment mytest
# 下面截取一些这里需要的关键信息
......
Replicas:               10 desired | 7 updated | 14 total | 7 available | 7 unavailable
......
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  5m55s  deployment-controller  Scaled up replica set mytest-d9f48585b to 10
  Normal  ScalingReplicaSet  3m53s  deployment-controller  Scaled up replica set mytest-7657789bc7 to 4  # 启动4个新版本的pod
  Normal  ScalingReplicaSet  3m53s  deployment-controller  Scaled down replica set mytest-d9f48585b to 7 # 将旧版本pod数量降至7
  Normal  ScalingReplicaSet  3m53s  deployment-controller  Scaled up replica set mytest-7657789bc7 to 7  # 新增3个启动至7个新版本

Putting it all together: we have realistically simulated a bad code release on K8s. Fortunately the readiness check screened out the broken replicas so no external traffic ever reached them, and most of the old-version pods were kept running, so the business was not affected by the failed update at all.

Now let's look more closely at how the rolling update works: why did the new version create 7 pods while only 3 old pods were destroyed?

The reason is this piece of configuration:

If we don't set it explicitly, both values default to 25%.

  strategy:
    rollingUpdate:
      maxSurge: 35%
      maxUnavailable: 35%

Rolling updates use the parameters maxSurge and maxUnavailable to control how pod replicas are replaced.

maxSurge

This parameter caps how far the total number of pod replicas may exceed the desired replica count during the update. maxSurge can be an absolute number (e.g. 3) or a percentage, rounded up; the default is 25%.

In our test the desired replica count is 10, so during the update the upper bound on the total is:

10 + 10 * 35% = 13.5 --> 14

Looking at the description of the updated deployment:

Replicas: 10 desired | 7 updated | 14 total | 7 available | 7 unavailable

7 available old-version pods + 7 unavailable new-version pods = 14 total.

maxUnavailable

This parameter caps how many pod replicas may be unavailable during the update. Like maxSurge, it can be an absolute number (e.g. 3) or a percentage, rounded down; the default is 25%.

With 10 desired replicas, the number of pods that must remain available is:

10 - 10 * 35% = 6.5 --> 7

which is exactly the "7 available" shown in the description above.

The larger maxSurge is, the more new replicas are created up front; the larger maxUnavailable is, the more old replicas are torn down up front.
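
While a rollout like this is in flight, a few kubectl rollout subcommands are worth keeping at hand (a sketch using the mytest deployment from this example):

# Follow the rollout until it completes or times out
kubectl rollout status deployment mytest
# Pause a rollout that looks wrong, investigate, then resume — or undo it
kubectl rollout pause deployment mytest
kubectl rollout resume deployment mytest
kubectl rollout undo deployment mytest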

In the ideal case, the rolling update for this release would proceed like this:

  1. Create 4 new-version pods, bringing the total replica count to 14
  2. Destroy 3 old-version pods, bringing the number of available replicas down to 7
  3. Once those 3 old pods are gone, another 3 new-version pods can be created, keeping the total at 14
  4. As new-version pods pass the readiness probe, the number of available replicas rises above 7
  5. More old-version pods can then be destroyed, bringing the available count back down to 7
  6. Destroying old pods drops the total below 14, which allows yet more new-version pods to be created
  7. This create/destroy cycle repeats until every old pod has been replaced by a new one and the rolling update is complete

In our case, however, the process got stuck at step 4: the new pods can never pass the readiness probe. The events at the end of the description above tell the same story:

Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  5m55s  deployment-controller  Scaled up replica set mytest-d9f48585b to 10
  Normal  ScalingReplicaSet  3m53s  deployment-controller  Scaled up replica set mytest-7657789bc7 to 4  # 启动4个新版本的pod
  Normal  ScalingReplicaSet  3m53s  deployment-controller  Scaled down replica set mytest-d9f48585b to 7 # 将旧版本pod数量降至7
  Normal  ScalingReplicaSet  3m53s  deployment-controller  Scaled up replica set mytest-7657789bc7 to 7  # 新增3个启动至7个新版本

Following normal production procedure, once we have collected enough diagnostic information from the broken version and handed it to the developers, we roll back to the previous good version with kubectl rollout undo:

# 先查看下要回滚版本号前面的数字,这里为1
# kubectl rollout history deployment mytest 
deployment.apps/mytest 
REVISION  CHANGE-CAUSE
1         kubectl apply --filename=myapp-v1.yaml --record=true
2         kubectl apply --filename=myapp-v2.yaml --record=true

# kubectl rollout undo deployment mytest --to-revision=1
deployment.apps/mytest rolled back

# kubectl get deployment mytest
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
mytest   10/10   10           10          96m

# kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
mytest-d9f48585b-2bnph   1/1     Running   0          96m
mytest-d9f48585b-8nvhd   1/1     Running   0          2m13s
mytest-d9f48585b-965t4   1/1     Running   0          96m
mytest-d9f48585b-cvq7l   1/1     Running   0          96m
mytest-d9f48585b-hvpnq   1/1     Running   0          96m
mytest-d9f48585b-k89zs   1/1     Running   0          96m
mytest-d9f48585b-qs5c6   1/1     Running   0          2m13s
mytest-d9f48585b-wkb4b   1/1     Running   0          96m
mytest-d9f48585b-wprlz   1/1     Running   0          2m13s
mytest-d9f48585b-wrkzf   1/1     Running   0          96m

And that's a complete, realistic simulation of a bad release and its rollback. Notice that even though things went wrong, the business was never affected at any point — that is the charm of K8s.

Pod mob fight (homework)

# Run through the whole update-and-rollback example above yourself, watching how the pods change at each step, to deepen your understanding

Level 5: The assault on K8s, part 4 – Service

In this lesson we look at how to use a Service to load-balance traffic across the pods of a service inside K8s.

Service, Endpoint

K8s is an ops engineer's savior. Why? Because its machinery keeps the services you run in the state you expect them to be in. Take the nginx example again: we can never be sure when the node running a pod will go down, and as mentioned in the Pod section, a pod's IP changes every time it is recreated. So rather than assuming K8s pods are robust, we should plan for the worst: containers will fail because of code bugs, flaky nodes, and so on. If we use a Deployment, its controller reacts by creating new pods on healthy nodes and deleting the dead ones, keeping the application as a whole healthy; and at the traffic entry point, a Service with a stable IP acts as an abstract internal load balancer in front of the pods. In effect, K8s becomes a 24/7 ops robot that handles pod failures for you.

Create a Service that provides a fixed IP and round-robins requests across the two nginx pods created above (NodePort):

# 创建nginx的deployment
# kubectl apply -f nginx.yaml
# 给这个nginx的deployment生成一个service(简称svc)
# 同时也可以用生成yaml配置的形式来创建 kubectl expose deployment nginx --port=80 --target-port=80 --dry-run=client -o yaml
# 我们可以先把上面的yaml配置导出为svc.yaml提供后面,这里就直接用命令行创建了
# kubectl expose deployment nginx --port=80 --target-port=80
service/nginx exposed

# kubectl get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1      <none>        443/TCP   4d23h
nginx        ClusterIP   10.68.18.121   <none>        80/TCP    5s

# 看下自动关联生成的endpoint
# kubectl  get endpoints nginx 
NAME    ENDPOINTS                       AGE
nginx   172.20.139.72:80,172.20.217.72:80   27s

# 接下来测试下svc的负载均衡效果吧,这里我们先进到pod里面,把nginx的页面信息改为各自pod的hostname
# kubectl  exec -it nginx-6799fc88d8-2kgn8 -- bash
root@nginx-f89759699-bzwd2:/# echo nginx-6799fc88d8-2kgn8 > /usr/share/nginx/html/index.html
root@nginx-f89759699-bzwd2:/# exit
# kubectl  exec -it nginx-6799fc88d8-gn7r7 -- bash
root@nginx-f89759699-qlc8q:/# echo nginx-6799fc88d8-gn7r7  > /usr/share/nginx/html/index.html
root@nginx-f89759699-qlc8q:/# exit
# kubectl  exec -it nginx-6799fc88d8-npm5g -- bash
root@nginx-f89759699-qlc8q:/# echo nginx-6799fc88d8-npm5g  > /usr/share/nginx/html/index.html
root@nginx-f89759699-qlc8q:/# exit

# curl 10.68.18.121
nginx-f89759699-bzwd2
# curl 10.68.18.121
nginx-f89759699-qlc8q


# 修改svc的类型来提供外部访问
# kubectl patch svc nginx -p '{"spec":{"type":"NodePort"}}'
service/nginx patched

# kubectl  get svc
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.68.0.1     <none>        443/TCP        3d21h
nginx        NodePort    10.68.86.85   <none>        80:33184/TCP   30m

# 具体看下pod是不是分散运行在不同的node上呢
# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE         NOMINATED NODE   READINESS GATES
nginx-f89759699-26fzd   1/1     Running   0          45m   172.20.0.16   10.0.1.202   <none>           <none>
nginx-f89759699-9s4dw   1/1     Running   0          43m   172.20.1.14   10.0.1.201   <none>           <none>
# node + port
# node是kubectl get pod 中对应的node的ip
# port是kubectl get svc 中查到的对应服务的外部端口
[root@node-2 ~]# curl 10.0.1.201:20651
nginx-f89759699-bzwd2
[root@node-2 ~]# curl 10.0.1.201:20651
nginx-f89759699-qlc8q

Let's also go through this Service's YAML:

cat svc.yaml

apiVersion: v1       # <<<<<<  v1 是 Service 的 apiVersion
kind: Service        # <<<<<<  指明当前资源的类型为 Service
metadata:
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx       # <<<<<<  Service 的名字为 nginx
spec:
  ports:
  - port: 80        # <<<<<<  将 Service 的 80 端口映射到 Pod 的 80 端口,使用 TCP 协议
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx     # <<<<<<  selector picks the Pods whose label is app: nginx as the backends of this Service
status:
  loadBalancer: {}

Now look at the description of this nginx Service:

# kubectl describe svc nginx 
Name:                     nginx
Namespace:                default
Labels:                   app=nginx
Annotations:              <none>
Selector:                 app=nginx
Type:                     NodePort
IP:                       10.68.18.121
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  20651/TCP
Endpoints:                172.20.139.72:80,172.20.217.72:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

我们可以看到Endpoints里列出了2个pod的IP和端口,pod的IP是配置在pod自己的网络命名空间里的,那么这里Service的cluster IP又是在哪里配置的呢?cluster IP又是如何映射到pod IP上的呢?

# 首先看下kube-proxy的配置
# cat /etc/systemd/system/kube-proxy.service    
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/opt/kube/bin/kube-proxy \
  --bind-address=10.0.1.202 \
  --cluster-cidr=172.20.0.0/16 \
  --hostname-override=10.0.1.202 \
  --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig \
  --logtostderr=true \
  --proxy-mode=ipvs        #<-------  我们在最开始部署kube-proxy的时候就设定它的转发模式为ipvs,因为默认的iptables在存在大量svc的情况下性能很低
Restart=always
RestartSec=5
LimitNOFILE=65536

# 看下本地网卡,会有一个ipvs的虚拟网卡
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:20:b8:39 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.202/24 brd 10.0.1.255 scope global noprefixroute ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe20:b839/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:91:ac:ce:13 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 22:50:98:a6:f9:e4 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether 96:6b:f0:25:1a:26 brd ff:ff:ff:ff:ff:ff
    inet 10.68.0.2/32 brd 10.68.0.2 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.0.1/32 brd 10.68.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.120.201/32 brd 10.68.120.201 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.50.42/32 brd 10.68.50.42 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.68.18.121/32 brd 10.68.18.121 scope global kube-ipvs0    # <-------- SVC的IP配置在这里
       valid_lft forever preferred_lft forever
6: caliaeb0378f7a4@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 172.20.247.0/32 brd 172.20.247.0 scope global tunl0
       valid_lft forever preferred_lft forever
       
       
       


# 来看下lvs的虚拟服务器列表
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.17.0.1:20651 rr
  -> 172.20.139.72:80             Masq    1      0          0         
  -> 172.20.217.72:80             Masq    1      0          0         
TCP  172.20.247.0:20651 rr
  -> 172.20.139.72:80             Masq    1      0          0         
  -> 172.20.217.72:80             Masq    1      0          0         
TCP  10.0.1.202:20651 rr
  -> 172.20.139.72:80             Masq    1      0          0         
  -> 172.20.217.72:80             Masq    1      0          0         
TCP  10.68.0.1:443 rr
  -> 10.0.1.201:6443              Masq    1      0          0         
  -> 10.0.1.202:6443              Masq    1      3          0         
TCP  10.68.0.2:53 rr
  -> 172.20.247.2:53              Masq    1      0          0         
TCP  10.68.0.2:9153 rr
  -> 172.20.247.2:9153            Masq    1      0          0         
TCP  10.68.18.121:80 rr                                        #<-----------   SVC转发Pod的明细在这里
  -> 172.20.139.72:80             Masq    1      0          0         
  -> 172.20.217.72:80             Masq    1      0          0         
TCP  10.68.50.42:443 rr
  -> 172.20.217.71:4443           Masq    1      0          0         
TCP  10.68.120.201:80 rr
  -> 10.0.1.201:80                Masq    1      0          0         
  -> 10.0.1.202:80                Masq    1      0          0         
TCP  10.68.120.201:443 rr
  -> 10.0.1.201:443               Masq    1      0          0         
  -> 10.0.1.202:443               Masq    1      0          0         
TCP  10.68.120.201:10254 rr
  -> 10.0.1.201:10254             Masq    1      0          0         
  -> 10.0.1.202:10254             Masq    1      0          0         
TCP  127.0.0.1:20651 rr
  -> 172.20.139.72:80             Masq    1      0          0         
  -> 172.20.217.72:80             Masq    1      0          0         
UDP  10.68.0.2:53 rr
  -> 172.20.247.2:53              Masq    1      0          0 

除了直接用cluster ip,以及上面说到的NodePort模式来访问Service,我们还可以用K8s的DNS来访问

# 我们前面装好的CoreDNS,来提供K8s集群的内部DNS访问
# kubectl -n kube-system get deployment,pod|grep dns
deployment.apps/coredns                   1/1     1            1           5d2h
pod/coredns-d9b6857b5-tt7j2                    1/1     Running   1          27h

# coredns是一个DNS服务器,每当有新的Service被创建的时候,coredns就会添加该Service的DNS记录,然后我们通过serviceName.namespaceName就可以来访问到对应的pod了,下面来演示下:

# kubectl run -it --rm busybox --image=busybox -- sh    # --rm代表等我退出这个pod后,它会被自动删除,当作一个临时pod在用
If you don't see a command prompt, try pressing enter.
/ # ping nginx.default
PING nginx.default (10.68.18.121): 56 data bytes
64 bytes from 10.68.18.121: seq=0 ttl=64 time=0.096 ms
64 bytes from 10.68.18.121: seq=1 ttl=64 time=0.067 ms
64 bytes from 10.68.18.121: seq=2 ttl=64 time=0.065 ms
^C
--- nginx.default ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.065/0.076/0.096 ms
/ # wget nginx.default
Connecting to nginx.default (10.68.18.121:80)
saving to 'index.html'
index.html           100% |********************************************************************|    22  0:00:00 ETA
'index.html' saved
/ # cat index.html 
nginx-f89759699-bzwd2
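
补充一下:上面用的是 serviceName.namespaceName 这种简写,完整的集群内域名格式是 serviceName.namespaceName.svc.cluster.local(cluster.local是集群默认域,如果部署时改过请以实际为准),可以在临时pod里用nslookup自己验证一下,下面这条命令只是示意:

/ # nslookup nginx.default.svc.cluster.local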

service生产小技巧 通过svc来访问非K8s上的服务

上面我们提到了创建service后,会自动创建对应的endpoint,这里面的关键在于 selector: app: nginx 这段配置基于labels标签选择了一组带有这个标签的pod;反过来,如果我们创建svc时没有定义这个selector,那么系统是不会自动创建endpoint的。那我们可不可以手动来创建这个endpoint呢?答案是可以的。在生产中,我们可以通过创建不带selector的Service,再创建一个同名的Endpoints,来关联K8s集群以外的服务。这个具体能给我们运维人员带来什么好处呢?就是我们可以直接复用K8s上的ingress(这个后面会讲到,现在先把它当作一个nginx代理),来访问K8s集群以外的服务,省去了自己在集群前面单独搭建Nginx代理服务器的麻烦

开始实践测试

这里我们挑选node-2节点,用python运行一个简易web服务器

[root@node-2 mnt]# python -m SimpleHTTPServer 9999
Serving HTTP on 0.0.0.0 port 9999 ...

然后我们用之前学会的方法,来生成svc和endpoint的yaml配置,并修改成如下内容,并保存为mysvc.yaml

注意Service和Endpoints的名称必须一致

# 注意我这里把两个资源的yaml写在一个文件内,在实际生产中,我们经常会这么做,方便对一个服务的所有资源进行统一管理,不同资源之间用"---"来分隔
apiVersion: v1
kind: Service
metadata:
  name: mysvc
  namespace: default
spec:
  type: ClusterIP
  ports:
  - port: 80
    protocol: TCP

---

apiVersion: v1
kind: Endpoints
metadata:
  name: mysvc
  namespace: default
subsets:
- addresses:
  - ip: 10.0.1.202
    nodeName: 10.0.1.202
  ports:
  - port: 9999
    protocol: TCP

开始创建并测试

# kubectl  apply -f mysvc.yaml 
service/mysvc created
endpoints/mysvc created

# kubectl get svc,endpoints |grep mysvc
service/mysvc        ClusterIP   10.68.71.166   <none>        80/TCP         14s
endpoints/mysvc        10.0.1.202:9999                     14s

# curl 10.68.71.166
mysvc

# 我们回到node-2节点上,可以看到有一条刚才的访问日志打印出来了
10.0.1.201 - - [25/Nov/2020 14:42:45] "GET / HTTP/1.1" 200 -

外部网络如何访问到Service呢?

在上面其实已经给大家演示过了将Service的类型改为NodePort,然后就可以用node节点的IP加端口就能访问到Service了,我们这里来详细分析下原理,以便加深印象

# 我们看下先创建的nginx service的yaml配置
# kubectl get svc nginx -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2020-11-25T03:55:05Z"
  labels:
    app: nginx
  managedFields:       # 在新版的K8s运行的资源配置里面,会输出这么多的配置信息,这里我们可以不用管它,实际我们在创建时,这些都是忽略的
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
      f:spec:
        f:externalTrafficPolicy: {}
        f:ports:
          .: {}
          k:{"port":80,"protocol":"TCP"}:
            .: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector:
          .: {}
          f:app: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: kubectl
    operation: Update
    time: "2020-11-25T04:00:28Z"
  name: nginx
  namespace: default
  resourceVersion: "591029"
  selfLink: /api/v1/namespaces/default/services/nginx
  uid: 84fea557-e19d-486d-b879-13743c603091
spec:
  clusterIP: 10.68.18.121
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 20651     # 我们看下这里,它定义的一个nodePort配置,并分配了20651端口,因为我们先前创建时并没有指定这个配置,所以它是随机生成的
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
  
  
# 我们看下apiserver的配置
# cat /etc/systemd/system/kube-apiserver.service 
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
ExecStart=/opt/kube/bin/kube-apiserver \
  --advertise-address=10.0.1.201 \
  --allow-privileged=true \
  --anonymous-auth=false \
  --authorization-mode=Node,RBAC \
  --token-auth-file=/etc/kubernetes/ssl/basic-auth.csv \
  --bind-address=10.0.1.201 \
  --client-ca-file=/etc/kubernetes/ssl/ca.pem \
  --endpoint-reconciler-type=lease \
  --etcd-cafile=/etc/kubernetes/ssl/ca.pem \
  --etcd-certfile=/etc/kubernetes/ssl/kubernetes.pem \
  --etcd-keyfile=/etc/kubernetes/ssl/kubernetes-key.pem \
  --etcd-servers=https://10.0.1.201:2379,https://10.0.1.202:2379,https://10.0.1.203:2379 \
  --kubelet-certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --kubelet-client-certificate=/etc/kubernetes/ssl/myk8s.pem \
  --kubelet-client-key=/etc/kubernetes/ssl/myk8s-key.pem \
  --kubelet-https=true \
  --service-account-key-file=/etc/kubernetes/ssl/ca.pem \
  --service-cluster-ip-range=10.68.0.0/16 \
  --service-node-port-range=20000-40000 \      # 这就是NodePort随机生成端口的范围,这个在我们部署时就指定了
  --tls-cert-file=/etc/kubernetes/ssl/kubernetes.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/kubernetes-key.pem \
  --requestheader-client-ca-file=/etc/kubernetes/ssl/ca.pem \
  --requestheader-allowed-names= \
  --requestheader-extra-headers-prefix=X-Remote-Extra- \
  --requestheader-group-headers=X-Remote-Group \
  --requestheader-username-headers=X-Remote-User \
  --proxy-client-cert-file=/etc/kubernetes/ssl/aggregator-proxy.pem \
  --proxy-client-key-file=/etc/kubernetes/ssl/aggregator-proxy-key.pem \
  --enable-aggregator-routing=true \
  --v=2
Restart=always
RestartSec=5
Type=notify
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target


# NodePort端口会在K8s的每个node节点上都监听同一个端口,这就使我们无论用哪个node的IP加端口都能方便地访问到Service了,但在实际生产中,这个NodePort不建议经常使用,因为它会造成node上端口管理混乱,等用到了ingress后,你就不会想使用NodePort模式了,这个接下来会讲到

[root@node-1 ~]# ipvsadm -ln|grep -C6 20651
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
......      
TCP  10.0.1.201:20651 rr      # 这里
  -> 172.20.139.72:80             Masq    1      0          0         
  -> 172.20.217.72:80             Masq    1      0          0    


[root@node-2 mnt]# ipvsadm -ln|grep -C6 20651
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
......       
TCP  10.0.1.202:20651 rr      # 这里
  -> 172.20.139.72:80             Masq    1      0          0         
  -> 172.20.217.72:80             Masq    1      0          0  

生产中Service的调优

# 先把nginx的pod数量调整为1,方便呆会观察
# kubectl scale deployment nginx --replicas=1
deployment.apps/nginx scaled

# 看下这个nginx的pod运行情况,-o wide显示更详细的信息,这里可以看到这个pod运行在node 203上面
# kubectl  get pod -o wide                     
NAME                    READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
nginx-f89759699-qlc8q   1/1     Running   0          3h27m   172.20.139.72   10.0.1.203   <none>           <none>

# 我们先直接通过pod运行的node的IP来访问测试
[root@node-1 ~]# curl 10.0.1.203:20651
nginx-f89759699-qlc8q

# 可以看到日志显示这条请求的来源IP是203,而不是node-1的IP 10.0.1.201
# 注: kubectl logs --tail=1 代表查看这个pod的日志,并只显示倒数第一条
[root@node-1 ~]# kubectl logs --tail=1 nginx-f89759699-qlc8q 
10.0.1.203 - - [25/Nov/2020:07:22:54 +0000] "GET / HTTP/1.1" 200 22 "-" "curl/7.29.0" "-"

# 再来通过201来访问
[root@node-1 ~]# curl 10.0.1.201:20651          
nginx-f89759699-qlc8q

# 可以看到日志显示的来源IP并不是真实的客户端IP,而是一个172.20开头的地址
[root@node-1 ~]# kubectl logs --tail=1 nginx-f89759699-qlc8q 
172.20.84.128 - - [25/Nov/2020:07:23:18 +0000] "GET / HTTP/1.1" 200 22 "-" "curl/7.29.0" "-"

# 这个地址其实就是node-1上calico的tunl0这块虚拟网卡的IP,流量做了SNAT后经它转发过来
[root@node-1 ~]# ip a|grep -wC2 172.20.84.128 
9: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 172.20.84.128/32 brd 172.20.84.128 scope global tunl0
       valid_lft forever preferred_lft forever

# 可以看下lvs的虚拟服务器列表,正好是转到我们要访问的pod上的
[root@node-1 ~]# ipvsadm -ln|grep -A1 172.20.84.128 
TCP  172.20.84.128:20651 rr
  -> 172.20.139.72:80             Masq    1      0          0      
  

详细处理流程如下:

* 客户端发送数据包到 10.0.1.201:20651
* 10.0.1.201 用自己的IP地址替换数据包中的源IP地址(SNAT)
* 10.0.1.201 再用 pod 的IP替换数据包中的目标IP
* 数据包被路由到 10.0.1.203,然后转发到 endpoint(pod)
* pod的回复先被路由回 10.0.1.201
* 10.0.1.201 再把回复发送回客户端

                   client
                      \ ^
                       \ \
                        v \
10.0.1.203 <--- 10.0.1.201
    | ^        SNAT
    | |           --->
    v |
 endpoint   

为避免这种情况, Kubernetes 具有保留客户端IP 的功能。设置 service.spec.externalTrafficPolicy 为 Local 会将请求代理到本地端点,不将流量转发到其他节点,从而保留原始IP地址。如果没有本地端点,则丢弃发送到节点的数据包,因此您可以在任何数据包处理规则中依赖正确的客户端IP。

# 设置 service.spec.externalTrafficPolicy 字段如下:
# kubectl patch svc nginx -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service/nginx patched

# 现在通过非pod所在node节点的IP来访问是不通了
[root@node-1 ~]# curl 10.0.1.201:20651                       
curl: (7) Failed connect to 10.0.1.201:20651; Connection refused

# 通过所在node的IP发起请求正常
[root@node-1 ~]# curl 10.0.1.203:20651
nginx-f89759699-qlc8q

# 可以看到日志显示的来源IP就是201,这才是我们想要的结果
[root@node-1 ~]# kubectl logs --tail=1 nginx-f89759699-qlc8q          
10.0.1.201 - - [25/Nov/2020:07:33:42 +0000] "GET / HTTP/1.1" 200 22 "-" "curl/7.29.0" "-"

# 去掉这个优化配置也很简单
# kubectl patch svc nginx -p '{"spec":{"externalTrafficPolicy":""}}' 

第5关 k8s架构师课程攻克作战攻略之五 - labels

大家好,今天给大家讲讲k8s里面的labels标签。

Labels

labels标签,在kubernetes里我们会经常见到,它的功能非常关键,就相当于服务pod的身份证信息。如果我们创建一个deployment资源,它之所以能守护下面启动的N个pod以达到期望的数量,service之所以能把流量准确无误地转发到指定的pod上去,归根结底都是labels在这里起作用。下面我们来实际操作下,相信大家跟着操作完成后,就会理解labels的功效了

# 我们先来创建一个nginx的deployment资源
kubectl create deployment nginx --image=nginx --replicas=3

# 等服务pod都运行好,这时候按我们期待的状态就是3个pod,没问题
kubectl get pod -w

# 我们现在来修改其中一个pod的label,你会发现这个pod会被deployment抛弃,因为标签对不上了,deployment已经不认识这个pod,它就成了无主的pod,同时deployment的controller会马上新建一个pod来补足期望的副本数;这时我们直接删除这个无主的pod,它就会直接消失,不会再被重建,就和我们用kubectl run起的一个独立pod资源一样,具体操作可以参考下面的示意命令
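
下面是一组示意命令,pod名称以你自己环境里kubectl get pod看到的为准,这里的名称只是占位举例:

# 先带着标签看下pod
kubectl get pod --show-labels
# 把其中一个pod的app标签改掉(--overwrite表示覆盖已有的值),pod名称换成你实际的
kubectl label pod nginx-xxxxxxxxxx-xxxxx app=nginx-old --overwrite
# 再看一眼,会发现deployment立刻又补了一个新pod出来,而改了标签的pod成了无主pod
kubectl get pod --show-labels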


# 我们再来基于这个nginx的deployment来创建一个service服务
kubectl expose deployment nginx --port=80 --target-port=80 --name=nginx

# 直接利用svc的ip来请求下,发现都是正常的对吧
kubectl get svc nginx

# 这个时候我们来修改下svc资源的选择labels,看看会出现什么情况
kubectl patch services nginx -p '{"spec":{"selector":{"app": "nginxaaa"}}}'
# 这时再请求这个svc的ip,你会发现已经请求不通了,这也证明了它已经关联不到后面对应label的pod了

# 我们修改回来后,会发现一切恢复正常了
kubectl patch services nginx -p '{"spec":{"selector":{"app": "nginx"}}}'

labels的匹配是在namespace范围内进行的,在同一个namespace下面,如果一个服务的selector只用了一个label,就需要注意这个label的唯一性,不要和别的服务重复,不然流量就会跑串,出现一些奇怪的现象;我们在资源中可以配置多个labels组合起来一起使用,这样就会大大降低重复的概率,下面是一个示意。
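
下面是一个多标签组合的示意片段,env、tier这些键值只是假设举例,按你自己的服务规范来定即可;在Deployment里写在 selector.matchLabels 下,在Service里则直接写在 spec.selector 下:

# Deployment 的写法
  selector:
    matchLabels:
      app: nginx
      env: prod
      tier: frontend

# Service 的写法
  selector:
    app: nginx
    env: prod
    tier: frontend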

第6关 k8s架构师课程之流量入口Ingress上部

大家好,这节课带来k8s的流量入口ingress,作为业务对外服务的公网入口,它的重要性不言而喻,大家一定要仔细阅读,跟着博哥的教程一步步实操去理解。

这节课所用到的yaml配置比较多,但我发现在头条这里发的格式会有问题,所以我另外把笔记文字部分存了一份在我的github上面,大家可以从这里面来复制yaml配置创建服务:

https://github.com/bogeit/LearnK8s/blob/main/%E7%AC%AC6%E5%85%B3%20k8s%E6%9E%B6%E6%9E%84%E5%B8%88%E8%AF%BE%E7%A8%8B%E4%B9%8B%E6%B5%81%E9%87%8F%E5%85%A5%E5%8F%A3Ingress%E4%B8%8A%E9%83%A8.md

我们上面学习了通过Service服务来访问pod资源,另外通过修改Service的类型为NodePort,然后通过一些手段作公网IP的端口映射来提供K8s集群外的访问,但这并不是一种很优雅的方式。

通常,services和Pod只能通过集群内网络访问。 所有在边界路由器上的流量都被丢弃或转发到别处。 
从概念上讲,这可能看起来像:

    internet
        |
  ------------
  [ Services ]

另外我们也可以通过LoadBalancer负载均衡来提供外部流量的访问,但这种模式对于实际生产来说用起来不是很方便,而且用这种模式就意味着每个服务都需要有自己的负载均衡器以及独立的公网IP。

我们这里用Ingress,因为Ingress只需要一个公网IP就能为K8s上所有的服务提供访问,Ingress工作在7层(HTTP),Ingress会根据请求的主机名以及路径来决定把请求转发到相应的服务,如下图所示:

第6关 k8s架构师课程之流量入口Ingress上部

Ingress是允许入站连接到达集群服务的一组规则,即介于外部网络和集群svc之间的一组转发规则。
它主要实现的是L7(HTTP/HTTPS)层面的负载均衡,配合tcp-services/udp-services这类configmap也能做一部分L4的转发:
注意:这里的Ingress并非将外部流量通过Service来转发到服务pod上,而只是通过Service找到对应的Endpoint,从而发现pod的IP并直接进行转发

   
    internet
        |
   [ Ingress ]   ---> [ Services ] ---> [ Endpoint ]
   --|-----|--                                 |
   [ Pod,pod,...... ]<-------------------------|

要在K8s上面使用Ingress,我们就需要在K8s上部署Ingress-controller控制器,只有它在K8s集群中运行,Ingress资源才能正常工作。Ingress-controller控制器有很多种,比如traefik,但我们这里要用的是ingress-nginx这个控制器,它的底层就是用Openresty融合nginx和一些lua规则等实现的。

重点来了,我在讲课中一直强调,本课程带给大家的都是生产中的实战经验,所以这里我们用的ingress-nginx不是普通的社区版本,而是经过超大生产流量检验、由国内最大的云平台阿里云基于社区版分支魔改而成的版本,更贴合生产,基本属于开箱即用,下面是aliyun-ingress-controller的介绍:

下面介绍只截取了最新的一部分,更多文档资源可以查阅官档:
https://developer.aliyun.com/article/598075

服务简介
在Kubernetes集群中,Ingress是授权入站连接到达集群服务的规则集合,为您提供七层负载均衡能力,您可以通过 Ingress 配置提供外部可访问的 URL、负载均衡、SSL、基于名称的虚拟主机,阿里云容器服务K8S Ingress Controller在完全兼容社区版本的基础上提供了更多的特性和优化。

版本说明
v0.30.0.2-9597b3685-aliyun:
新增FastCGI Backend支持
默认启用Dynamic SSL Cert Update模式
新增流量Mirror配置支持
升级NGINX版本到1.17.8,OpenResty版本到1.15.8,更新基础镜像为Alpine
新增Ingress Validating Webhook支持
修复CVE-2018-16843、CVE-2018-16844、CVE-2019-9511、CVE-2019-9513和CVE-2019-9516漏洞
[Breaking Change] lua-resty-waf、session-cookie-hash、force-namespace-isolation等配置被废弃;x-forwarded-prefix类型从boolean转成string类型;log-format配置中的the_real_ip变量下个版本将被废弃,统一采用remote_addr替代
同步更新到社区0.30.0版本,更多详细变更记录参考社区Changelog

aliyun-ingress-controller有一个很重要的修改,就是它支持路由配置的动态更新,大家用过Nginx的可以知道,在修改完Nginx的配置,我们是需要进行nginx -s reload来重加载配置才能生效的,在K8s上,这个行为也是一样的,但由于K8s运行的服务会非常多,所以它的配置更新是非常频繁的,因此,如果不支持配置动态更新,对于在高频率变化的场景下,Nginx频繁Reload会带来较明显的请求访问问题:

  1. 造成一定的QPS抖动和访问失败情况
  2. 对于长连接服务会被频繁断掉
  3. 造成大量的处于shutting down的Nginx Worker进程,进而引起内存膨胀

详细原理分析见这篇文章: https://developer.aliyun.com/article/692732

我们准备来部署aliyun-ingress-controller,下面直接是生产中在用的yaml配置,我们保存了aliyun-ingress-nginx.yaml准备开始部署:

详细讲解下面yaml配置的每个部分

apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
  labels:
    app: ingress-nginx

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
  labels:
    app: ingress-nginx

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: nginx-ingress-controller
  labels:
    app: ingress-nginx
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - endpoints
      - nodes
      - pods
      - secrets
      - namespaces
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
      - "networking.k8s.io"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - create
      - patch
  - apiGroups:
      - "extensions"
      - "networking.k8s.io"
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - configmaps
    resourceNames:
      - "ingress-controller-leader-nginx"
    verbs:
      - get
      - update

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: nginx-ingress-controller
  labels:
    app: ingress-nginx
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nginx-ingress-controller
subjects:
  - kind: ServiceAccount
    name: nginx-ingress-controller
    namespace: ingress-nginx

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ingress-nginx
  name: nginx-ingress-lb
  namespace: ingress-nginx
spec:
  # DaemonSet need:
  # ----------------
  type: ClusterIP
  # ----------------
  # Deployment need:
  # ----------------
#  type: NodePort
  # ----------------
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
  - name: metrics
    port: 10254
    protocol: TCP
    targetPort: 10254
  selector:
    app: ingress-nginx

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
data:
  keep-alive: "75"
  keep-alive-requests: "100"
  upstream-keepalive-connections: "10000"
  upstream-keepalive-requests: "100"
  upstream-keepalive-timeout: "60"
  allow-backend-server-header: "true"
  enable-underscores-in-headers: "true"
  generate-request-id: "true"
  http-redirect-code: "301"
  ignore-invalid-headers: "true"
  log-format-upstream: '{"@timestamp": "$time_iso8601","remote_addr": "$remote_addr","x-forward-for": "$proxy_add_x_forwarded_for","request_id": "$req_id","remote_user": "$remote_user","bytes_sent": $bytes_sent,"request_time": $request_time,"status": $status,"vhost": "$host","request_proto": "$server_protocol","path": "$uri","request_query": "$args","request_length": $request_length,"duration": $request_time,"method": "$request_method","http_referrer": "$http_referer","http_user_agent":  "$http_user_agent","upstream-sever":"$proxy_upstream_name","proxy_alternative_upstream_name":"$proxy_alternative_upstream_name","upstream_addr":"$upstream_addr","upstream_response_length":$upstream_response_length,"upstream_response_time":$upstream_response_time,"upstream_status":$upstream_status}'
  max-worker-connections: "65536"
  worker-processes: "2"
  proxy-body-size: 20m
  proxy-connect-timeout: "10"
  proxy_next_upstream: error timeout http_502
  reuse-port: "true"
  server-tokens: "false"
  ssl-ciphers: ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
  ssl-protocols: TLSv1 TLSv1.1 TLSv1.2
  ssl-redirect: "false"
  worker-cpu-affinity: auto

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: tcp-services
  namespace: ingress-nginx
  labels:
    app: ingress-nginx

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: udp-services
  namespace: ingress-nginx
  labels:
    app: ingress-nginx

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
  annotations:
    component.version: "v0.30.0"
    component.revision: "v1"
spec:
  # Deployment need:
  # ----------------
#  replicas: 1
  # ----------------
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
        scheduler.alpha.kubernetes.io/critical-pod: ""
    spec:
      # DaemonSet need:
      # ----------------
      hostNetwork: true
      # ----------------
      serviceAccountName: nginx-ingress-controller
      priorityClassName: system-node-critical
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - ingress-nginx
              topologyKey: kubernetes.io/hostname
            weight: 100
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      containers:
        - name: nginx-ingress-controller
          image: registry.cn-beijing.aliyuncs.com/acs/aliyun-ingress-controller:v0.30.0.2-9597b3685-aliyun
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --publish-service=$(POD_NAMESPACE)/nginx-ingress-lb
            - --annotations-prefix=nginx.ingress.kubernetes.io
            - --enable-dynamic-certificates=true
            - --v=2
          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE
            runAsUser: 101
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 10
#          resources:
#            limits:
#              cpu: "1"
#              memory: 2Gi
#            requests:
#              cpu: "1"
#              memory: 2Gi
          volumeMounts:
          - mountPath: /etc/localtime
            name: localtime
            readOnly: true
      volumes:
      - name: localtime
        hostPath:
          path: /etc/localtime
          type: File
      nodeSelector:
        boge/ingress-controller-ready: "true"
      tolerations:
      - operator: Exists

      initContainers:
      - command:
        - /bin/sh
        - -c
        - |
          mount -o remount rw /proc/sys
          sysctl -w net.core.somaxconn=65535
          sysctl -w net.ipv4.ip_local_port_range="1024 65535"
          sysctl -w fs.file-max=1048576
          sysctl -w fs.inotify.max_user_instances=16384
          sysctl -w fs.inotify.max_user_watches=524288
          sysctl -w fs.inotify.max_queued_events=16384
        image: registry.cn-beijing.aliyuncs.com/acs/busybox:v1.29.2
        imagePullPolicy: Always
        name: init-sysctl
        securityContext:
          privileged: true
          procMount: Default

---
## Deployment need for aliyun'k8s:
#apiVersion: v1
#kind: Service
#metadata:
#  annotations:
#    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: "lb-xxxxxxxxxxxxxxxxxxx"
#    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-force-override-listeners: "true"
#  labels:
#    app: nginx-ingress-lb
#  name: nginx-ingress-lb-local
#  namespace: ingress-nginx
#spec:
#  externalTrafficPolicy: Local
#  ports:
#  - name: http
#    port: 80
#    protocol: TCP
#    targetPort: 80
#  - name: https
#    port: 443
#    protocol: TCP
#    targetPort: 443
#  selector:
#    app: ingress-nginx
#  type: LoadBalancer

DaemonSet

开始部署:

# kubectl  apply -f aliyun-ingress-nginx.yaml 
namespace/ingress-nginx created
serviceaccount/nginx-ingress-controller created
clusterrole.rbac.authorization.k8s.io/nginx-ingress-controller created
clusterrolebinding.rbac.authorization.k8s.io/nginx-ingress-controller created
service/nginx-ingress-lb created
configmap/nginx-configuration created
configmap/tcp-services created
configmap/udp-services created
daemonset.apps/nginx-ingress-controller created

# 这里是以daemonset资源的形式进行的安装
# DaemonSet资源和Deployment的yaml配置类似,但不同的是:Deployment只关心pod副本的总数,多个副本可能被调度到同一个node上;而daemonset是在每个符合条件的node上都运行且只运行一个pod副本
# 这里正好就借运行ingress-nginx的情况下,把daemonset这个资源做下讲解

# 我们查看下pod,会发现空空如也,为什么会这样呢?
# kubectl -n ingress-nginx get pod
注意上面的yaml配置里面,我使用了节点选择配置,只有打了我指定label标签的node节点,才会被允许调度pod上去运行
      nodeSelector:
        boge/ingress-controller-ready: "true"

# 我们现在来打标签
# kubectl label node 10.0.1.201 boge/ingress-controller-ready=true
node/10.0.1.201 labeled
# kubectl label node 10.0.1.202 boge/ingress-controller-ready=true
node/10.0.1.202 labeled
# 注:如果你的节点IP规划用的是192.168.100.x那套,就对应执行 kubectl label node 192.168.100.50 ... 和 kubectl label node 192.168.100.60 ... 即可

# 接着可以看到pod就被调度到这两台node上启动了
# kubectl -n ingress-nginx get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE    IP           NODE         NOMINATED NODE   READINESS GATES
nginx-ingress-controller-lchgr   1/1     Running   0          9m1s   10.0.1.202   10.0.1.202   <none>           <none>
nginx-ingress-controller-x87rp   1/1     Running   0          9m6s   10.0.1.201   10.0.1.201   <none>           <none>
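
也可以从daemonset资源本身确认下状态,DESIRED/READY的数量应该等于打了标签的node数(这里是2台),下面的命令只是示意:

# kubectl -n ingress-nginx get daemonset nginx-ingress-controller -o wide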

我们基于前面学到的deployment和service,来创建一个nginx的相应服务资源,保存为nginx.yaml:

注意:记得把前面测试的资源删除掉,以防冲突

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx  # Service的selector就是通过匹配这里的label,来关联到这个Deployment所管理的pod的
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx

运行它:

# kubectl apply -f nginx.yaml 
service/nginx created
deployment.apps/nginx created

然后准备nginx的ingress配置,保存为nginx-ingress.yaml,并执行它:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx-ingress
spec:
  rules:
    - host: nginx.boge.com
      http:
        paths:
          - backend:
              serviceName: nginx   # 这里填写要关联的service名称,ingress通过它找到对应的endpoint(pod)来转发
              servicePort: 80
            path: /
# kubectl apply -f nginx-ingress.yaml 
ingress.extensions/nginx-ingress created

#查看创建的ingress资源
# kubectl get ingress
NAME            CLASS    HOSTS            ADDRESS   PORTS   AGE
nginx-ingress   <none>   nginx.boge.com             80      13s

# 我们在其它节点上,加下本地hosts,来测试下效果
10.0.1.201 nginx.boge.com

# 可以看到请求成功了
[root@node-2 ~]# curl nginx.boge.com
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

# 回到201节点上,看下ingress-nginx的日志
[root@node-1 ~]# kubectl -n ingress-nginx logs --tail=1 nginx-ingress-controller-x87rp  
{"@timestamp": "2020-11-26T18:16:53+08:00","remote_addr": "10.0.1.202","x-forward-for": "10.0.1.202","request_id": "74440b6d92aca4d64600ffa85c1dee15","remote_user": "-","bytes_sent": 851,"request_time": 0.004,"status": 200,"vhost": "nginx.boge.com","request_proto": "HTTP/1.1","path": "/","request_query": "-","request_length": 78,"duration": 0.004,"method": "GET","http_referrer": "-","http_user_agent":  "curl/7.29.0","upstream-sever":"default-nginx-80","proxy_alternative_upstream_name":"","upstream_addr":"172.20.217.65:80","upstream_response_length":612,"upstream_response_time":0.003,"upstream_status":200}

在生产环境中,如果是自建机房,我们通常会在至少2台node节点上运行有ingress-nginx的pod,那么有必要在这两台node上面部署负载均衡软件做调度,来起到高可用的作用,这里我们用haproxy+keepalived,如果你的生产环境是在云上,假设是阿里云,那么你只需要购买一个负载均衡器SLB,将运行有ingress-nginx的pod的节点服务器加到这个SLB的后端来,然后将请求域名和这个SLB的公网IP做好解析即可,目前我们用二进制部署的K8s集群通信架构如下:

第6关 k8s架构师课程之流量入口Ingress上部

注意在每台node节点上已经部署有一个haproxy软件,用来转发apiserver的请求,那么我们只需要选取两台节点,部署keepalived软件并重新配置haproxy,生成VIP来达到HA的效果,这里我们选择在其中两台node节点(10.0.1.203、10.0.1.204)上部署

node上现在已有的haproxy配置:

# cat /etc/haproxy/haproxy.cfg 
global
        log /dev/log    local1 warning
        chroot /var/lib/haproxy
        user haproxy
        group haproxy
        daemon
        nbproc 1

defaults
        log     global
        timeout connect 5s
        timeout client  10m
        timeout server  10m

listen kube-master
        bind 127.0.0.1:6443
        mode tcp
        option tcplog
        option dontlognull
        option dontlog-normal
        balance roundrobin 
        server 10.0.1.201 10.0.1.201:6443 check inter 10s fall 2 rise 2 weight 1
        server 10.0.1.202 10.0.1.202:6443 check inter 10s fall 2 rise 2 weight 1

开始新增ingress端口的转发,修改haproxy配置如下(注意两台节点都记得修改):

# cat /etc/haproxy/haproxy.cfg 
global
        log /dev/log    local1 warning
        chroot /var/lib/haproxy
        user haproxy
        group haproxy
        daemon
        nbproc 1

defaults
        log     global
        timeout connect 5s
        timeout client  10m
        timeout server  10m

listen kube-master
        bind 127.0.0.1:6443
        mode tcp
        option tcplog
        option dontlognull
        option dontlog-normal
        balance roundrobin 
        server 10.0.1.201 10.0.1.201:6443 check inter 10s fall 2 rise 2 weight 1
        server 10.0.1.202 10.0.1.202:6443 check inter 10s fall 2 rise 2 weight 1

listen ingress-http
        bind 0.0.0.0:80
        mode tcp
        option tcplog
        option dontlognull
        option dontlog-normal
        balance roundrobin
        server 10.0.1.201 10.0.1.201:80 check inter 2000 fall 2 rise 2 weight 1
        server 10.0.1.202 10.0.1.202:80 check inter 2000 fall 2 rise 2 weight 1

listen ingress-https
        bind 0.0.0.0:443
        mode tcp
        option tcplog
        option dontlognull
        option dontlog-normal
        balance roundrobin
        server 10.0.1.201 10.0.1.201:443 check inter 2000 fall 2 rise 2 weight 1
        server 10.0.1.202 10.0.1.202:443 check inter 2000 fall 2 rise 2 weight 1

然后在两台node上分别安装keepalived并进行配置:

# 安装keepalived
yum install -y keepalived

# 编辑配置修改为如下:
# 这里是node 10.0.1.203
global_defs {
    router_id lb-master
}

vrrp_script check-haproxy {
    script "killall -0 haproxy"
    interval 5
    weight -60
}

vrrp_instance VI-kube-master {
    state MASTER
    priority 120
    unicast_src_ip 10.0.1.203
    unicast_peer {
        10.0.1.204
    }
    dont_track_primary
    interface ens32  # 注意这里的网卡名称修改成你机器真实的内网网卡名称,可用命令ip addr查看
    virtual_router_id 111
    advert_int 3
    track_script {
        check-haproxy
    }
    virtual_ipaddress {
        10.0.1.222
    }
}


# 这里是node 10.0.1.204
global_defs {
    router_id lb-master
}

vrrp_script check-haproxy {
    script "killall -0 haproxy"
    interval 5
    weight -60
}

vrrp_instance VI-kube-master {
    state MASTER
    priority 120
    unicast_src_ip 10.0.1.204
    unicast_peer {
        10.0.1.203
    }
    dont_track_primary
    interface ens32
    virtual_router_id 111
    advert_int 3
    track_script {
        check-haproxy
    }
    virtual_ipaddress {
        10.0.1.222
    }
}

全部安装配置完成后,在两台node上重启服务:

# 重启服务
systemctl restart haproxy.service
systemctl restart keepalived.service

# 查看运行状态
systemctl status haproxy.service 
systemctl status keepalived.service

# 添加开机自启动(haproxy默认安装好就添加了自启动)
systemctl enable keepalived.service
# 查看是否添加成功
systemctl is-enabled keepalived.service 
enabled就代表添加成功了

# 同时我可查看下VIP是否已经生成
[root@node-4 ~]# ip a|grep 222                           
    inet 10.0.1.222/32 scope global ens32
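
这里可以顺手验证下VIP漂移的效果,思路是模拟haproxy故障(check-haproxy脚本检测不到haproxy进程后会把优先级降60,从而触发VRRP切换),下面的步骤只是示意:

# 在当前持有VIP的节点(上面看到是node-4)上停掉haproxy
systemctl stop haproxy.service
# 等几秒,到另一台节点(node-3)上确认VIP是否漂移过来
ip a|grep 222
# 验证完记得把haproxy再启动回来
systemctl start haproxy.service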

现在我们找一台带图形桌面的机器,绑定下hosts,来试试ingress吧

# 添加一条hosts
10.0.1.222 nginx.boge.com

打开浏览器,输入域名回车测试下:

第6关 k8s架构师课程之流量入口Ingress上部

做到这里,是不是有点成就感了呢,在已经知道了ingress能给我们带来什么后,我们回过头来理解Ingress的工作原理,这样掌握ingress会更加稳固,这也是我平时学习的方法

如下图,Client客户端对nginx.boge.com进行DNS查询,DNS服务器(我们这里是配的本地hosts)返回了Ingress控制器的IP(也就是我们的VIP:10.0.1.222)。然后Client客户端向Ingress控制器发送HTTP请求,并在请求的Host头中指定nginx.boge.com。Ingress控制器从该头部确定Client客户端是想访问哪个服务,通过与该服务关联的Endpoint对象查看具体的Pod IP,并将Client客户端的请求转发给其中一个pod。

第6关 k8s架构师课程之流量入口Ingress上部

生产环境正常情况下大部分是一个Ingress对应一个Service服务,但在一些特殊情况下,需要复用一个Ingress来访问多个服务,下面我们来实践下

再创建一个nginx的deployment和service,注意名称修改下不要冲突了

# kubectl create deployment web --image=nginx
deployment.apps/web created

# kubectl expose deployment web --port=80 --target-port=80
service/web exposed

# 确认下创建结果
# kubectl  get deployments.apps 
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   1/1     1            1           16h
web     1/1     1            1           45s
# kubectl  get pod
NAME                    READY   STATUS    RESTARTS   AGE
nginx-f89759699-6vgr8   1/1     Running   1          16h
web-5dcb957ccc-nr2m7    1/1     Running   0          54s
# kubectl  get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1       <none>        443/TCP   17h
nginx        ClusterIP   10.68.238.54    <none>        80/TCP    16h
web          ClusterIP   10.68.229.231   <none>        80/TCP    16s

# 接着来修改Ingress
# 注意:这里可以通过两种方式来修改K8s正在运行的资源
# 第一种:直接通过edit修改在线服务的资源来生效,这个通常用在测试环境,在实际生产中不建议这么用
kubectl edit ingress nginx-ingress

# 第二种: 通过之前创建ingress的yaml配置,在上面进行修改,再apply更新进K8s,在生产中是建议这么用的,我们这里也用这种方式来修改
# vim nginx-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /    # 注意这里是把匹配到的请求路径重写为/再转发给后端服务,作用类似传统nginx里的rewrite,不配的话后端会404
  name: nginx-ingress
spec:
  rules:
    - host: nginx.boge.com
      http:
        paths:
          - backend:
              serviceName: nginx
              servicePort: 80
            path: /nginx  # 注意这里的路由名称要是唯一的
          - backend:       # 从这里开始是新增加的
              serviceName: web
              servicePort: 80
            path: /web  # 注意这里的路由名称要是唯一的
# 开始创建
[root@node-1 ~]# kubectl  apply -f nginx-ingress.yaml 
ingress.extensions/nginx-ingress configured

# 同时为了更直观的看到效果,我们按前面讲到的方法来修改下nginx默认的展示页面
# kubectl exec -it nginx-f89759699-6vgr8 -- bash
echo "i am nginx" > /usr/share/nginx/html/index.html

# kubectl exec -it web-5dcb957ccc-nr2m7 -- bash
echo "i am web" > /usr/share/nginx/html/index.html

看下效果吧:

第6关 k8s架构师课程之流量入口Ingress上部

第6关 k8s架构师课程之流量入口Ingress上部

因为http属于是明文传输数据不安全,在生产中我们通常会配置https加密通信,现在实战下Ingress的tls配置

# 这里我先自签一个https的证书

#1. 先生成私钥key
# openssl genrsa -out tls.key 2048
Generating RSA private key, 2048 bit long modulus
..............................................................................................+++
.....+++
e is 65537 (0x10001)

#2.再基于key生成tls证书(注意:这里我用的*.boge.com,这是生成泛域名的证书,后面所有新增加的三级域名都是可以用这个证书的)
# openssl req -new -x509 -key tls.key -out tls.cert -days 360 -subj /CN=*.boge.com

# 看下创建结果
# ll
total 8
-rw-r--r-- 1 root root 1099 Nov 27 11:44 tls.cert
-rw-r--r-- 1 root root 1679 Nov 27 11:43 tls.key

# 在K8s上创建tls的secret(注意默认ns是default)
# kubectl create secret tls mytls --cert=tls.cert --key=tls.key 
secret/mytls created

# 然后修改先的ingress的yaml配置
# cat nginx-ingress.yaml 
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /    # 注意这里是把匹配到的请求路径重写为/再转发给后端服务,作用类似传统nginx里的rewrite,不配的话后端会404
  name: nginx-ingress
spec:
  rules:
    - host: nginx.boge.com
      http:
        paths:
          - backend:
              serviceName: nginx
              servicePort: 80
            path: /nginx  # 注意这里的路由名称要是唯一的
          - backend:       # 从这里开始是新增加的
              serviceName: web
              servicePort: 80
            path: /web  # 注意这里的路由名称要是唯一的
  tls:    # 增加下面这段,注意缩进格式
      - hosts:
          - nginx.boge.com   # 这里域名和上面的对应
        secretName: mytls    # 这是我先生成的secret

# 进行更新
# kubectl  apply -f nginx-ingress.yaml         
ingress.extensions/nginx-ingress configured

现在再来看看https访问的效果:

注意:这里因为是我自签的证书,所以浏览器的访问时会提示您的连接不是私密连接 ,我这里用的谷歌浏览器,直接点高级,再点击继续前往nginx.boge.com(不安全)

第6关 k8s架构师课程之流量入口Ingress上部

第7关 k8s架构师课程之HPA 自动水平伸缩pod

HPA

大家好,我是博哥爱运维,这节课带来k8s的HPA 自动水平伸缩pod。

我们知道,初始Pod的数量是可以设置的,同时业务也分流量高峰和低峰,那么怎么既能不过多地占用K8s的资源,又能在服务高峰时自动扩容pod的数量呢?在K8s上的答案是Horizontal Pod Autoscaling,简称HPA 自动水平伸缩。这里只以我们常用的CPU计算型服务来作为HPA的测试,这基本满足了大部分业务服务需求,其它如vpa纵向扩容,还有基于业务qps等特殊指标的扩容,这些在后面计划会以独立高级番外篇来作教程。

自动水平伸缩,是指运行在k8s上的应用负载(POD),可以根据资源使用率进行自动扩容、缩容,它依赖metrics-server服务pod使用资源指标收集;我们知道应用的资源使用率通常都有高峰和低谷,所以k8s的HPA特性应运而生;它也是最能体现区别于传统运维的优势之一,不仅能够弹性伸缩,而且完全自动化

我们在生产中通常用得最多的就是基于服务pod的cpu使用率metrics来自动扩容pod数量,下面来以生产的标准来实战测试下(注意:使用HPA前我们要确保K8s集群的dns服务和metrics服务是正常运行的,并且我们所创建的服务需要配置指标分配)
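
动手前可以先确认下metrics指标采集是正常的,能看到node和pod的CPU/内存数据就说明metrics-server工作正常,下面的命令只是示意:

# kubectl top node
# kubectl top pod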

# pod内资源分配的配置格式如下:
# 默认可以只配置requests,但根据生产中的经验,建议把limits资源限制也加上,因为对K8s来说,只有requests和limits都配置了且两者的值相等,这个pod才是最高优先级的Guaranteed级别;在node资源不够需要驱逐pod时,首先被干掉的是完全没有配置资源的BestEffort级别的pod,其次是只配置了requests或两者不相等的Burstable级别,最后才是两者都配置且相等的Guaranteed级别,仔细品品
      resources:
        limits:   # 限制单个pod最多能使用1核(1000m 毫核)cpu以及2G内存
          cpu: "1"
          memory: 2Gi
        requests: # 保证这个pod初始就能分配这么多资源
          cpu: "1"
          memory: 2Gi

我们先不做上面配置的改动,看看直接创建hpa会产生什么情况:

# 为deployment资源web创建hpa,pod数量上限3个,最低1个,在pod平均CPU达到50%后开始扩容
kubectl  autoscale deployment web --max=3 --min=1 --cpu-percent=50

#过一会看下这个hpa资源的描述(截取这下面一部分)
# 下面提示说到,HPA缺少最小资源分配的request参数
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
Events:
  Type     Reason                        Age                     From                       Message
  ----     ------                        ----                    ----                       -------
  Warning  FailedComputeMetricsReplicas  3m46s (x12 over 6m33s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu
  Warning  FailedGetResourceMetric       89s (x21 over 6m33s)    horizontal-pod-autoscaler  missing request for cpu

我们现在以上面创建的deployment资源web来实践下hpa的效果,首先用我们学到的方法把web的yaml配置导出来,再在里面补上资源分配的配置。
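
导出命令大致如下(导出后记得把status、managedFields这些运行时字段删掉,只保留我们关心的部分),这里只是示意:

# kubectl get deployment web -o yaml > web.yaml

补好resources后的完整配置如下: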

# cat web.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
        resources:
          limits:   # 因为我这里是测试环境,所以这里CPU只分配50毫核(0.05核CPU)和20M的内存
            cpu: "50m"
            memory: 20Mi
          requests: # 保证这个pod初始就能分配这么多资源
            cpu: "50m"
            memory: 20Mi

更新web资源:

# kubectl  apply -f web.yaml              
deployment.apps/web configured

然后创建hpa:

# kubectl  autoscale deployment web --max=3 --min=1 --cpu-percent=50         
horizontalpodautoscaler.autoscaling/web autoscaled

# 等待一会,可以看到相关的hpa信息(K8s上metrics服务收集所有pod资源的时间间隔大概在60s的时间)
# kubectl get hpa -w
NAME   REFERENCE        TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   <unknown>/50%   1         3         1          39s
web    Deployment/web   0%/50%          1         3         1          76s

我们来模拟业务流量增长,看看hpa自动伸缩的效果:

# 我们启动一个临时pod,来模拟大量请求
# kubectl run -it --rm busybox --image=busybox -- sh
/ # while :;do wget -q -O- http://web;done

# 等待2 ~ 3分钟,注意k8s为了避免频繁增删pod,对副本的增加速度有限制
# kubectl get hpa web -w
NAME   REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   0%/50%    1         3         1          11m
web    Deployment/web   102%/50%   1         3         1          14m
web    Deployment/web   102%/50%   1         3         3          14m

# 看下hpa的描述信息下面的事件记录
# kubectl describe hpa web
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
...
  Normal   SuccessfulRescale             62s                horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target

好了,HPA的自动扩容已经见过了,现在停掉压测,观察下HPA的自动收缩功能:

# 可以看到,在业务流量高峰下去后,HPA并不急着马上收缩pod数量,而是等待5分钟后,再进行收敛,这是稳妥的作法,是k8s为了避免频繁增删pod的一种手段
# kubectl get hpa web -w
NAME   REFERENCE        TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
web    Deployment/web   102%/50%   1         3         3          16m
web    Deployment/web   0%/50%     1         3         3          16m
web    Deployment/web   0%/50%     1         3         3          20m
web    Deployment/web   0%/50%     1         3         1          21m


第8关 k8s架构师课程之持久化存储第一节

大家好,我是博哥爱运维,K8s是如何来管理存储资源的呢?跟着博哥来会会它们吧!

Volume

我们这里先来聊聊K8s的存储模型Volume,来实践下如何将各种持久化的存储映射到Pod中的容器。

在我们上面的实战中,大家如果细心的话,会发现把nginx服务pod内的默认页面改了,但当重启pod后,这个页面又恢复成nginx容器初始的状态了。所以这里要和大家说的是,在没有配置持久化存储前,任何新增的数据在pod发生重启时都是无法保留的;而在K8s上,Pod的生命周期可能很短,它们会被频繁地销毁和创建,自然在容器销毁时,里面运行时新增的数据,如修改的配置及日志文件等也会被清除。

那么怎么解决这一现象呢,我们可以用K8s volume来持久化保存容器的数据,Volume的生命周期独立于容器,Pod中的容器可能被销毁重建,但Volume会被保留。

本质上,K8s volume是一个目录,这点和Docker volume差不多,当Volume被mount到Pod上,这个Pod中的所有容器都可以访问这个volume,在生产场景中,我们常用的类型有这几种:

  • emptyDir
  • hostPath
  • PersistentVolume(PV) & PersistentVolumeClaim(PVC)
  • StorageClass

emptyDir

我们先来讲讲emptyDir,它是最基础的Volume类型,pod内的容器发生重启不会造成emptyDir里面数据的丢失,但是当pod被删除重建后,emptyDir里的数据就会丢失,也就是说emptyDir与pod的生命周期是一致的。那么大家可能有个疑问,这和之前不配置它好像也没什么区别呀?实际上在某些场景,它的作用还是挺大的,在生产中它最实际的用处是提供Pod内多个容器之间的volume数据共享,下面我会用一个实际的生产者、消费者的例子来演示下emptyDir的作用,相信大家动动手就会理解得更快了

# 我们继续用上面的web服务的配置,在里面新增volume配置
# cat web.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
        resources:
          limits:
            cpu: "50m"
            memory: 20Mi
          requests:
            cpu: "50m"
            memory: 20Mi
        volumeMounts:         # 准备将pod的目录进行卷挂载
          - name: html-files  # 自定个名称,容器内可以类似这样挂载多个卷
            mountPath: "/usr/share/nginx/html"

      - name: busybox       # 在pod内再跑一个容器,每秒把当时时间写到nginx默认页面上
        image: busybox
        args:
        - /bin/sh
        - -c
        - >
           while :; do
             if [ -f /html/index.html ];then
               echo "[$(date +%F\ %T)] hello" > /html/index.html
               sleep 1
             else
               touch /html/index.html
             fi
           done
        volumeMounts:
          - name: html-files  # 注意这里的名称和上面nginx容器保持一样,这样才能相互进行访问
            mountPath: "/html"  # 将数据挂载到当前这个容器的这个目录下
      volumes:
        - name: html-files   # 最后定义这个卷的名称也保持和上面一样
          emptyDir:          # 这就是使用emptyDir卷类型了
            medium: Memory   # 这里将文件写入内存中保存,这样速度会很快,配置为medium: "" 就是代表默认的使用本地磁盘空间来进行存储
            sizeLimit: 10Mi  # 因为内存比较珍贵,注意限制使用大小

更新这个web的配置

# kubectl apply -f web.yaml 
deployment.apps/web configured

# 可以看到READY下面容器数量变为2了
# kubectl get pod
NAME                    READY   STATUS    RESTARTS   AGE
......
web-5bf769fdfc-44p7h    2/2     Running   0          2m4s

# 我们先直接用服务的IP来快速测试下效果
# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
......
web          ClusterIP   10.68.229.231   <none>        80/TCP    4h36m

# 可以看到每次访问都是被写入当前最新时间的页面内容
[root@node-1 ~]# curl 10.68.229.231
[2020-11-27 07:21:34] hello
[root@node-1 ~]# curl 10.68.229.231
[2020-11-27 07:21:35] hello
[root@node-1 ~]# curl 10.68.229.231
[2020-11-27 07:21:36] hello
[root@node-1 ~]# curl 10.68.229.231
[2020-11-27 07:21:38] hello

浏览器访问也是一样的

我们来探究下原理

# 看下这个web的pod的描述信息
# kubectl describe pod web-5bf769fdfc-44p7h 
......
Node:         10.0.1.203/10.0.1.203     # 找到这个pod运行在哪个node上
......
Containers:
  nginx:
    Container ID:   docker://c1482a15f756ff3bc089973ec942a4e60f7ec34674ab8435a47a94d4b93411a7   # 找到pod内nginx容器的ID
......
  busybox:
    Container ID:  docker://ecedf3b0ffa6b5101e84a21f8dbf6188179875b5db61980bc93b65195f558c6f   # 找到pod内busybox容器的ID
    
    
# 我们登陆10.0.1.203 这台node,查看pod内这两个容器的volume挂载信息,我们发现两个容器都 mount 了同一个目录
[root@node-3 ~]# docker inspect c1482a15f756ff3bc089973ec942a4e60f7ec34674ab8435a47a94d4b93411a7|grep volume|grep html
                "/var/lib/container/kubelet/pods/cc4832f3-c73c-479f-9088-12b079ff4608/volumes/kubernetes.io~empty-dir/html-files:/usr/share/nginx/html",
                "Source": "/var/lib/container/kubelet/pods/cc4832f3-c73c-479f-9088-12b079ff4608/volumes/kubernetes.io~empty-dir/html-files",
                
                
[root@node-3 ~]# docker inspect ecedf3b0ffa6b5101e84a21f8dbf6188179875b5db61980bc93b65195f558c6f|grep volume|grep html
                "/var/lib/container/kubelet/pods/cc4832f3-c73c-479f-9088-12b079ff4608/volumes/kubernetes.io~empty-dir/html-files:/html",
                "Source": "/var/lib/container/kubelet/pods/cc4832f3-c73c-479f-9088-12b079ff4608/volumes/kubernetes.io~empty-dir/html-files",    

hostPath

hostPath Volume 的作用是将pod所在node上已经存在的文件系统目录mount给pod里的容器。在生产中大部分应用是不会直接使用hostPath的,因为我们并不关心Pod在哪台node上运行,而hostPath又恰好增加了pod与node的耦合,限制了pod的使用,这里我们只作一下了解,知道有这个东西存在即可,一般只是一些安装类的服务会用到,比如下面我截取了网络插件calico的部分volume配置:

    volumeMounts:
    - mountPath: /host/driver
      name: flexvol-driver-host
......
  volumes:
......
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
      type: DirectoryOrCreate
    name: flexvol-driver-host

第8关 k8s架构师课程之持久化存储第二节PV和PVC

大家好,我是博哥爱运维,k8s持久化存储的第二节,给大家带来 PersistentVolume(PV) & PersistentVolumeClaim(PVC) 的讲解。

现在讲Volume里面在生产中用得最多的PersistentVolume(持久卷,简称PV)和 PersistentVolumeClaim(持久卷申领,简称PVC)。通常在企业中,Volume是由存储系统的管理员来维护,他们来提供pv,pv具有持久性,生命周期独立于Pod;Pod则是由应用的开发人员来维护,如果要进行卷挂载,那么就写一个pvc来消费pv就可以了,K8s会查找并提供满足条件的pv。

有了pvc,我们在K8s上进行卷挂载就只需要考虑需要多少容量了,而不用关心真正的空间是用什么存储系统做的等一些底层细节信息,pv这些只有存储管理员才需要去关心。

K8s支持多种类型的pv,我们这里就以生产中常用的NFS来作演示(在云上的话就用NAS),生产中如果对存储要求不是太高的话,建议就用NFS,这样出问题也比较容易解决,如果有性能需求,可以看看rook的ceph,以及Rancher的Longhorn,这些我都在生产中用过,如果有需求的同学可以在评论区留言,我会单独做课程来讲解。

开始部署NFS-SERVER

# 我们这里在10.0.1.201上安装(在生产中,大家要提前作好NFS-SERVER环境的规划)
# yum -y install nfs-utils

# 创建NFS挂载目录
# mkdir /nfs_dir
# chown nobody.nobody /nfs_dir

# 修改NFS-SERVER配置
# echo '/nfs_dir *(rw,sync,no_root_squash)' > /etc/exports

# 重启服务
# systemctl restart rpcbind.service
# systemctl restart nfs-utils.service 
# systemctl restart nfs-server.service 

# 增加NFS-SERVER开机自启动
# systemctl enable nfs-server.service 
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.

# 验证NFS-SERVER是否能正常访问
# showmount -e 10.0.1.201                 
Export list for 10.0.1.201:
/nfs_dir *

创建基于NFS的PV

首先在NFS-SERVER的挂载目录里面创建一个目录

# mkdir /nfs_dir/pv1

接着准备好pv的yaml配置,保存为pv1.yaml

# cat pv1.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv1
  labels:
    type: test-claim    # 这里建议打上一个独有的标签,方便在多个pv的时候方便提供pvc选择挂载
spec:
  capacity:
    storage: 1Gi     # <----------  1
  accessModes:
    - ReadWriteOnce     # <----------  2
  persistentVolumeReclaimPolicy: Recycle     # <----------  3
  storageClassName: nfs     # <----------  4
  nfs:
    path: /nfs_dir/pv1     # <----------  5
    server: 10.0.1.201
  1. capacity 指定 PV 的容量为 1Gi。
  2. accessModes 指定访问模式为 ReadWriteOnce,支持的访问模式有:
     ReadWriteOnce – PV 能以 read-write 模式 mount 到单个节点;
     ReadOnlyMany – PV 能以 read-only 模式 mount 到多个节点;
     ReadWriteMany – PV 能以 read-write 模式 mount 到多个节点。
  3. persistentVolumeReclaimPolicy 指定 PV 的回收策略为 Recycle,支持的策略有:
     Retain – 需要管理员手工回收;
     Recycle – 清除 PV 中的数据,效果相当于执行 rm -rf /thevolume/*;
     Delete – 删除 Storage Provider 上的对应存储资源,例如 AWS EBS、GCE PD、Azure Disk、OpenStack Cinder Volume 等。
  4. storageClassName 指定 PV 的 class 为 nfs。相当于为 PV 设置了一个分类,PVC 可以指定 class 申请相应 class 的 PV。
  5. 指定 PV 在 NFS 服务器上对应的目录,这里注意,我测试的时候需要手动先创建好这个目录并授权好,不然后面挂载会提示目录不存在: mkdir /nfs_dir/pv1 && chown -R nobody.nobody /nfs_dir 。

创建这个pv

# kubectl apply -f pv1.yaml 
persistentvolume/pv1 created

# STATUS 为 Available,表示 pv1 就绪,可以被 PVC 申请
# kubectl get pv
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pv1    1Gi        RWO            Recycle          Available           nfs                     4m45s
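
顺带补充一点:如果后面想调整已有pv的回收策略,比如把它改成更稳妥的Retain,可以直接patch而不用重建,下面的命令只是示意:

# kubectl patch pv pv1 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'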

接着准备PVC的yaml,保存为pvc1.yaml

# cat pvc1.yaml 
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs
  selector:
    matchLabels:
      type: test-claim

创建这个pvc

# kubectl apply -f pvc1.yaml          
persistentvolumeclaim/pvc1 created

# 看下pvc的STATUS为Bound代表成功挂载到pv了
# kubectl get pvc           
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc1   Bound    pv1      1Gi        RWO            nfs            2s

# 这个时候再看下pv,STATUS也是Bound了,同时CLAIM提示被default/pvc1消费
# kubectl get pv
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM          STORAGECLASS   REASON   AGE
pv1    1Gi        RWO            Recycle          Bound    default/pvc1   nfs  

下面我们准备pod服务来挂载这个pvc,这里就以上面最开始演示用的nginx的deployment的yaml配置来作修改

# cat nginx.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        volumeMounts:    # 我们这里将nginx容器默认的页面目录挂载
          - name: html-files
            mountPath: "/usr/share/nginx/html"
      volumes:
        - name: html-files
          persistentVolumeClaim:  # 卷类型使用pvc,同时下面名称处填先创建好的pvc1
            claimName: pvc1

更新配置

# kubectl apply -f nginx.yaml 
service/nginx unchanged
deployment.apps/nginx configured

# 我们看到新pod已经在创建了
# kubectl get pod
NAME                     READY   STATUS              RESTARTS   AGE
nginx-569546db98-4nmmg   0/1     ContainerCreating   0          5s
nginx-f89759699-6vgr8    1/1     Running             1          23h
web-5bf769fdfc-44p7h     2/2     Running             0          113m

# 我们这里直接用svc地址测试一下
# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1       <none>        443/TCP   23h
nginx        ClusterIP   10.68.238.54    <none>        80/TCP    23h
web          ClusterIP   10.68.229.231   <none>        80/TCP    6h27m

# 咦,这里为什么是显示403了呢,注意,卷挂载后会把当前已经存在这个目录的文件给覆盖掉,这个和传统机器上的磁盘目录挂载道理是一样的
[root@node-1 ~]# curl 10.68.238.54
<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>

# 我们来自己创建一个index.html页面
# echo 'hello, world!' > /nfs_dir/pv1/index.html

# 再请求下看看,已经正常了
# curl 10.68.238.54                             
hello, world!

# 我们来手动删除这个nginx的pod,看下容器内的修改是否是持久的呢?
# kubectl delete pod nginx-569546db98-4nmmg 
pod "nginx-569546db98-4nmmg" deleted

# 等待一会,等新的pod被创建好
# kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
nginx-569546db98-99qpq   1/1     Running   0          45s

# 再测试一下,可以看到,容器内的修改现在已经被持久化了
# curl 10.68.238.54        
hello, world!

# 后面我们再想修改有两种方式,一个是exec进到pod内进行修改,还有一个是直接修改挂载在NFS目录下的文件
# echo 111 > /nfs_dir/pv1/index.html
# curl 10.68.238.54  
111

下面讲下如何回收PVC以及PV

# 这里删除时会一直卡着,我们按ctrl+c看看怎么回事
# kubectl delete pvc pvc1 
persistentvolumeclaim "pvc1" deleted
^C

# 看下pvc发现STATUS是Terminating删除中的状态,我分析是因为服务pod还在占用这个pvc使用中
# kubectl get pvc
NAME   STATUS        VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc1   Terminating   pv1      1Gi        RWO            nfs            21m

# 先删除这个pod
# kubectl delete pod nginx-569546db98-99qpq 
pod "nginx-569546db98-99qpq" deleted

# 再看先删除的pvc已经没有了
# kubectl get pvc
No resources found in default namespace.

# 由于先前创建pv时的数据回收策略是Recycle – 清除 PV 中的数据,这时先前创建的index.html果然已经被删除了,在生产中要尤其注意这里的模式,注意及时备份数据,注意及时备份数据,注意及时备份数据
# ll /nfs_dir/pv1/
total 0

# 虽然此时pv是可以再次被新的pvc来消费的,但根据生产经验,建议在删除pvc时,把它绑定的pv也一并删除,需要时再重新创建即可
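
照这个建议,一个最简的清理和重建操作大概是下面这样(假设pv的清单文件保存为pv1.yaml,文件名以你实际保存的为准):

# 删除pv(如果卡在Terminating,说明还有pvc绑定着它,需要先处理对应的pvc)
# kubectl delete pv pv1

# 需要时再重新创建pv和pvc即可
# kubectl apply -f pv1.yaml
# kubectl apply -f pvc1.yaml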

第8关 k8s架构师课程之持久化存储StorageClass

大家好,我是博哥爱运维,k8s持久化存储的第三节,给大家带来 StorageClass动态存储的讲解。

我们上节课提到,K8s对存储解耦的设计是:pv交给存储管理员来管理,我们只管用pvc来消费就好。但上节的做法实际上还是得同时管理pv和pvc。在实际工作中,存储管理员可以提前配置好pv的动态供给StorageClass,根据pvc的消费需求来动态生成pv

StorageClass

我这里直接拿生产中用的实例来作演示,利用nfs-client-provisioner来生成一个基于nfs的StorageClass,部署的yaml配置如下,保存为nfs-sc.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  namespace: kube-system

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: kube-system 
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-provisioner-01
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-provisioner-01
  template:
    metadata:
      labels:
        app: nfs-provisioner-01
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: jmgao1983/nfs-client-provisioner:latest
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-provisioner-01  # 此处供应者名字供storageclass调用
            - name: NFS_SERVER
              value: 10.0.1.201   # 填入NFS的地址
            - name: NFS_PATH
              value: /nfs_dir   # 填入NFS挂载的目录
      volumes:
        - name: nfs-client-root
          nfs:
            server: 10.0.1.201   # 填入NFS的地址
            path: /nfs_dir   # 填入NFS挂载的目录

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-boge
provisioner: nfs-provisioner-01
# Supported policies: Delete、 Retain , default is Delete
reclaimPolicy: Retain

开始创建这个StorageClass:

# kubectl apply -f nfs-sc.yaml 
serviceaccount/nfs-client-provisioner created
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
deployment.apps/nfs-provisioner-01 created
storageclass.storage.k8s.io/nfs-boge created

# 注意这个是放在kube-system的namespace下面,这里面放置的是一些偏系统类的服务
# kubectl -n kube-system get pod -w
NAME                                       READY   STATUS              RESTARTS   AGE
calico-kube-controllers-7fdc86d8ff-dpdm5   1/1     Running             1          24h
calico-node-8jcp5                          1/1     Running             1          24h
calico-node-m92rn                          1/1     Running             1          24h
calico-node-xg5n4                          1/1     Running             1          24h
calico-node-xrfqq                          1/1     Running             1          24h
coredns-d9b6857b5-5zwgf                    1/1     Running             1          24h
metrics-server-869ffc99cd-wfj44            1/1     Running             2          24h
nfs-provisioner-01-5db96d9cc9-qxlgk        0/1     ContainerCreating   0          9s
nfs-provisioner-01-5db96d9cc9-qxlgk        1/1     Running             0          21s

# StorageClass已经创建好了
# kubectl get sc
NAME       PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-boge   nfs-provisioner-01   Retain          Immediate           false                  37s

我们来基于StorageClass创建一个pvc,看看动态生成的pv是什么效果:

# vim pvc-sc.yaml 
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-sc
spec:
  storageClassName: nfs-boge
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
      
# kubectl  apply -f pvc-sc.yaml 
persistentvolumeclaim/pvc-sc created

# kubectl  get pvc
NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-sc   Bound    pvc-63eee4c7-90fd-4c7e-abf9-d803c3204623   1Mi        RWX            nfs-boge       3s
pvc1     Bound    pv1                                        1Gi        RWO            nfs            24m

# kubectl  get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
pv1                                        1Gi        RWO            Recycle          Bound    default/pvc1     nfs                     49m
pvc-63eee4c7-90fd-4c7e-abf9-d803c3204623   1Mi        RWX            Retain           Bound    default/pvc-sc   nfs-boge                7s

我们修改下nginx的yaml配置,将pvc的名称换成上面的pvc-sc:

# vim nginx.yaml 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        volumeMounts:    # 我们这里将nginx容器默认的页面目录挂载
          - name: html-files
            mountPath: "/usr/share/nginx/html"
      volumes:
        - name: html-files
          persistentVolumeClaim:
            claimName: pvc-sc
            
            
# kubectl apply -f nginx.yaml 
service/nginx unchanged
deployment.apps/nginx configured

# 这里注意下,因为pv是动态生成的,所以它对应的目录名后面带了一串随机字符串,这时我们直接进到pod内来创建访问页面
# kubectl exec -it nginx-57cdc6d9b4-n497g -- bash
root@nginx-57cdc6d9b4-n497g:/# echo 'storageClass used' > /usr/share/nginx/html/index.html
root@nginx-57cdc6d9b4-n497g:/# exit

# curl 10.68.238.54                              
storageClass used

# 我们看下NFS挂载的目录
# ll /nfs_dir/
total 0
drwxrwxrwx 2 root root 24 Nov 27 17:52 default-pvc-sc-pvc-63eee4c7-90fd-4c7e-abf9-d803c3204623
drwxr-xr-x 2 root root  6 Nov 27 17:25 pv1
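
顺便补充一点:因为上面StorageClass的reclaimPolicy配置的是Retain,所以删除pvc-sc后,动态生成的pv不会被自动清除,而是变为Released状态,NFS目录default-pvc-sc-xxx下的数据也仍然保留,下面是一个验证思路的示意:

# 注意:如果pvc还在被pod挂载,删除时同样会卡在Terminating,需要先把相关pod删掉或把deployment缩容到0
# kubectl scale deploy nginx --replicas=0
# kubectl delete pvc pvc-sc

# 此时pv的STATUS会变为Released而不会被删除,NFS上的数据也还在
# kubectl get pv
# ll /nfs_dir/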

第9关 k8s架构师课程之有状态服务StatefulSet

原创2021-03-20 20:50·博哥爱运维

StatefulSet

大家好,我是博哥爱运维,K8s是如何来管理有状态服务的呢?跟着博哥来会会它们吧!

前面我们讲到了Deployment、DaemonSet都只适合用来跑无状态的服务pod,那么这里的StatefulSet(简写sts)就是用来跑有状态服务pod的。

那怎么理解有状态服务和无状态服务呢?可以简单快速地这样理解:无状态服务最典型的是WEB服务器,它的每次http请求都是全新的,和之前的请求没有关系;而有状态服务用网游服务器来举例就比较恰当了,每个用户的登录请求,服务端都要先根据这个用户之前注册的帐号密码等信息来判断这次登录请求是否正常。

无状态服务因为相互之间都是独立的,很适合用横向扩容来增加服务的资源量

还有一个很形象的比喻,K8s上无状态服务的pod有点类似于农村圈养的牲畜,饲养它们的人不会给它们每个都单独取个名字(pod都是随机名称,每次重启后IP也会变化),当其中一只病了或被卖了,带来的观感只是数量上的减少,这时再买些相应数量的牲畜回来就可以回到之前的状态了(当一个pod因为某些原因被删除掉的时候,K8s会启动一个新的pod来代替它);而有状态服务的pod就像养的一只只宠物,主人对待自己喜欢的宠物都会给它们取一个比较有特色的名字(在K8s上运行的有状态服务的pod,都会被给予一个独立的固定名称),并且每只宠物都有它独特的外貌和性格,如果万一这只宠物丢失了,那么需要到宠物店再买一只同样品种同样毛色的宠物来代替(当有状态服务的pod被删除时,K8s会启动一个和先前名称一模一样的pod来代替它)。

有状态服务sts比较常见的有mongo复制集、redis cluster、rabbitmq cluster等等,这些服务基本都会用StatefulSet模式来运行。当然除此之外,它们内部集群的关系还需要一系列脚本或controller来维系各节点间的状态,这些会在后面进阶课程里专门来讲。现在为了让大家先更好地理解StatefulSet,我这里还是直接用nginx服务来实战演示

1、创建pv
-------------------------------------------

root@node1:~# cat web-pv.yaml 
# mkdir -p /nfs_dir/{web-pv0,web-pv1}
apiVersion: v1
kind: PersistentVolume
metadata:
  name: web-pv0
  labels:
    type: web-pv0
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: my-storage-class
  nfs:
    path: /nfs_dir/web-pv0
    server: 10.0.1.201
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: web-pv1
  labels:
    type: web-pv1
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: my-storage-class
  nfs:
    path: /nfs_dir/web-pv1
    server: 10.0.1.201


2、创建pvc(这一步可以省去让其自动创建,这里手动创建是为了让大家能更清楚在sts里面pvc的创建过程)
-------------------------------------------
这一步非常非常的关键,因为如果创建的PVC的名称和StatefulSet中的名称没有对应上,
那么StatefulSet中的Pod就肯定创建不成功.

我们在这里创建了两个分别叫做www-web-0和www-web-1的PVC,这个名字是不是很奇怪,
而且在这个yaml里并没有提到PV的名字,所以PV和PVC是怎么bound起来的呢?
是通过labels标签下的key:value键值对来进行匹配的,
我们在创建PV时指定了label的键值对,在PVC里通过selector可以指定label。

然后再回到这个PVC的名称定义:www-web-0,为什么叫这样一个看似有规律的名字呢,
这里需要看看下面创建StatefulSet中的yaml,
首先我们看到StatefulSet的name叫web,设置的replicas为2个,
volumeMounts和volumeClaimTemplates的name必须相同,为www,
所以StatefulSet创建的第一个Pod的name应该为web-0,第二个为web-1。
这里StatefulSet中的Pod与PVC之间的绑定关系是通过名称来匹配的,即:

PVC_name  =  volumeClaimTemplates_name + "-" + pod_name
www-web-0     =       www               + "-" +   web-0
www-web-1     =       www               + "-" +   web-1


root@node1:~# cat web-pvc.yaml 
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: www-web-0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: my-storage-class
  selector:
    matchLabels:
      type: web-pv0
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: www-web-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: my-storage-class
  selector:
    matchLabels:
      type: web-pv1


3、创建Service 和 StatefulSet
-------------------------------------------
在上一步中我们已经创建了名为www-web-0的PVC了,接下来创建一个service和statefulset,
service的名称可以随意取,但是statefulset的名称已经定死了,为web,
并且statefulset中的volumeClaimTemplates_name必须为www,volumeMounts_name也必须为www。
只有这样,statefulset中的pod才能通过命名来匹配到PVC,否则会创建失败。

root@node1:~# cat web.yaml 
apiVersion: v1
kind: Service
metadata:
  name: web-headless
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---

apiVersion: v1
kind: Service
metadata:
  name: web
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  selector:
    app: nginx

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "web-headless" # 注意需与上面headless service的名称保持一致
  replicas: 2 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"
      resources:
        requests:
          storage: 1Gi

为了方便大家学习,博哥所有的文字课程,随后都会录制出对应的教学视频,大家就可以直接把这个文字教程当做课程笔记来用了。

第10关 k8s架构师课程之一次性和定时任务

原创2021-03-21 20:14·博哥爱运维

Job, CronJob

大家好,我是博哥爱运维,有时候我们想在K8s跑个一次性任务,或者是定时任务,能不能实现呢,答案肯定是可以的。

job

首先讲下一次性任务,在K8s中它叫job,直接来实战一番,先准备下yaml配置

如果我们不知道yaml怎么写,可以直接执行kubectl create job -h查看命令行创建的示例,再把创建出来的资源导出成yaml配置,保存为my-job.yaml

apiVersion: batch/v1   # 1. batch/v1 是当前 Job 的 apiVersion
kind: Job        #  2. 指明当前资源的类型为 Job
metadata:
  name: my-job
spec:
  template:
    metadata:
    spec:
      containers:
      - image: busybox
        name: my-job
        command: ["echo","Hello, boge."]
      restartPolicy: Never   # 3. restartPolicy 指定什么情况下需要重启容器。对于 Job,只能设置为 Never 或者 OnFailure

创建它并查看结果

# kubectl apply -f my-job.yaml 
job.batch/my-job created

# kubectl get jobs.batch 
NAME     COMPLETIONS   DURATION   AGE
my-job   1/1           2s         73s
# COMPLETIONS 已完成的
# DURATION  这个job运行所花费的时间
# AGE 这个job资源已经从创建到目前为止的时间

# job会生成一个pod,当完成任务后会是Completed的状态
# kubectl get pod
NAME           READY   STATUS      RESTARTS   AGE
my-job-7h6fb   0/1     Completed   0          31s

# 看下这个job生成的pod日志
# kubectl logs my-job-7h6fb 
Hello, boge.

job失败了会有什么现象出现呢?

我们编辑这个job的yaml,把执行的命令改成一个不存在的命令看看会发生什么

apiVersion: batch/v1   # 1. batch/v1 是当前 Job 的 apiVersion
kind: Job        #  2. 指明当前资源的类型为 Job
metadata:
  name: my-job
spec:
  template:
    metadata:
    spec:
      containers:
      - image: busybox
        name: my-job
        command: ["echoaaa","Hello, boge."]
      restartPolicy: Never   # 3. restartPolicy 指定什么情况下需要重启容器。对于 Job,只能设置为 Never 或者 OnFailure

创建它

# kubectl apply -f my-job.yaml 

# 可以观察到这个job一直不成功,由于restartPolicy重启模式是Never,失败的pod不会被重启;但job的状态始终未完成,所以它会一直不停地创建新的pod,直到COMPLETIONS变为1/1,对于我们这个示例,它显然永远都不会成功
# kubectl get pod
NAME           READY   STATUS       RESTARTS   AGE
my-job-9fcbm   0/1     StartError   0          47s
my-job-bt2kd   0/1     StartError   0          54s
my-job-mlnzz   0/1     StartError   0          37s
my-job-mntdp   0/1     StartError   0          17s

# kubectl get job
NAME     COMPLETIONS   DURATION   AGE
my-job   0/1           15s        15s

# 找一个pod看下事件描述,会很清晰地指出命令不存在
# kubectl describe pod my-job-9fcbm 
Name:         my-job-9fcbm
Namespace:    default
......
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  44s   default-scheduler  Successfully assigned default/my-job-9fcbm to 10.0.0.204
  Normal   Pulling    43s   kubelet            Pulling image "busybox"
  Normal   Pulled     36s   kubelet            Successfully pulled image "busybox" in 7.299038719s
  Normal   Created    36s   kubelet            Created container my-job
  Warning  Failed     36s   kubelet            Error: failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "echoaaa": executable file not found in $PATH: unknown

# 删除掉这个job,不然它创建出来的pod数量可就够多的了
# kubectl  delete job my-job

# 试试把restartPolicy重启模式换成OnFailure观察看看
# kubectl get pod
NAME           READY   STATUS             RESTARTS   AGE
my-job-gs95h   0/1     CrashLoopBackOff   3          84s

# 可以看到它不会创建新的pod,而是不断重启同一个pod以期望恢复正常,这里已经重启了3次,重启次数还会继续增加,超过重试上限后这个pod会被K8s删除。因为这里只是job而不是deployment,它不会再自己启动一个新的pod,所以这个job等于就没有了。这说明OnFailure是生效的,至少不会有那么多失败的pod堆积出来了

并行执行job

准备好yaml配置

apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
spec:
  parallelism: 2  # 并行执行2个job
  template:
    metadata:
      name: my-job
    spec:
      containers:
      - image: busybox
        name: my-job
        command: ["echo","Hello, boge."]
      restartPolicy: OnFailure

创建并查看结果

# kubectl apply -f my-job.yaml 
job.batch/my-job created

# job一共启动了2个pod,并且它们的AGE一样,可见是并行创建的
# kubectl get pod
NAME           READY   STATUS      RESTARTS   AGE
my-job-fwf8l   0/1     Completed   0          7s
my-job-w2fxd   0/1     Completed   0          7s

再来个组合测试下并行完成定制的总任务数量

apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  completions: 6   # 此job完成pod的总数量
  parallelism: 2   # 每次并发跑2个job
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo"," hello boge! "]
      restartPolicy: OnFailure

创建并查看结果

# 可以看到是每次并发2个job,完成6个总量即停止
# kubectl get pod
NAME          READY   STATUS      RESTARTS   AGE
myjob-54wmk   0/1     Completed   0          11s
myjob-fgtmj   0/1     Completed   0          15s
myjob-fkj5l   0/1     Completed   0          7s
myjob-hsccm   0/1     Completed   0          7s
myjob-lrpsr   0/1     Completed   0          15s
myjob-ppfns   0/1     Completed   0          11s

# 符合预期
# kubectl get job
NAME    COMPLETIONS   DURATION   AGE
myjob   6/6           14s        34s

# 测试完成后删掉这个资源
kubectl delete job myjob

到此,job的内容就讲完了,在生产中,job比较适合用在CI/CD流水线中,作为一次性任务使用,我在生产中基本没怎么用这个资源。

cronjob

上面的job是一次性任务,那我们需要定时循环来执行一个任务可以吗?答案肯定是可以的,就像我们在linux系统上面用crontab一样,在K8s上用cronjob的另一个好处就是它是分布式的,执行的pod可以是在集群中的任意一台NODE上面(这点和cronsun有点类似)

让我们开始实战吧,先准备一下cronjob的yaml配置为my-cronjob.yaml

apiVersion: batch/v1beta1     # <---------  当前 CronJob 的 apiVersion
kind: CronJob                 # <---------  当前资源的类型
metadata:
  name: hello
spec:
  schedule: "* * * * *"      # <---------  schedule 指定什么时候运行 Job,其格式与 Linux crontab 一致,这里 * * * * * 的含义是每一分钟启动一次
  jobTemplate:               # <---------  定义 Job 的模板,格式与前面 Job 一致
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo","boge like cronjob."]
          restartPolicy: OnFailure

正常创建后,我们过几分钟来看看运行结果

# 这里会显示cronjob的综合信息
# kubectl get cronjobs.batch 
NAME    SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   * * * * *   False     0        66s             2m20s

# 可以看到它每隔一分钟就会创建一个pod来执行job任务
# kubectl get pod
NAME                     READY   STATUS              RESTARTS   AGE
hello-1610267460-9b6hp   0/1     Completed           0          2m5s
hello-1610267520-fm427   0/1     Completed           0          65s
hello-1610267580-v8g4h   0/1     ContainerCreating   0          5s

# 测试完成后删掉这个资源
# kubectl delete cronjobs.batch hello 
cronjob.batch "hello" deleted

cronjob定时任务在生产中的用处很多,这也是为什么上面说job用得很少的缘故。我们可以把一些需要定时、定期运行的任务,在K8s上以cronjob来运行,依托K8s强大的资源调度以及服务自愈能力,放心地把定时任务交给它执行。
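
结合生产经验,跑定时任务时一般还会关注并发策略和历史记录保留数量,下面是一个带这些字段的CronJob配置示意(字段值只是参考,按自己的需求调整):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: boge-cronjob-demo
spec:
  schedule: "*/10 * * * *"           # 每10分钟执行一次
  concurrencyPolicy: Forbid          # 上一次任务还没跑完时,禁止并发启动新的Job
  startingDeadlineSeconds: 120       # 错过调度时间点后,允许延迟启动的最长秒数
  successfulJobsHistoryLimit: 3      # 保留最近3个成功的Job记录
  failedJobsHistoryLimit: 3          # 保留最近3个失败的Job记录
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: demo
            image: busybox
            command: ["echo","boge scheduled task"]
          restartPolicy: OnFailure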

第11关 k8s架构师课程之RBAC角色访问控制

原创2021-03-22 21:30·博哥爱运维

RBAC

大家好,我是博哥爱运维,在k8s上我们如何控制访问权限呢?答案就是Role-based access control (RBAC) - 基于角色(Role)的访问控制。RBAC是一种基于组织中用户的角色,来调节控制对计算机或网络资源访问的方法。

在早期的K8s版本,RBAC还未出现的时候,整个K8s的安全是较为薄弱的。有了RBAC后,我们可以对访问K8s集群的人员作非常精细化的控制,控制他们能访问什么资源,以只读还是可读写的形式来访问。目前RBAC是K8s默认的安全授权标准,所以我们非常有必要掌握RBAC的使用,这样才能更有力地保障我们K8s集群的安全。下面我将以生产中的实际使用,来带大家了解及掌握RBAC的生产应用。

RBAC里面的几种资源关系图,下面将用下面的资源来演示生产中经典的RBAC应用

                  |--- Role --- RoleBinding                只在指定namespace中生效
ServiceAccount ---|
                  |--- ClusterRole --- ClusterRoleBinding  不受namespace限制,在整个K8s集群中生效
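
为了方便理解上面的对应关系,这里先给一个最小化的Role/RoleBinding示意(以某个namespace下的pod只读权限为例,名称都是随便取的,仅供理解,后面的实战脚本里有完整的生产版本):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-sa
  namespace: demo
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: demo-pod-reader
  namespace: demo
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: demo-pod-reader-binding
  namespace: demo
subjects:
- kind: ServiceAccount
  name: demo-sa
  namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: demo-pod-reader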

在我看来,RBAC在K8s上的用途主要分为两大类:

第一类是保证在K8s上运行的pod服务具有相应的集群权限,如gitlab的CI/CD,它需要能访问除自身以外的其他pod(比如gitlab-runner的pod),再比如gitlab-runner的pod需要拥有创建新的临时pod的权限,用来构建CI/CD自动化流水线。这里大家没用过不懂没关系,先简单了解下就可以了,在本课程后面基于K8s及gitlab的生产实战CI/CD内容里会给大家作详细实战讲解;

第二类是创建能访问K8s相应资源、拥有对应权限的kube-config配置给到使用K8s的人员,来作为连接K8s的授权凭证

第一类的实战这里先暂时以早期的helm2来作下讲解。helm是一个快捷安装K8s各类资源的管理工具,通过之前给大家讲解的内容可以知道,一个较为完整的服务可能会用到deployment、service、configmap、secret、ingress等资源组合,大家在用的过程中可能会觉得配置起来较为麻烦,这时候helm就出现了,它把这些资源都打包封装成它自己能识别的内容,我们在安装一个服务的时候,只需要作下简单的配置,一条命令即可完成上述众多资源的配置安装。tiller相当于helm的服务端,它需要有权限在K8s中创建各类资源,在初始安装使用时,如果没有配置RBAC权限,我们会看到如下报错:

root@node1:~# helm install stable/mysql
Error: no available release name found

这时,我们可以来快速解决这个问题,创建sa关联K8s自带的最高权限的ClusterRole(生产中建议不要这样做,权限太高有安全隐患,这个就和linux的root管理帐号一样,一般都是建议通过sudo来控制帐号权限)

kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
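
执行完上面三条命令后,可以用kubectl auth can-i来确认tiller这个sa确实拿到了相应的集群权限,返回yes即代表授权生效:

# kubectl auth can-i create deployments --as=system:serviceaccount:kube-system:tiller -n default
# kubectl auth can-i '*' '*' --as=system:serviceaccount:kube-system:tiller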

第二类,我这里就直接以我在生产中实施的完整脚本来做讲解及实战,相信会给大家带来一个全新的学习感受,并能很快掌握它们:

  1. 创建对指定namespace有所有权限的kube-config
#!/bin/bash
#
# This Script based on  https://jeremievallee.com/2018/05/28/kubernetes-rbac-namespace-user.html
# K8s'RBAC doc:         https://kubernetes.io/docs/reference/access-authn-authz/rbac
# Gitlab'CI/CD doc:     https://docs.gitlab.com/ee/user/permissions.html#running-pipelines-on-protected-branches
#
# In honor of the remarkable Windson

BASEDIR="$(dirname "$0")"
folder="$BASEDIR/kube_config"

echo -e "All namespaces is here: \n$(kubectl get ns|awk 'NR!=1{print $1}')"
echo "endpoint server if local network you can use $(kubectl cluster-info |awk '/Kubernetes/{print $NF}')"

namespace=$1
endpoint=$(echo "$2" | sed -e 's,https\?://,,g')

if [[ -z "$endpoint" || -z "$namespace" ]]; then
    echo "Use "$(basename "$0")" NAMESPACE ENDPOINT";
    exit 1;
fi

if ! kubectl get ns|awk 'NR!=1{print $1}'|grep -w "$namespace";then kubectl create ns "$namespace";else echo "namespace: $namespace was exist." ;fi

echo "---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: $namespace-user
  namespace: $namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: $namespace-user-full-access
  namespace: $namespace
rules:
- apiGroups: ['', 'extensions', 'apps', 'metrics.k8s.io']
  resources: ['*']
  verbs: ['*']
- apiGroups: ['batch']
  resources:
  - jobs
  - cronjobs
  verbs: ['*']
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: $namespace-user-view
  namespace: $namespace
subjects:
- kind: ServiceAccount
  name: $namespace-user
  namespace: $namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: $namespace-user-full-access
---
# https://kubernetes.io/zh/docs/concepts/policy/resource-quotas/
apiVersion: v1
kind: ResourceQuota
metadata:
  name: $namespace-compute-resources
  namespace: $namespace
spec:
  hard:
    pods: "10"
    services: "10"
    persistentvolumeclaims: "5"
    requests.cpu: "1"
    requests.memory: 2Gi
    limits.cpu: "2"
    limits.memory: 4Gi" | kubectl apply -f -
kubectl -n $namespace describe quota $namespace-compute-resources
mkdir -p $folder
tokenName=$(kubectl get sa $namespace-user -n $namespace -o "jsonpath={.secrets[0].name}")
token=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data['ca\.crt']}")

echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
    certificate-authority-data: $certificate
    server: https://$endpoint
  name: $namespace-cluster
users:
- name: $namespace-user
  user:
    as-user-extra: {}
    client-key-data: $certificate
    token: $token
contexts:
- context:
    cluster: $namespace-cluster
    namespace: $namespace
    user: $namespace-user
  name: $namespace
current-context: $namespace" > $folder/$namespace.kube.conf
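
这个脚本的用法很简单,传入namespace和apiserver地址两个参数即可(下面的脚本名、namespace和地址都只是示例,按自己的实际环境替换),生成的kube-config会放在脚本同级的kube_config目录下:

# bash 1.create_ns_full_access.sh web-test 10.0.1.201:6443

# 用生成的配置验证一下权限:本namespace内可以正常读写
# kubectl --kubeconfig ./kube_config/web-test.kube.conf get pod
# 访问其他namespace则会被RBAC拒绝,提示 Forbidden
# kubectl --kubeconfig ./kube_config/web-test.kube.conf -n kube-system get pod
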
  1. 创建对指定namespace有所有权限的kube-config(在已有的namespace中创建)
#!/bin/bash


BASEDIR="$(dirname "$0")"
folder="$BASEDIR/kube_config"

echo -e "All namespaces is here: \n$(kubectl get ns|awk 'NR!=1{print $1}')"
echo "endpoint server if local network you can use $(kubectl cluster-info |awk '/Kubernetes/{print $NF}')"

namespace=$1
endpoint=$(echo "$2" | sed -e 's,https\?://,,g')

if [[ -z "$endpoint" || -z "$namespace" ]]; then
    echo "Use "$(basename "$0")" NAMESPACE ENDPOINT";
    exit 1;
fi


echo "---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: $namespace-user
  namespace: $namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: $namespace-user-full-access
  namespace: $namespace
rules:
- apiGroups: ['', 'extensions', 'apps', 'metrics.k8s.io']
  resources: ['*']
  verbs: ['*']
- apiGroups: ['batch']
  resources:
  - jobs
  - cronjobs
  verbs: ['*']
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: $namespace-user-view
  namespace: $namespace
subjects:
- kind: ServiceAccount
  name: $namespace-user
  namespace: $namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: $namespace-user-full-access" | kubectl apply -f -

mkdir -p $folder
tokenName=$(kubectl get sa $namespace-user -n $namespace -o "jsonpath={.secrets[0].name}")
token=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data['ca\.crt']}")

echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
    certificate-authority-data: $certificate
    server: https://$endpoint
  name: $namespace-cluster
users:
- name: $namespace-user
  user:
    as-user-extra: {}
    client-key-data: $certificate
    token: $token
contexts:
- context:
    cluster: $namespace-cluster
    namespace: $namespace
    user: $namespace-user
  name: $namespace
current-context: $namespace" > $folder/$namespace.kube.conf
  1. 同上,创建只读权限的
#!/bin/bash


BASEDIR="$(dirname "$0")"
folder="$BASEDIR/kube_config"

echo -e "All namespaces is here: \n$(kubectl get ns|awk 'NR!=1{print $1}')"
echo "endpoint server if local network you can use $(kubectl cluster-info |awk '/Kubernetes/{print $NF}')"

namespace=$1
endpoint=$(echo "$2" | sed -e 's,https\?://,,g')

if [[ -z "$endpoint" || -z "$namespace" ]]; then
    echo "Use "$(basename "$0")" NAMESPACE ENDPOINT";
    exit 1;
fi


echo "---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: $namespace-user-readonly
  namespace: $namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: $namespace-user-readonly-access
  namespace: $namespace
rules:
- apiGroups: ['', 'extensions', 'apps', 'metrics.k8s.io']
  resources: ['pods', 'pods/log']
  verbs: ['get', 'list', 'watch']
- apiGroups: ['batch']
  resources: ['jobs', 'cronjobs']
  verbs: ['get', 'list', 'watch']
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: $namespace-user-view-readonly
  namespace: $namespace
subjects:
- kind: ServiceAccount
  name: $namespace-user-readonly
  namespace: $namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: $namespace-user-readonly-access" | kubectl apply -f -

mkdir -p $folder
tokenName=$(kubectl get sa $namespace-user-readonly -n $namespace -o "jsonpath={.secrets[0].name}")
token=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data['ca\.crt']}")

echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
    certificate-authority-data: $certificate
    server: https://$endpoint
  name: $namespace-cluster-readonly
users:
- name: $namespace-user-readonly
  user:
    as-user-extra: {}
    client-key-data: $certificate
    token: $token
contexts:
- context:
    cluster: $namespace-cluster-readonly
    namespace: $namespace
    user: $namespace-user-readonly
  name: $namespace
current-context: $namespace" > $folder/$namespace-readonly.kube.conf
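
只读配置生成后,同样可以快速验证一下权限是否符合预期(脚本名和参数均为示例):

# bash 3.create_ns_readonly.sh web-test 10.0.1.201:6443

# 查看类操作正常
# kubectl --kubeconfig ./kube_config/web-test-readonly.kube.conf get pod
# 删除等写操作会被RBAC拒绝,提示 Forbidden
# kubectl --kubeconfig ./kube_config/web-test-readonly.kube.conf delete pod xxx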

最后,来一个多个集群配置融合的创建,这个在多集群管理方面非常有用,这里只以创建只读权限配置作为演示

#!/bin/bash
# describe: create k8s cluster all namespaces resources with readonly clusterrole, no exec 、delete ...

# look system default to study:
# kubectl describe clusterrole view

# restore all change:
#kubectl -n kube-system delete sa all-readonly-${clustername}
#kubectl delete clusterrolebinding all-readonly-${clustername}
#kubectl delete clusterrole all-readonly-${clustername}


clustername=$1

Help(){
    echo "Use "$(basename "$0")" ClusterName(example: k8s1|k8s2|k8s3|delk8s1|delk8s2|delk8s3|3in1)";
    exit 1;
}

if [[ -z "${clustername}" ]]; then
    Help
fi

case ${clustername} in
    k8s1)
    endpoint="https://x.x.x.x:123456"
    ;;
    k8s2)
    endpoint="https://x.x.x.x:123456"
    ;;
    k8s3)
    endpoint="https://x.x.x.x:123456"
    ;;
    delk8s1)
    kubectl -n kube-system delete sa all-readonly-k8s1
    kubectl delete clusterrolebinding all-readonly-k8s1
    kubectl delete clusterrole all-readonly-k8s1
    echo "${clustername} successful."
    exit 0
    ;;
    delk8s2)
    kubectl -n kube-system delete sa all-readonly-k8s2
    kubectl delete clusterrolebinding all-readonly-k8s2
    kubectl delete clusterrole all-readonly-k8s2
    echo "${clustername} successful."
    exit 0
    ;;
    delk8s3)
    kubectl -n kube-system delete sa all-readonly-k8s3
    kubectl delete clusterrolebinding all-readonly-k8s3
    kubectl delete clusterrole all-readonly-k8s3
    echo "${clustername} successful."
    exit 0
    ;;
    3in1)
    KUBECONFIG=./all-readonly-k8s1.conf:all-readonly-k8s2.conf:all-readonly-k8s3.conf kubectl config view --flatten > ./all-readonly-3in1.conf
    kubectl --kubeconfig=./all-readonly-3in1.conf config use-context "k8s3"
    kubectl --kubeconfig=./all-readonly-3in1.conf config set-context "k8s3" --namespace="default"
    kubectl --kubeconfig=./all-readonly-3in1.conf config get-contexts
    echo -e "\n\n\n"
    cat ./all-readonly-3in1.conf |base64 -w 0
    exit 0
    ;;
    *)
    Help
esac

echo "---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: all-readonly-${clustername}
rules:
- apiGroups:
  - ''
  resources:
  - configmaps
  - endpoints
  - persistentvolumes
  - persistentvolumeclaims
  - pods
  - replicationcontrollers
  - replicationcontrollers/scale
  - serviceaccounts
  - services
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ''
  resources:
  - bindings
  - events
  - limitranges
  - namespaces/status
  - pods/log
  - pods/status
  - replicationcontrollers/status
  - resourcequotas
  - resourcequotas/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ''
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - controllerrevisions
  - daemonsets
  - deployments
  - deployments/scale
  - replicasets
  - replicasets/scale
  - statefulsets
  - statefulsets/scale
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - deployments/scale
  - ingresses
  - networkpolicies
  - replicasets
  - replicasets/scale
  - replicationcontrollers/scale
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  verbs:
  - get
  - list
  - watch" | kubectl apply -f -

kubectl -n kube-system create sa all-readonly-${clustername}
kubectl create clusterrolebinding all-readonly-${clustername} --clusterrole=all-readonly-${clustername} --serviceaccount=kube-system:all-readonly-${clustername}

tokenName=$(kubectl -n kube-system get sa all-readonly-${clustername} -o "jsonpath={.secrets[0].name}")
token=$(kubectl -n kube-system get secret $tokenName -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl -n kube-system get secret $tokenName -o "jsonpath={.data['ca\.crt']}")

echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
    certificate-authority-data: $certificate
    server: $endpoint
  name: all-readonly-${clustername}-cluster
users:
- name: all-readonly-${clustername}
  user:
    as-user-extra: {}
    client-key-data: $certificate
    token: $token
contexts:
- context:
    cluster: all-readonly-${clustername}-cluster
    user: all-readonly-${clustername}
  name: ${clustername}
current-context: ${clustername}" > ./all-readonly-${clustername}.conf
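
这个脚本的使用思路是:先分别在能访问对应集群的环境下执行一次,生成各集群的只读配置,再用3in1参数把它们合并成一份,之后就可以用这一份配置通过--context在多个集群间切换(脚本名为示例,集群名即脚本里case分支的名称):

# bash all-readonly.sh k8s1    # 生成 ./all-readonly-k8s1.conf,k8s2、k8s3同理
# bash all-readonly.sh 3in1    # 合并成 ./all-readonly-3in1.conf

# 用合并后的一份配置管理多个集群
# kubectl --kubeconfig=./all-readonly-3in1.conf config get-contexts
# kubectl --kubeconfig=./all-readonly-3in1.conf --context=k8s1 get pod -A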

这里大家如果没看明白也没关系,本节课程的视频讲解部分将在明天更新,大家记得先关注博哥下,不要错过后面精彩的内容。

第12关 k8s架构师课程之业务日志收集上节介绍、下节实战

原创2021-03-24 21:30·博哥爱运维

elasticsearch log-pilot kibana 、 prometheus-oprator grafana webhook

大家好,我是博哥爱运维。

OK,到目前为止,我们的服务顺利容器化并上了K8s,同时也能通过外部网络进行请求访问,相关的服务数据也能进行持久化存储了,那么接下来很关键的事情,就是怎么去收集服务产生的日志进行数据分析及问题排查,怎么去监控服务的运行是否正常,下面会以生产中的经验来详细讲解这两大块。

日志收集

现在市面上大多数课程都是以EFK来作来K8s项目的日志解决方案,它包括三个组件:Elasticsearch, Fluentd(filebeat), Kibana;Elasticsearch 是日志存储和日志搜索引擎,Fluentd 负责把k8s集群的日志发送给 Elasticsearch, Kibana 则是可视化界面查看和检索存储在 Elasticsearch 的数据。

但根据生产中实际使用情况来看,它有以下弊端:

1、日志收集系统 EFK是在每个kubernetes的NODE节点以daemonset的形式启动一个fluentd的pod,来收集NODE节点上的日志,如容器日志(/var/log/containers/*.log),但无法作细分,想要的和不想要的都收集进来了,后面带来的就是磁盘IO压力会比较大,日志过滤也麻烦。

2、无法收集对应POD里面的业务日志。上面第1点只能收集pod的stdout日志,但是pod内如有需要收集的业务日志,像pod内的/tmp/datalog/*.log,那EFK是无能为力的,只能在pod内启动多个容器(filebeat)去收集容器内日志,但这又会带来pod内多容器的性能损耗,这个接下来会详细讲到。

3、fluentd的采集性能较低,达不到filebeat的1/10。

基于此,我通过调研发现了阿里开源的智能容器采集工具 Log-Pilot,github地址:
https://github.com/AliyunContainerService/log-pilot

下面以sidecar 模式和log-pilot这两种方式的日志收集形式做个详细对比说明:

第一种模式是 sidecar 模式,这种需要我们在每个 Pod 中都附带一个 logging 容器来进行本 Pod 内部容器的日志采集,一般采用共享卷的方式,但是对于这一种模式来说,很明显的一个问题就是占用的资源比较多,尤其是在集群规模比较大的情况下,或者说单个节点上容器特别多的情况下,它会占用过多的系统资源,同时也对日志存储后端占用过多的连接数。当我们的集群规模越大,这种部署模式引发的潜在问题就越大。

第12关 k8s架构师课程之业务日志收集上节介绍、下节实战

另一种模式是 Node 模式,这种模式是我们在每个 Node 节点上仅需部署一个 logging 容器来进行本 Node 所有容器的日志采集。这样跟前面的模式相比最明显的优势就是占用资源比较少,同样在集群规模比较大的情况下表现出的优势越明显,同时这也是社区推荐的一种模式。

第12关 k8s架构师课程之业务日志收集上节介绍、下节实战

经过多方面测试,log-pilot对现有业务pod侵入性很小,只需要在原有pod的内传入几行env环境变量,即可对此pod相关的日志进行收集,已经测试了后端接收的工具有logstash、elasticsearch、kafka、redis、file,均OK,下面开始部署整个日志收集环境。

我们这里用一个tomcat服务来模拟业务服务,用log-pilot分别收集它的stdout以及容器内的业务数据日志文件到指定后端存储(这里分别以elasticsearch、kafka的这两种企业常用的接收工具来做示例)

准备好相应的yaml配置

vim tomcat-test.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tomcat
  name: tomcat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tomcat
  template:
    metadata:
      labels:
        app: tomcat
    spec:
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
      containers:
      - name: tomcat
        image: "tomcat:7.0"
        env:      # 注意点一,添加相应的环境变量(下面收集了两块日志1、stdout 2、/usr/local/tomcat/logs/catalina.*.log)
        - name: aliyun_logs_tomcat-syslog   # 如日志发送到es,那index名称为 tomcat-syslog
          value: "stdout"
        - name: aliyun_logs_tomcat-access   # 如日志发送到es,那index名称为 tomcat-access
          value: "/usr/local/tomcat/logs/catalina.*.log"
        volumeMounts:   # 注意点二,对pod内要收集的业务日志目录需要进行共享,可以收集多个目录下的日志文件
          - name: tomcat-log
            mountPath: /usr/local/tomcat/logs
      volumes:
        - name: tomcat-log
          emptyDir: {}

vim elasticsearch.6.8.13-statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: elasticsearch-logging
    version: v6.8.13
  name: elasticsearch-logging
  # namespace: logging
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: v6.8.13
  serviceName: elasticsearch-logging
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v6.8.13
    spec:
      nodeSelector:
        esnode: "true"  ## 注意给想要运行到的node打上相应labels
      containers:
      - env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: cluster.name
          value: elasticsearch-logging-0
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        image: elastic/elasticsearch:6.8.13
        name: elasticsearch-logging
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-logging
      dnsConfig:
        options:
        - name: single-request-reopen
      initContainers:
      - command:
        - /sbin/sysctl
        - -w
        - vm.max_map_count=262144
        image: alpine:3.12
        imagePullPolicy: IfNotPresent
        name: elasticsearch-logging-init
        resources: {}
        securityContext:
          privileged: true
      - name: fix-permissions
        image: alpine:3.12
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: elasticsearch-logging
          mountPath: /usr/share/elasticsearch/data
      volumes:
      - name: elasticsearch-logging
        hostPath:
          path: /esdata
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: elasticsearch-logging
  name: elasticsearch
  # namespace: logging
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  selector:
    k8s-app: elasticsearch-logging
  type: ClusterIP

vim kibana.6.8.13.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  # namespace: logging
  labels:
    app: kibana
spec:
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: elastic/kibana:6.8.13
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        env:
          - name: ELASTICSEARCH_URL
            value: http://elasticsearch:9200
        ports:
        - containerPort: 5601
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  # namespace: logging
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
    protocol: TCP
    targetPort: 5601
  type: ClusterIP
  selector:
    app: kibana
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kibana
  # namespace: logging
spec:
  rules:
  - host: kibana.boge.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kibana
          servicePort: 5601

vim log-pilot.yml # 后端输出的elasticsearch

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-pilot
  labels:
    app: log-pilot
  # 设置期望部署的namespace
#  namespace: ns-elastic
spec:
  selector:
    matchLabels:
      app: log-pilot
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: log-pilot
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      # 是否允许部署到Master节点上
      #tolerations:
      #- key: node-role.kubernetes.io/master
      #  effect: NoSchedule
      containers:
      - name: log-pilot
        # 版本请参考https://github.com/AliyunContainerService/log-pilot/releases
        image: registry.cn-hangzhou.aliyuncs.com/acs/log-pilot:0.9.7-filebeat
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 200m
            memory: 200Mi
        env:
          - name: "NODE_NAME"
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          ##--------------------------------
#          - name: "LOGGING_OUTPUT"
#            value: "logstash"
#          - name: "LOGSTASH_HOST"
#            value: "logstash-g1"
#          - name: "LOGSTASH_PORT"
#            value: "5044"
          ##--------------------------------
          - name: "LOGGING_OUTPUT"
            value: "elasticsearch"
          ## 请确保集群到ES网络可达
          - name: "ELASTICSEARCH_HOSTS"
            value: "elasticsearch:9200"
          ## 配置ES访问权限
          #- name: "ELASTICSEARCH_USER"
          #  value: "{es_username}"
          #- name: "ELASTICSEARCH_PASSWORD"
          #  value: "{es_password}"
          ##--------------------------------
          ## https://github.com/AliyunContainerService/log-pilot/blob/master/docs/filebeat/docs.md
          ## to file need configure 1
#          - name: LOGGING_OUTPUT
#            value: file
#          - name: FILE_PATH
#            value: /tmp
#          - name: FILE_NAME
#            value: filebeat.log
        volumeMounts:
        - name: sock
          mountPath: /var/run/docker.sock
        - name: root
          mountPath: /host
          readOnly: true
        - name: varlib
          mountPath: /var/lib/filebeat
        - name: varlog
          mountPath: /var/log/filebeat
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
         ## to file need configure 2
#        - mountPath: /tmp
#          name: mylog
        livenessProbe:
          failureThreshold: 3
          exec:
            command:
            - /pilot/healthz
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sock
        hostPath:
          path: /var/run/docker.sock
      - name: root
        hostPath:
          path: /
      - name: varlib
        hostPath:
          path: /var/lib/filebeat
          type: DirectoryOrCreate
      - name: varlog
        hostPath:
          path: /var/log/filebeat
          type: DirectoryOrCreate
      - name: localtime
        hostPath:
          path: /etc/localtime
       ## to file need configure 3
#      - hostPath:
#          path: /tmp/mylog
#          type: ""
#        name: mylog

vim log-pilot2-kafka.yaml #后端输出到kafka

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-pilot2-configuration
  #namespace: ns-elastic
data:
  logging_output: "kafka"
  kafka_brokers: "10.0.1.204:9092"
  kafka_version: "0.10.0"
  # configure all valid topics in kafka
  # when disable auto-create topic
  kafka_topics: "tomcat-syslog,tomcat-access"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-pilot2
  #namespace: ns-elastic
  labels:
    k8s-app: log-pilot2
spec:
  selector:
    matchLabels:
      k8s-app: log-pilot2
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: log-pilot2
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: log-pilot2
#
#        wget https://github.com/AliyunContainerService/log-pilot/archive/v0.9.7.zip
#        unzip log-pilot-0.9.7.zip
#        vim ./log-pilot-0.9.7/assets/filebeat/config.filebeat
#        ...
#        output.kafka:
#            hosts: [$KAFKA_BROKERS]
#            topic: '%{[topic]}'
#            codec.format:
#                string: '%{[message]}'
#        ...
        image: registry.cn-hangzhou.aliyuncs.com/acs/log-pilot:0.9.7-filebeat
        env:
          - name: "LOGGING_OUTPUT"
            valueFrom:
              configMapKeyRef:
                name: log-pilot2-configuration
                key: logging_output
          - name: "KAFKA_BROKERS"
            valueFrom:
              configMapKeyRef:
                name: log-pilot2-configuration
                key: kafka_brokers
          - name: "KAFKA_VERSION"
            valueFrom:
              configMapKeyRef:
                name: log-pilot2-configuration
                key: kafka_version
          - name: "NODE_NAME"
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
        volumeMounts:
        - name: sock
          mountPath: /var/run/docker.sock
        - name: logs
          mountPath: /var/log/filebeat
        - name: state
          mountPath: /var/lib/filebeat
        - name: root
          mountPath: /host
          readOnly: true
        - name: localtime
          mountPath: /etc/localtime
        # configure all valid topics in kafka
        # when disable auto-create topic
        - name: config-volume
          mountPath: /etc/filebeat/config
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sock
        hostPath:
          path: /var/run/docker.sock
          type: Socket
      - name: logs
        hostPath:
          path: /var/log/filebeat
          type: DirectoryOrCreate
      - name: state
        hostPath:
          path: /var/lib/filebeat
          type: DirectoryOrCreate
      - name: root
        hostPath:
          path: /
          type: Directory
      - name: localtime
        hostPath:
          path: /etc/localtime
          type: File
      # kubelet sync period
      - name: config-volume
        configMap:
          name: log-pilot2-configuration
          items:
          - key: kafka_topics
            path: kafka_topics

准备一个测试用的kafka服务

# 部署前准备
# 0. 先把代码pull到本地
# https://github.com/wurstmeister/kafka-docker
# 修改docker-compose.yml为:
#——------------------------------
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    #build: .
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 10.0.1.204  # docker运行的机器IP
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /nfs_storageclass/kafka:/kafka
#——------------------------------

# 1. docker-compose setup:
# docker-compose up -d
Recreating kafka-docker-compose_kafka_1   ... done
Starting kafka-docker-compose_zookeeper_1 ... done

# 2. result look:
# docker-compose ps
              Name                            Command               State                    Ports                  
--------------------------------------------------------------------------------------------------------------------
kafka-docker-compose_kafka_1       start-kafka.sh                   Up      0.0.0.0:9092->9092/tcp                  
kafka-docker-compose_zookeeper_1   /bin/sh -c /usr/sbin/sshd  ...   Up      0.0.0.0:2181->2181/tcp, 22/tcp,         
                                                                            2888/tcp, 3888/tcp                      
# 3. run test-docker
bash-4.4# docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -e HOST_IP=10.0.1.204 -e ZK=10.0.1.204:2181 -i -t wurstmeister/kafka /bin/bash

# 4. list topic
bash-4.4# kafka-topics.sh --zookeeper 10.0.1.204:2181 --list
tomcat-access
tomcat-syslog

# 5. consumer topic data:
bash-4.4# kafka-console-consumer.sh --bootstrap-server 10.0.1.204:9092 --topic tomcat-access --from-beginning

先创建一个测试用的命名空间:

# kubectl create ns testlog

收集日志到elasticserach

部署es 和 kibana:

# kubectl -n testlog apply -f elasticsearch.6.8.13-statefulset.yaml
# kubectl -n testlog apply -f kibana.6.8.13.yaml

部署log-pilot

# kubectl -n testlog apply -f log-pilot.yml

部署tomcat

# kubectl -n testlog apply -f tomcat-test.yaml

然后通过kibana域名kibana.boge.com来创建索引,查看日志是否已经被收集到es了。
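
如果暂时不想通过kibana界面确认,也可以直接请求一下es的接口,看索引有没有生成(下面的pod名是根据es的statefulset名称推断的elasticsearch-logging-0,仅作示意):

# kubectl -n testlog exec -it elasticsearch-logging-0 -- curl -s http://localhost:9200/_cat/indices?v
# 正常的话能看到名称里带 tomcat-access、tomcat-syslog 的索引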

收集日志到Kafka

# 先清除上面的配置
# kubectl -n testlog delete -f elasticsearch.6.8.13-statefulset.yaml
# kubectl -n testlog delete -f kibana.6.8.13.yaml
# kubectl -n testlog delete -f log-pilot.yml
# kubectl -n testlog delete -f tomcat-test.yaml

# 然后部署新配置的log-pilot以及测试kafka
# kubectl -n testlog apply -f log-pilot2-kafka.yaml
# 注意修改里面的configmap配置

apiVersion: v1
kind: ConfigMap
metadata:
  name: log-pilot2-configuration
data:
  logging_output: "kafka"            # 指定输出到kafka服务
  kafka_brokers: "10.0.1.204:9092"   # kafka地址
  kafka_version: "0.10.0"            # 指定版本,生产中实测kafka_2.12-2.5.0及以下,这里都配置为"0.10.0" 就可以了
  # configure all valid topics in kafka
  # when disable auto-create topic
  kafka_topics: "tomcat-syslog,tomcat-access"  # 注意这里要把需要收集的服务topic给加到白名单这里,否则不给收集,而且同个服务里面都要加,否则都不会被收集
  
# kafka的部署配置上面已经贴出

# 重新部署下tomcat
# kubectl -n testlog apply -f tomcat-test.yaml

# 然后在kafka里面只要能列出相应topic并能对其进行消费就证明日志收集没问题了
tomcat-access
tomcat-syslog

在实际生产环境中,我们的业务日志可能会非常多,这时候建议收集时先直接缓存到KAFKA,然后根据实际需求来消费KAFKA里面的日志数据,转存到其他地方。这里接上面继续,我用一个logstash来消费KAFKA里面的日志数据,并写入到最新版本的elasticsearch里面(正好也解决了log-pilot不支持elasticsearch7以上版本的问题)

完整的日志收集架构链路如下:

log-pilot ---> kafka ---> logstash ---> elasticsearch7 ---> kibana7

开始部署新版本的es集群(开启了X-Pack安全配置):

注:这里为了节省资源,可以把上面创建的6版本的es和kibana给清理掉

# 部署es集群
kubectl apply -f namespace.yaml
kubectl apply -f es-master-configmap.yaml -f es-master-service.yaml -f es-master-deployment.yaml
kubectl apply -f es-data-configmap.yaml -f es-data-service.yaml -f es-data-statefulset.yaml
kubectl apply -f es-client-configmap.yaml -f es-client-service.yaml -f es-client-deployment.yaml


# 查看es是否正常运行
kubectl logs -f -n logging $(kubectl get pods -n logging | grep elasticsearch-master | sed -n 1p | awk '{print $1}') \
| grep "Cluster health status changed from \[YELLOW\] to \[GREEN\]"


# 查看生成的es相关帐号密码(这里只用到了帐号elastic)
# kubectl exec -it $(kubectl get pods -n logging | grep elasticsearch-client | sed -n 1p | awk '{print $1}') -n logging -- bin/elasticsearch-setup-passwords auto -b
Changed password for user apm_system
PASSWORD apm_system = py0fZjuCP2Ky3ysc3TW6

Changed password for user kibana_system
PASSWORD kibana_system = ltgJNL8dw1nF34WLw9cQ

Changed password for user kibana
PASSWORD kibana = ltgJNL8dw1nF34WLw9cQ

Changed password for user logstash_system
PASSWORD logstash_system = biALFG4UYoc8h4TSxAJC

Changed password for user beats_system
PASSWORD beats_system = E5TdQ8LI33mTQRrJlD3r

Changed password for user remote_monitoring_user
PASSWORD remote_monitoring_user = GtlT45XgtVvT5KpAp791

Changed password for user elastic
PASSWORD elastic = cR5SFjHajVOoGZPzfiEQ


# 创建elastic帐号的密码secret
kubectl create secret generic elasticsearch-pw-elastic -n logging --from-literal password=cR5SFjHajVOoGZPzfiEQ


# 创建kibana
kubectl apply  -f kibana-configmap.yaml -f kibana-service.yaml -f kibana-deployment.yaml

相关yaml配置如下:

namespace.yaml

kind: Namespace
apiVersion: v1
metadata:
  name: logging

es-client-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: logging
  name: elasticsearch-client-config
  labels:
    app: elasticsearch
    role: client
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}
    network.host: 0.0.0.0
    node:
      master: false
      data: false
      ingest: true
    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true

es-client-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: logging
  name: elasticsearch-client
  labels:
    app: elasticsearch
    role: client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: client
  template:
    metadata:
      labels:
        app: elasticsearch
        role: client
    spec:
      containers:
      - name: elasticsearch-client
        image: elastic/elasticsearch:7.10.1
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-client
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms256m -Xmx256m"
        ports:
        - containerPort: 9200
          name: client
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: storage
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: elasticsearch-client-config
      - name: "storage"
        emptyDir:
          medium: ""
      initContainers:
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true

es-client-service.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: logging
  name: elasticsearch-client
  labels:
    app: elasticsearch
    role: client
spec:
  ports:
  - port: 9200
    name: client
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: client

es-data-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: logging
  name: elasticsearch-data-config
  labels:
    app: elasticsearch
    role: data
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}
    network.host: 0.0.0.0
    node:
      master: false
      data: true
      ingest: false
    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true

es-data-service.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: logging
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  ports:
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: data

es-data-statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: logging
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  serviceName: "elasticsearch-data"
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch-data
  template:
    metadata:
      labels:
        app: elasticsearch-data
        role: data
    spec:
      containers:
      - name: elasticsearch-data
        image: elastic/elasticsearch:7.10.1
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-data
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms300m -Xmx300m"
        ports:
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: elasticsearch-data-persistent-storage
          mountPath: /data/db
      volumes:
      - name: config
        configMap:
          name: elasticsearch-data-config
      initContainers:
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data-persistent-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "nfs-boge"
      resources:
        requests:
          storage: 2Gi

es-master-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: logging
  name: elasticsearch-master-config
  labels:
    app: elasticsearch
    role: master
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}
    network.host: 0.0.0.0
    node:
      master: true
      data: false
      ingest: false
    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true

es-master-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: logging
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master
  template:
    metadata:
      labels:
        app: elasticsearch
        role: master
    spec:
      containers:
      - name: elasticsearch-master
        image: elastic/elasticsearch:7.10.1
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-master
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms256m -Xmx256m"
        ports:
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: storage
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: elasticsearch-master-config
      - name: "storage"
        emptyDir:
          medium: ""
      initContainers:
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true

es-master-service.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: logging
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  ports:
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: master

kibana-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: logging
  name: kibana-config
  labels:
    app: kibana
data:
  kibana.yml: |-
    server.host: 0.0.0.0
    elasticsearch:
      hosts: ${ELASTICSEARCH_URL}
      username: ${ELASTICSEARCH_USER}
      password: ${ELASTICSEARCH_PASSWORD}

kibana-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: logging
  name: kibana
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: elastic/kibana:7.10.1
        env:
        - name: ELASTICSEARCH_URL
          value: "http://elasticsearch-client:9200"
        - name: ELASTICSEARCH_USER
          value: "elastic"
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elasticsearch-pw-elastic
              key: password
        resources:
          limits:
            cpu: 2
            memory: 1.5Gi
          requests:
            cpu: 0.5
            memory: 1Gi
        ports:
        - containerPort: 5601
          name: kibana
          protocol: TCP
        volumeMounts:
        - name: config
          mountPath: /usr/share/kibana/config/kibana.yml
          readOnly: true
          subPath: kibana.yml
      volumes:
      - name: config
        configMap:
          name: kibana-config

kibana-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
  labels:
    component: kibana
spec:
  selector:
    app: kibana
  ports:
  - name: http
    port: 5601
    protocol: TCP
  type: NodePort

logstash.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: logging
  name: logstash-configmap
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
  logstash.conf: |
    # all input will come from filebeat, no local logs
    input {
      kafka {
        bootstrap_servers => ["10.0.0.221:9092"]
        # bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
        topics_pattern  => "tomcat-.*"
        consumer_threads => 5
        decorate_events => true
        codec => json
        auto_offset_reset => "latest"
        group_id => "boge"
      }
    }
    filter {
    }
    output {
      elasticsearch {
        index => "%{[@metadata][kafka][topic]}-%{+YYYY-MM-dd}" 
        hosts => [ "${ELASTICSEARCH_URL}" ]
        user => "${ELASTICSEARCH_USER}"
        password => "${ELASTICSEARCH_PASSWORD}"
        #cacert => '/etc/logstash/certificates/ca.crt'
      }
      stdout {
        codec => rubydebug
      }
    }

---
#apiVersion: v1
#kind: Service
#metadata:
#  labels:
#    app: logstash
#  name: logstash
#spec:
#  ports:
#  - name: "5044"
#    port: 5044
#    targetPort: 5044
#  selector:
#    app: logstash

---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: logging
  name: logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: elastic/logstash:7.10.1
        ports:
        - containerPort: 5044
          name: logstash
        env:
        - name: ELASTICSEARCH_URL
          value: "http://elasticsearch-client:9200"
        - name: ELASTICSEARCH_USER
          value: "elastic"
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elasticsearch-pw-elastic
              key: password
        volumeMounts:
        - name: config-volume
          mountPath: /usr/share/logstash/config
        - name: logstash-pipeline-volume
          mountPath: /usr/share/logstash/pipeline/
        - name: localtime
          mountPath: /etc/localtime
    #    - name: cert-ca
    #      mountPath: "/etc/logstash/certificates"
    #      readOnly: true
        command:
        - logstash
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.yml
              path: logstash.yml
      - name: logstash-pipeline-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.conf
              path: logstash.conf
      - name: localtime
        hostPath:
          path: /etc/localtime
          type: File

#      - name: cert-ca
#        secret:
#          secretName: elasticsearch-es-http-certs-public

第13关 k8s架构师课程之私有镜像仓库-Harbor

原创2021-03-26 17:36·博哥爱运维

大家好,我是博哥爱运维。在前面的十几关里面,博哥在k8s上部署服务一直都是用的dockerhub上的公有镜像,对于企业服务来说,有些镜像我们是不想放在公网上面的;同时如果有内部的镜像仓库,拉取镜像的速度也会快很多,这时候就需要我们来部署公司内部的私有镜像仓库了,这里博哥会使用最常用的harbor来部署内部的私有镜像仓库。

harbor官方文档:
https://goharbor.io/docs/2.2.0/

harbor内部架构图


harbor在我们这次课程的最终一关的位置如图所示:


在生产中安装一般有两种方式:一种是用docker-compose启动官方打包好的离线安装包;二是用helm chart的形式在k8s上来运行harbor。两种方式都可以用,但根据博哥的工作经验,建议不要将harbor部署在k8s上,这里博哥就直接以第一种离线的方式来安装harbor。
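
如果确实想体验第二种helm chart的方式,大致操作如下(仅作示意,这里的release名称my-harbor是随便取的,具体参数请以官方chart文档为准,按自己环境调整):

helm repo add harbor https://helm.goharbor.io
helm repo update
helm install my-harbor harbor/harbor -n harbor --create-namespace \
  --set expose.ingress.hosts.core=harbor.boge.com \
  --set externalURL=https://harbor.boge.com \
  --set harborAdminPassword=boge666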

# 离线形式安装harbor私有镜像仓库

## 创建目录及下载harbor离线包
mkdir /data && cd /data
wget https://github.com/goharbor/harbor/releases/download/v2.2.0/harbor-offline-installer-v2.2.0.tgz
tar xf harbor-offline-installer-v2.2.0.tgz && rm harbor-offline-installer-v2.2.0.tgz

## 修改harbor配置
cd harbor
cp harbor.yml.tmpl harbor.yml
    5 hostname: harbor.boge.com
    17   certificate: /data/harbor/ssl/tls.cert
    18   private_key: /data/harbor/ssl/tls.key
    34 harbor_admin_password: boge666

## 创建harbor访问域名证书
mkdir /data/harbor/ssl && cd /data/harbor/ssl
openssl genrsa -out tls.key 2048
openssl req -new -x509 -key tls.key -out tls.cert -days 360 -subj /CN=*.boge.com
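
证书生成后,可以用下面的命令确认一下CN和有效期是否符合预期(示意):

openssl x509 -in tls.cert -noout -subject -dates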

## 准备好单机编排工具`docker-compose`
> 从二进制安装k8s项目的bin目录拷贝过来
scp /etc/kubeasz/bin/docker-compose 10.0.1.204:/usr/bin/

> 也可以在docker官方进行下载
https://docs.docker.com/compose/install/

## 开始安装
./install.sh

## 推送镜像到harbor
echo '10.0.1.204 harbor.boge.com' >> /etc/hosts
docker tag nginx:latest  harbor.boge.com/library/nginx:latest
docker push harbor.boge.com/library/nginx:latest

## 在其他节点上面拉取harbor镜像
> 在集群每个 node 节点进行如下配置
> ssh to 10.0.1.201(centos7)

mkdir -p /etc/docker/certs.d/harbor.boge.com
scp 10.0.1.204:/data/harbor/ssl/tls.cert /etc/docker/certs.d/harbor.boge.com/ca.crt
docker pull harbor.boge.com/library/nginx:latest


## 重启harbor
docker-compose down -v
docker-compose up -d
docker ps|grep harbor

## 附(引用自 https://github.com/easzlab/kubeasz):
containerd配置信任harbor证书
在集群每个 node 节点进行如下配置(假设ca.pem为自建harbor的CA证书)

ubuntu 1604:
cp ca.pem /usr/share/ca-certificates/harbor-ca.crt
echo harbor-ca.crt >> /etc/ca-certificates.conf
update-ca-certificates

CentOS 7:
cp ca.pem /etc/pki/ca-trust/source/anchors/harbor-ca.crt
update-ca-trust
上述配置完成后,重启 containerd 即可:systemctl restart containerd
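
如果节点用的是containerd运行时,配置并重启containerd后,可以用crictl简单验证下能否正常从harbor拉取镜像(示意,前提是该节点已配置好harbor.boge.com的hosts解析):

crictl pull harbor.boge.com/library/nginx:latest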

第14关k8s架构师课程之业务Prometheus监控实战一

原创2021-03-28 20:30·博哥爱运维

服务监控

大家好,我是博哥爱运维。对于运维开发人员来说,不管是哪个平台服务,监控都是非常关键重要的。

在传统服务里面,我们通常会用到zabbix、open-falcon、netdata来做服务的监控,但对于目前主流的K8s平台来说,由于服务pod会被调度到任何机器上运行,且pod挂掉后会被自动重启,并且我们也需要有更好的自动服务发现功能来实现服务报警的自动接入,实现更高效的运维报警,这里我们需要用到K8s的监控实现Prometheus,它是基于Google内部监控系统的开源实现。

Prometheus架构图


Prometheus是由golang语言编写的,部署本身比较简单,一个服务的二进制包加上对应的配置文件即可运行;然而在K8s环境里手工维护这些部署和配置的过程繁琐且效率低下,所以我们这里就不以这种传统的形式来部署Prometheus实现K8s集群的监控了,而是会用Prometheus-Operator来进行Prometheus监控服务的安装,这也是我们生产中常用的安装方式。

从本质上来讲,Prometheus属于典型的有状态应用,并且包含了一些自身特有的运维管理和配置管理方式,而这些都无法通过Kubernetes原生提供的应用管理概念实现自动化。为了简化这类应用程序的管理复杂度,CoreOS率先引入了Operator的概念,并首先推出了针对在Kubernetes下运行和管理Etcd的Etcd Operator,随后又推出了Prometheus Operator。

Prometheus Operator的工作原理

从概念上来讲,Operator就是针对特定应用程序的管理器,它在Kubernetes基本的Resource和Controller概念之上,以扩展Kubernetes API的形式,帮助用户创建、配置和管理复杂的有状态应用程序,从而实现特定应用程序常见操作的运维自动化。

在Kubernetes中我们使用Deployment、DaemonSet、StatefulSet来管理应用Workload,使用Service、Ingress来管理应用的访问方式,使用ConfigMap和Secret来管理应用配置。我们在集群中对这些资源的创建、更新、删除的动作都会被转换为事件(Event),Kubernetes的Controller Manager负责监听这些事件并触发相应的任务来满足用户的期望。这种方式我们称为声明式,用户只需要关心应用程序的最终状态,其它的都通过Kubernetes来帮助我们完成,通过这种方式可以大大简化应用的配置管理复杂度。

而除了这些原生的Resource资源以外,Kubernetes还允许用户添加自己的自定义资源(Custom Resource)。并且通过实现自定义Controller来实现对Kubernetes的扩展。

如下所示,是Prometheus Operator的架构示意图:


Prometheus Operator的本质就是一组用户自定义的CRD资源以及Controller的实现,Prometheus Operator负责监听这些自定义资源的变化,并且根据这些资源的定义自动化地完成Prometheus Server自身以及相关配置的管理工作。

Prometheus Operator能做什么

要了解Prometheus Operator能做什么,其实就是要了解它为我们提供了哪些自定义的Kubernetes资源,下面列出了Prometheus Operator目前提供的4类资源:

  • Prometheus:声明式创建和管理Prometheus Server实例;
  • ServiceMonitor:负责声明式的管理监控配置;
  • PrometheusRule:负责声明式的管理告警配置;
  • Alertmanager:声明式的创建和管理Alertmanager实例。

简言之,Prometheus Operator能够帮助用户自动化的创建以及管理Prometheus Server以及其相应的配置。
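
为了更直观地理解这种声明式的管理方式,下面给出一个最简的Prometheus自定义资源示意(仅供参考,其中team: frontend这类标签取值是假设的,实际要和自己ServiceMonitor上的标签对应):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi

Prometheus Operator监听到这类资源后,就会自动创建并管理对应的Prometheus Server实例,后面我们安装的kube-prometheus项目里,名为k8s的Prometheus实例就是这样定义出来的。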

实战操作篇一

在K8s集群中部署Prometheus Operator

我们这里用prometheus-operator来安装整套prometheus服务,建议直接用master分支即可,这也是官方所推荐的

https://github.com/prometheus-operator/kube-prometheus

开始安装

安装包和离线镜像包下载

https://cloud.189.cn/t/bM7f2aANnMVb (访问码:0nsj)

1. 解压下载的代码包
unzip kube-prometheus-master.zip
rm -f kube-prometheus-master.zip && cd kube-prometheus-master

2. 这里建议先看下有哪些镜像,便于在下载镜像快的节点上先收集好所有需要的离线docker镜像
# find ./ -type f |xargs grep 'image: '|sort|uniq|awk '{print $3}'|grep ^[a-zA-Z]|grep -Evw 'error|kubeRbacProxy'|sort -rn|uniq
quay.io/prometheus/prometheus:v2.22.1
quay.io/prometheus-operator/prometheus-operator:v0.43.2
quay.io/prometheus/node-exporter:v1.0.1
quay.io/prometheus/alertmanager:v0.21.0
quay.io/fabxc/prometheus_demo_service
quay.io/coreos/kube-state-metrics:v1.9.7
quay.io/brancz/kube-rbac-proxy:v0.8.0
grafana/grafana:7.3.4
gcr.io/google_containers/metrics-server-amd64:v0.2.01
directxman12/k8s-prometheus-adapter:v0.8.2

在测试的几个node上把这些离线镜像包都导入 docker load -i xxx.tar
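
如果离线镜像包比较多,可以直接在存放目录下用一个循环批量导入(示意,目录路径按自己实际存放位置替换):

cd /root/offline-images
for tarball in *.tar; do docker load -i "$tarball"; done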

3. 开始创建所有服务
kubectl create -f manifests/setup
kubectl create -f manifests/
过一会查看创建结果:
kubectl -n monitoring get all

# 附:清空上面部署的prometheus所有服务:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup

访问下prometheus的UI

# 修改下prometheus UI的service模式,便于我们访问
# kubectl -n monitoring patch svc prometheus-k8s -p '{"spec":{"type":"NodePort"}}'
service/prometheus-k8s patched

# kubectl -n monitoring get svc prometheus-k8s 
NAME             TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
prometheus-k8s   NodePort   10.68.23.79   <none>        9090:22129/TCP   7m43s
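
拿到NodePort后,就可以用任意NODE的IP加这个端口来访问prometheus了(这里的22129端口取自上面的示例输出,实际以自己集群分配到的端口为准):

# 浏览器直接访问UI
http://10.0.1.201:22129
# 或者先用curl确认下健康检查接口是否正常
curl -s http://10.0.1.201:22129/-/healthy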

点击上方菜单栏Status — Targets,我们发现kube-controller-manager和kube-scheduler这两个目标没有被发现:

monitoring/kube-controller-manager/0 (0/0 up) 
monitoring/kube-scheduler/0 (0/0 up) 

接下来我们解决下这一个碰到的问题吧

注:如果发现这两个组件监听的不是127.0.0.1,并且通过下面的地址可以获取到metrics指标输出,那么改监听IP这一步就可以不用操作了

curl 10.0.1.201:10251/metrics
curl 10.0.1.201:10252/metrics

# 这里我们发现这两服务监听的IP是127.0.0.1
# ss -tlnp|egrep 'controller|schedule'
LISTEN     0      32768  127.0.0.1:10251                    *:*                   users:(("kube-scheduler",pid=567,fd=5))
LISTEN     0      32768  127.0.0.1:10252                    *:*                   users:(("kube-controller",pid=583,fd=5))

问题定位到了,接下来先把两个组件的监听地址改为0.0.0.0

# 如果大家前面是按我设计的4台NODE节点,其中2台作master的话,那就在这2台master上把systemd配置改一下
# 我这里第一台master  10.0.1.201
# sed -ri 's+127.0.0.1+0.0.0.0+g' /etc/systemd/system/kube-controller-manager.service 
# sed -ri 's+127.0.0.1+0.0.0.0+g' /etc/systemd/system/kube-scheduler.service
# systemctl daemon-reload
# systemctl restart kube-controller-manager.service
# systemctl restart kube-scheduler.service 

# 我这里第二台master  10.0.1.202
# sed -ri 's+127.0.0.1+0.0.0.0+g' /etc/systemd/system/kube-controller-manager.service 
# sed -ri 's+127.0.0.1+0.0.0.0+g' /etc/systemd/system/kube-scheduler.service
# systemctl daemon-reload
# systemctl restart kube-controller-manager.service
# systemctl restart kube-scheduler.service 

# 获取下metrics指标看看
curl 10.0.1.201:10251/metrics
curl 10.0.1.201:10252/metrics

然后因为K8s的这两个核心组件我们是以二进制形式部署的,为了能让K8s上的prometheus发现它们,我们还需要创建相应的service和endpoints来将其关联起来

注意:我们需要将endpoints里面的NODE IP换成我们实际情况的

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.1.201
  - ip: 10.0.1.202
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP

---

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP

---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.1.201
  - ip: 10.0.1.202
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP

将上面的yaml配置保存为repair-prometheus.yaml,然后创建它

kubectl apply -f repair-prometheus.yaml

创建完成后确认下

# kubectl -n kube-system get svc |egrep 'controller|scheduler'
kube-controller-manager   ClusterIP   None            <none>        10252/TCP                      58s
kube-scheduler            ClusterIP   None            <none>        10251/TCP                      58s

记得还要修改一个地方

# kubectl -n monitoring edit servicemonitors.monitoring.coreos.com kube-scheduler 
# 将下面两个地方的https换成http
    port: https-metrics
    scheme: https

# kubectl -n monitoring edit servicemonitors.monitoring.coreos.com kube-controller-manager
# 将下面两个地方的https换成http
    port: https-metrics
    scheme: https

然后再返回prometheus UI处,耐心等待几分钟,就能看到已经被发现了

monitoring/kube-controller-manager/0 (2/2 up) 
monitoring/kube-scheduler/0 (2/2 up) 

第14关k8s架构师课程之业务Prometheus监控实战二

原创2021-03-29 23:59·博哥爱运维

使用prometheus来监控ingress-nginx

大家好,我是博哥爱运维。我们前面部署过ingress-nginx,它是整个K8s上所有服务的流量入口组件,非常关键,因此把它的metrics指标收集到prometheus来做好相关监控至关重要。因为前面ingress-nginx服务是以daemonset形式部署的,并且映射了自己的端口到宿主机上,那么我们可以直接用pod所在NODE的IP来看下metrics

curl 10.0.1.201:10254/metrics
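
这里要注意,ServiceMonitor是通过匹配Service的标签和端口名来发现抓取目标的,所以ingress-nginx命名空间里需要有一个带app: ingress-nginx标签、端口名为metrics的Service与之对应;如果自己环境里还没有,可以参考下面这份示意配置(10254端口取自上面的curl测试,selector里的标签是按前面课程的部署方式假设的,要换成自己ingress-nginx Pod的实际标签):

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-metrics
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
spec:
  selector:
    app: ingress-nginx
  ports:
  - name: metrics
    port: 10254
    targetPort: 10254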

创建 servicemonitor配置让prometheus能发现ingress-nginx的metrics

# vim servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: ingress-nginx
  name: nginx-ingress-scraping
  namespace: ingress-nginx
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: metrics
  jobLabel: app
  namespaceSelector:
    matchNames:
    - ingress-nginx
  selector:
    matchLabels:
      app: ingress-nginx

创建它

# kubectl apply -f servicemonitor.yaml 
servicemonitor.monitoring.coreos.com/nginx-ingress-scraping created
# kubectl -n ingress-nginx get servicemonitors.monitoring.coreos.com 
NAME                     AGE
nginx-ingress-scraping   8s

指标一直没收集上来,看看prometheus服务的日志,发现报错如下:

# kubectl -n monitoring logs prometheus-k8s-0 -c prometheus |grep error

level=error ts=2020-12-13T09:52:35.565Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:426: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"ingress-nginx\""

需要修改prometheus的clusterrole

#   kubectl edit clusterrole prometheus-k8s
#------ 原始的rules -------
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
#---------------------------

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
  

再到prometheus UI上看下,发现已经有了

ingress-nginx/nginx-ingress-scraping/0 (1/1 up) 

使用Prometheus来监控二进制部署的ETCD集群

作为K8s所有资源存储的关键服务ETCD,我们也有必要把它给监控起来,正好借这个机会,完整的演示一次利用Prometheus来监控非K8s集群服务的步骤

在前面部署K8s集群的时候,我们是用二进制的方式部署的ETCD集群,并且利用自签证书来配置访问ETCD,正如前面所说,现在关键的服务基本都会留有指标metrics接口支持prometheus的监控,利用下面命令,我们可以看到ETCD都暴露出了哪些监控指标出来

# curl --cacert /etc/kubernetes/ssl/ca.pem --cert /etc/etcd/ssl/etcd.pem  --key /etc/etcd/ssl/etcd-key.pem https://10.0.1.201:2379/metrics

上面查看没问题后,接下来我们开始进行配置使ETCD能被prometheus发现并监控

# 首先把ETCD的证书创建为secret
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/etcd/ssl/etcd.pem   --from-file=/etc/etcd/ssl/etcd-key.pem   --from-file=/etc/kubernetes/ssl/ca.pem

# 接着在prometheus里面引用这个secrets
kubectl -n monitoring edit prometheus k8s 

spec:
...
  secrets:
  - etcd-certs

# 保存退出后,prometheus会自动重启服务pod以加载这个secret配置,过一会,我们进pod来查看下是不是已经加载到ETCD的证书了
# kubectl -n monitoring exec -it prometheus-k8s-0 -c prometheus  -- sh 
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem        etcd-key.pem  etcd.pem

接下来准备创建service、endpoints以及ServiceMonitor的yaml配置

注意替换下面的NODE节点IP为实际ETCD所在NODE内网IP

# vim prometheus-etcd.yaml 
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 10.0.1.201
  - ip: 10.0.1.202
  - ip: 10.0.1.203
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: api
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
      #use insecureSkipVerify only if you cannot use a Subject Alternative Name
      insecureSkipVerify: true 
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - monitoring

开始创建上面的资源

# kubectl apply -f prometheus-etcd.yaml 
service/etcd-k8s created
endpoints/etcd-k8s created
servicemonitor.monitoring.coreos.com/etcd-k8s created

过一会,就可以在prometheus UI上面看到ETCD集群被监控了

monitoring/etcd-k8s/0 (3/3 up) 

接下来我们用grafana来展示被监控的ETCD指标

1. 在grafana官网模板中心搜索etcd,下载这个json格式的模板文件
https://grafana.com/dashboards/3070

2. 然后打开自己之前部署的grafana首页,
点击左边菜单栏四个小正方形方块HOME --- Manage
再点击右边 Import dashboard --- 
点击Upload .json File 按钮,上传上面下载好的json文件 etcd_rev3.json,
然后在prometheus选择数据来源
点击Import,即可显示etcd集群的图形监控信息

第14关k8s架构师课程之业务Prometheus监控实战三

原创2021-03-31 21:11·博哥爱运维

prometheus监控数据以及grafana配置持久化存储配置

大家好,我是博哥爱运维。这节实战课给大家讲解下如何配置prometheus以及grafana的数据持久化。

prometheus数据持久化配置

# 注意这下面的statefulset服务就是我们需要做数据持久化的地方
# kubectl -n monitoring get statefulset,pod|grep prometheus-k8s
statefulset.apps/prometheus-k8s      2/2     5h41m
pod/prometheus-k8s-0                       2/2     Running   1          19m
pod/prometheus-k8s-1                       2/2     Running   1          19m

# 看下我们之前准备的StorageClass动态存储
# kubectl get sc
NAME       PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-boge   nfs-provisioner-01   Retain          Immediate           false                  4d

# 准备prometheus持久化的pvc配置
# kubectl -n monitoring edit prometheus k8s

spec:
......
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "nfs-boge"
        selector:
          matchLabels:
            app: my-example-prometheus
        resources:
          requests:
            storage: 1Gi

# 上面修改保存退出后,过一会我们查看下pvc创建情况,以及pod内的数据挂载情况
# kubectl -n monitoring get pvc
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-055e6b11-31b7-4503-ba2b-4f292ba7bd06   1Gi        RWO            nfs-boge       17s
prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-249c344b-3ef8-4a5d-8003-b8ce8e282d32   1Gi        RWO            nfs-boge       17s


# kubectl -n monitoring exec -it prometheus-k8s-0 -c prometheus -- sh
/prometheus $ df -Th
......
10.0.1.201:/nfs_dir/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-055e6b11-31b7-4503-ba2b-4f292ba7bd06/prometheus-db
                     nfs4           97.7G      9.4G     88.2G  10% /prometheus

grafana配置持久化存储配置

# 保存pvc为grafana-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana
  namespace: monitoring
spec:
  storageClassName: nfs-boge
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

# 开始创建pvc
# kubectl apply -f grafana-pvc.yaml 

# 看下创建的pvc
# kubectl -n monitoring get pvc
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
grafana                              Bound    pvc-394a26e1-3274-4458-906e-e601a3cde50d   1Gi        RWX            nfs-boge       3s
prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-055e6b11-31b7-4503-ba2b-4f292ba7bd06   1Gi        RWO            nfs-boge       6m46s
prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-249c344b-3ef8-4a5d-8003-b8ce8e282d32   1Gi        RWO            nfs-boge       6m46s


# 编辑grafana的deployment资源配置
# kubectl -n monitoring edit deployments.apps grafana 

# 旧的配置
      volumes:
      - emptyDir: {}
        name: grafana-storage
# 替换成新的配置
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana

# 同时加入下面的env环境变量,将登陆密码进行固定修改
    spec:
      containers:
      ......
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin321

# 过一会,等grafana重启完成后,用上面的新密码进行登陆
# kubectl -n monitoring get pod -w|grep grafana
grafana-5698bf94f4-prbr2               0/1     Running   0          3s
grafana-5698bf94f4-prbr2               1/1     Running   0          4s

# 因为先前的数据并未持久化,所以会发现先导入的ETCD模板已消失,这时重新再导入一次,后面重启也不会丢了

注意,博哥所有的K8S文字笔记文章都对应同名称的视频教程,大家可以对照文字笔记进行观看视频。

第14关k8s架构师课程之业务Prometheus监控实战四

原创2021-04-01 20:35·博哥爱运维

prometheus发送报警

大家好,我是博哥爱运维。早期我们经常用邮箱接收报警邮件,但是报警不及时,而且目前各云平台对邮件发送限制还比较严格,所以目前在生产中用得更为多的是基于webhook来转发报警内容到企业中用的聊天工具中,比如钉钉、企业微信、飞书等。

prometheus的报警组件是Alertmanager,它支持以自定义webhook的方式来接收它发出的报警,报警的json字段比较多,我们需要根据要接入的app来做相应的日志清洗转发

这里博哥将用golang结合Gin网络框架来编写一个日志清洗转发工具,分别对这几种常用的报警方式作详细地说明及实战

下载boge-webhook.zip

https://cloud.189.cn/t/B3EFZvnuMvuu (访问码:h1wx)

首先看下报警规则及报警发送配置是什么样的

prometheus-operator的规则非常齐全,基本属于开箱即用类型,大家可以根据日常收到的报警,对里面的rules报警规则作针对性的调整,比如把报警观察时长缩短一点等

监控报警规则修改   vim ./manifests/prometheus/prometheus-rules.yaml
修改完成记得更新   kubectl apply -f ./manifests/prometheus/prometheus-rules.yaml
# 通过这里可以获取需要创建的报警配置secret名称
# kubectl -n monitoring edit statefulsets.apps alertmanager-main
...
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
...

# 注意事先在配置文件 alertmanager.yaml 里面编辑好收件人等信息 ,再执行下面的命令

kubectl create secret generic  alertmanager-main --from-file=alertmanager.yaml -n monitoring
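
如果后面调整了alertmanager.yaml需要更新这个secret,可以用下面的方式一条命令完成替换,并顺便确认secret里实际保存的内容是否已经是最新的(示意):

# 用新的alertmanager.yaml替换已有的secret
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o yaml | kubectl -n monitoring apply -f -
# 查看secret里实际保存的配置内容
kubectl -n monitoring get secret alertmanager-main -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d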

报警配置文件 alertmanager.yaml

# global块配置下的配置选项在本配置文件内的所有配置项下可见
global:
  # 在Alertmanager内管理的每一条告警均有两种状态: "resolved"或者"firing". 在altermanager首次发送告警通知后, 该告警会一直处于firing状态,设置resolve_timeout可以指定处于firing状态的告警间隔多长时间会被设置为resolved状态, 在设置为resolved状态的告警后,altermanager不会再发送firing的告警通知.
#  resolve_timeout: 1h
  resolve_timeout: 10m

  # 告警通知模板
templates:
- '/etc/altermanager/config/*.tmpl'

# route: 根路由,该模块用于该根路由下的节点及子路由routes的定义. 子树节点如果不对相关配置进行配置,则默认会从父路由树继承该配置选项。每一条告警都要进入route,即要求配置选项group_by的值能够匹配到每一条告警的至少一个labelkey(即通过POST请求向altermanager服务接口所发送告警的labels项所携带的<labelname>),告警进入到route后,将会根据子路由routes节点中的配置项match_re或者match来确定能进入该子路由节点的告警(由在match_re或者match下配置的labelkey: labelvalue是否为告警labels的子集决定,是的话则会进入该子路由节点,否则不能接收进入该子路由节点).
route:
  # 例如所有labelkey:labelvalue含cluster=A及alertname=LatencyHigh labelkey的告警都会被归入单一组中
  group_by: ['job', 'alertname', 'cluster', 'service','severity']
  # 若一组新的告警产生,则会等group_wait后再发送通知,该功能主要用于当告警在很短时间内接连产生时,在group_wait内合并为单一的告警后再发送
#  group_wait: 30s
  group_wait: 10s
  # 再次告警时间间隔
#  group_interval: 5m
  group_interval: 20s
  # 如果一条告警通知已成功发送,且在间隔repeat_interval后,该告警仍然未被设置为resolved,则会再次发送该告警通知
#  repeat_interval: 12h
  repeat_interval: 1m
  # 默认告警通知接收者,凡未被匹配进入各子路由节点的告警均被发送到此接收者
  receiver: 'webhook'
  # 上述route的配置会被传递给子路由节点,子路由节点进行重新配置才会被覆盖

  # 子路由树
  routes:
  # 该配置选项使用正则表达式来匹配告警的labels,以确定能否进入该子路由树
  # match_re和match均用于匹配labelkey为service,labelvalue分别为指定值的告警,被匹配到的告警会将通知发送到对应的receiver
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: 'webhook'
    # 在带有service标签的告警同时有severity标签时,他可以有自己的子路由,同时具有severity != critical的告警则被发送给接收者team-ops-wechat,对severity == critical的告警则被发送到对应的接收者即team-ops-pager
    routes:
    - match:
        severity: critical
      receiver: 'webhook'
  # 比如关于数据库服务的告警,如果子路由没有匹配到相应的owner标签,则都默认由team-DB-pager接收
  - match:
      service: database
    receiver: 'webhook'
  # 我们也可以先根据标签service:database将数据库服务告警过滤出来,然后进一步将所有同时带labelkey为database
  - match:
      severity: critical
    receiver: 'webhook'
# 抑制规则,当出现critical告警时 忽略warning
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition if the alertname is the same.
  #   equal: ['alertname', 'cluster', 'service']
  #
# 收件人配置
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanaer-dingtalk-svc.kube-system//b01bdc063/boge/getjson'
    send_resolved: true

附: 监控其他服务的prometheus规则配置

https://github.com/samber/awesome-prometheus-alerts

第15关 k8s架构师课程基于gitlab的CICD自动化二

原创2021-04-06 20:17·博哥爱运维

大家好,我是博哥爱运维。这节课我们先来部署gitlab私有代码仓库所需要的数据库postgresql和redis。

需要注意的是,如果大家的nfs-server的地址和挂载目录不是按博哥前面课程讲得来定义的话,那么下面的yaml配置中需要记得替换。

部署postgresql

# ------------------------------------------------
#  mkdir -p /nfs_dir/{gitlab_etc_ver130806,gitlab_log_ver130806,gitlab_opt_ver130806,gitlab_postgresql_data_ver130806}
#  kubectl create namespace gitlab-ver130806
#  kubectl -n gitlab-ver130806 apply -f 3postgres.yaml
#  kubectl -n gitlab-ver130806 apply -f 4redis.yaml
#  kubectl -n gitlab-ver130806 apply -f 5gitlab.yaml
#  kubectl -n gitlab-ver130806 apply -f 6gitlab-tls.yaml
# ------------------------------------------------



# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-postgresql-data-ver130806
  labels:
    type: gitlab-postgresql-data-ver130806
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /nfs_dir/gitlab_postgresql_data_ver130806
    server: 10.0.1.201

# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gitlab-postgresql-data-ver130806-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs
  selector:
    matchLabels:
      type: gitlab-postgresql-data-ver130806
---
apiVersion: v1
kind: Service
metadata:
  name: postgresql
  labels:
    app: gitlab
    tier: postgreSQL
spec:
  ports:
    - port: 5432
  selector:
    app: gitlab
    tier: postgreSQL

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgresql
  labels:
    app: gitlab
    tier: postgreSQL
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab
      tier: postgreSQL
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: gitlab
        tier: postgreSQL
    spec:
      #nodeSelector:
      #  gee/disk: "500g"
      containers:
        - image: postgres:12.6-alpine
        #- image: harbor.boge.com/library/postgres:12.6-alpine
          name: postgresql
          env:
            - name: POSTGRES_USER
              value: gitlab
            - name: POSTGRES_DB
              value: gitlabhq_production
            - name: POSTGRES_PASSWORD
              value: bogeusepg
            - name: TZ
              value: Asia/Shanghai
          ports:
            - containerPort: 5432
              name: postgresql
          livenessProbe:
            exec:
              command:
              - sh
              - -c
              - exec pg_isready -U gitlab -h 127.0.0.1 -p 5432 -d gitlabhq_production
            initialDelaySeconds: 110
            timeoutSeconds: 5
            failureThreshold: 6
          readinessProbe:
            exec:
              command:
              - sh
              - -c
              - exec pg_isready -U gitlab -h 127.0.0.1 -p 5432 -d gitlabhq_production
            initialDelaySeconds: 20
            timeoutSeconds: 3
            periodSeconds: 5
#          resources:
#            requests:
#              cpu: 100m
#              memory: 512Mi
#            limits:
#              cpu: "1"
#              memory: 1Gi
          volumeMounts:
            - name: postgresql
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgresql
          persistentVolumeClaim:
            claimName: gitlab-postgresql-data-ver130806-pvc

部署redis

---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: gitlab
    tier: backend
spec:
  ports:
    - port: 6379
      targetPort: 6379
  selector:
    app: gitlab
    tier: backend
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: gitlab
    tier: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab
      tier: backend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: gitlab
        tier: backend
    spec:
      #nodeSelector:
      #  gee/disk: "500g"
      containers:
        - image: redis:6.2.0-alpine3.13
        #- image: harbor.boge.com/library/redis:6.2.0-alpine3.13
          name: redis
          command:
            - "redis-server"
          args:
            - "--requirepass"
            - "bogeuseredis"
#          resources:
#            requests:
#              cpu: "1"
#              memory: 2Gi
#            limits:
#              cpu: "1"
#              memory: 2Gi
          ports:
            - containerPort: 6379
              name: redis
          livenessProbe:
            exec:
              command:
              - sh
              - -c
              - "redis-cli ping"
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            exec:
              command:
              - sh
              - -c
              - "redis-cli ping"
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3
      initContainers:
      - command:
        - /bin/sh
        - -c
        - |
          ulimit -n 65536
          mount -o remount rw /sys
          echo never > /sys/kernel/mm/transparent_hugepage/enabled
          mount -o remount rw /proc/sys
          echo 2000 > /proc/sys/net/core/somaxconn
          echo 1 > /proc/sys/vm/overcommit_memory
        image: registry.cn-beijing.aliyuncs.com/acs/busybox:v1.29.2
        imagePullPolicy: IfNotPresent
        name: init-redis
        resources: {}
        securityContext:
          privileged: true
          procMount: Default

第15关 k8s架构师课程基于gitlab的CICD自动化三

原创2021-04-07 21:36·博哥爱运维

大家好,我是博哥爱运维。这节课我们来开始部署gitlab服务。

部署gitlab

先定制一下镜像

Dockerfile

FROM gitlab/gitlab-ce:13.8.6-ce.0

RUN rm /etc/apt/sources.list \
    && echo 'deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main' > /etc/apt/sources.list.d/pgdg.list \
    && wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
COPY sources.list /etc/apt/sources.list

RUN apt-get update -yq && \
    apt-get install -y vim iproute2 net-tools iputils-ping curl wget software-properties-common unzip postgresql-client-12 && \
    rm -rf /var/cache/apt/archives/*

RUN ln -svf /usr/bin/pg_dump /opt/gitlab/embedded/bin/pg_dump

#---------------------------------------------------------------
# docker build -t gitlab/gitlab-ce:13.8.6-ce.1 .

sources.list

deb http://mirrors.aliyun.com/ubuntu/ xenial main
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main
deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb-src http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb http://mirrors.aliyun.com/ubuntu xenial-security main
deb-src http://mirrors.aliyun.com/ubuntu xenial-security main
deb http://mirrors.aliyun.com/ubuntu xenial-security universe
deb-src http://mirrors.aliyun.com/ubuntu xenial-security universe

开始部署

# restore gitlab data command example:
#   kubectl -n gitlab-ver130806 exec -it $(kubectl -n gitlab-ver130806 get pod|grep -v runner|grep gitlab|awk '{print $1}') -- gitlab-rake gitlab:backup:restore BACKUP=1602889879_2020_10_17_12.9.2
#   kubectl -n gitlab-ver130806 exec -it $(kubectl -n gitlab-ver130806 get pod|grep -v runner|grep gitlab|awk '{print $1}') -- gitlab-ctl reconfigure
#   kubectl -n gitlab-ver130806 exec -it $(kubectl -n gitlab-ver130806 get pod|grep -v runner|grep gitlab|awk '{print $1}') -- gitlab-ctl status

# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-etc-ver130806
  labels:
    type: gitlab-etc-ver130806
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /nfs_dir/gitlab_etc_ver130806
    server: 10.0.1.201

# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gitlab-etc-ver130806-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs
  selector:
    matchLabels:
      type: gitlab-etc-ver130806
# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-log-ver130806
  labels:
    type: gitlab-log-ver130806
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /nfs_dir/gitlab_log_ver130806
    server: 10.0.1.201

# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gitlab-log-ver130806-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs
  selector:
    matchLabels:
      type: gitlab-log-ver130806
      
# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-opt-ver130806
  labels:
    type: gitlab-opt-ver130806
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /nfs_dir/gitlab_opt_ver130806
    server: 10.0.1.201

# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gitlab-opt-ver130806-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs
  selector:
    matchLabels:
      type: gitlab-opt-ver130806
---
apiVersion: v1
kind: Service
metadata:
  name: gitlab
  labels:
    app: gitlab
    tier: frontend
spec:
  ports:
    - name: gitlab-ui
      port: 80
      protocol: TCP
      targetPort: 80
    - name: gitlab-ssh
      port: 22
      protocol: TCP
      targetPort: 22
  selector:
    app: gitlab
    tier: frontend
  type: NodePort
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-cb-ver130806
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: gitlab
    namespace: gitlab-ver130806
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab
  labels:
    app: gitlab
    tier: frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitlab
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: gitlab
        tier: frontend
    spec:
      serviceAccountName: gitlab
      containers:
        - image: harbor.boge.com/library/gitlab-ce:13.8.6-ce.1
          name: gitlab
#          resources:
#            requests:
#              cpu: 400m
#              memory: 4Gi
#            limits:
#              cpu: "800m"
#              memory: 8Gi
          securityContext:
            privileged: true
          env:
            - name: TZ
              value: Asia/Shanghai
            - name: GITLAB_OMNIBUS_CONFIG
              value: |
                postgresql['enable'] = false
                gitlab_rails['db_username'] = "gitlab"
                gitlab_rails['db_password'] = "bogeusepg"
                gitlab_rails['db_host'] = "postgresql"
                gitlab_rails['db_port'] = "5432"
                gitlab_rails['db_database'] = "gitlabhq_production"
                gitlab_rails['db_adapter'] = 'postgresql'
                gitlab_rails['db_encoding'] = 'utf8'
                redis['enable'] = false
                gitlab_rails['redis_host'] = 'redis'
                gitlab_rails['redis_port'] = '6379'
                gitlab_rails['redis_password'] = 'bogeuseredis'
                gitlab_rails['gitlab_shell_ssh_port'] = 22
                external_url 'http://git.boge.com/'
                nginx['listen_port'] = 80
                nginx['listen_https'] = false
                #-------------------------------------------
                gitlab_rails['gitlab_email_enabled'] = true
                gitlab_rails['gitlab_email_from'] = '[email protected]'
                gitlab_rails['gitlab_email_display_name'] = 'boge'
                gitlab_rails['gitlab_email_reply_to'] = '[email protected]'
                gitlab_rails['gitlab_default_can_create_group'] = true
                gitlab_rails['gitlab_username_changing_enabled'] = true
                gitlab_rails['smtp_enable'] = true
                gitlab_rails['smtp_address'] = "smtp.exmail.qq.com"
                gitlab_rails['smtp_port'] = 465
                gitlab_rails['smtp_user_name'] = "[email protected]"
                gitlab_rails['smtp_password'] = "bogesendmail"
                gitlab_rails['smtp_domain'] = "exmail.qq.com"
                gitlab_rails['smtp_authentication'] = "login"
                gitlab_rails['smtp_enable_starttls_auto'] = true
                gitlab_rails['smtp_tls'] = true
                #-------------------------------------------
                # 关闭 prometheus
                prometheus['enable'] = false
                # 关闭 grafana
                grafana['enable'] = false
                # 减少内存占用
                unicorn['worker_memory_limit_min'] = "200 * 1 << 20"
                unicorn['worker_memory_limit_max'] = "300 * 1 << 20"
                # 减少 sidekiq 的并发数
                sidekiq['concurrency'] = 16
                # 减少 postgresql 数据库缓存
                postgresql['shared_buffers'] = "256MB"
                # 减少 postgresql 数据库并发数量
                postgresql['max_connections'] = 8
                # 减少进程数   worker=CPU核数+1
                unicorn['worker_processes'] = 2
                nginx['worker_processes'] = 2
                puma['worker_processes'] = 2
                # puma['per_worker_max_memory_mb'] = 850
                # 保留3天备份的数据文件
                gitlab_rails['backup_keep_time'] = 259200
                #-------------------------------------------
          ports:
            - containerPort: 80
              name: gitlab
          livenessProbe:
            exec:
              command:
              - sh
              - -c
              - "curl -s http://127.0.0.1/-/health|grep -w 'GitLab OK'"
            initialDelaySeconds: 120
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            exec:
              command:
              - sh
              - -c
              - "curl -s http://127.0.0.1/-/health|grep -w 'GitLab OK'"
            initialDelaySeconds: 120
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          volumeMounts:
            - mountPath: /etc/gitlab
              name: gitlab1
            - mountPath: /var/log/gitlab
              name: gitlab2
            - mountPath: /var/opt/gitlab
              name: gitlab3
            - mountPath: /etc/localtime
              name: tz-config

      volumes:
        - name: gitlab1
          persistentVolumeClaim:
            claimName: gitlab-etc-ver130806-pvc
        - name: gitlab2
          persistentVolumeClaim:
            claimName: gitlab-log-ver130806-pvc
        - name: gitlab3
          persistentVolumeClaim:
            claimName: gitlab-opt-ver130806-pvc
        - name: tz-config
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai

      securityContext:
        runAsUser: 0
        fsGroup: 0

部署gitlab-tls

# old version

#apiVersion: extensions/v1beta1
#kind: Ingress
#metadata:
#  name: gitlab
#  annotations:
#    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
#    nginx.ingress.kubernetes.io/proxy-body-size: "20m"
#spec:
#  tls:
#  - hosts:
#    - git.boge.com
#    secretName: mytls
#  rules:
#  - host: git.boge.com
#    http:
#      paths:
#      - path: /
#        backend:
#          serviceName: gitlab
#          servicePort: 80

# Add tls
# openssl genrsa -out tls.key 2048
# openssl req -new -x509 -key tls.key -out tls.cert -days 360 -subj /CN=*.boge.com
# kubectl -n gitlab-ver130806 create secret tls mytls --cert=tls.cert --key=tls.key 

# new version

## https://kubernetes.io/docs/concepts/services-networking/ingress/
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gitlab
  annotations:
    nginx.ingress.kubernetes.io/force-ssl-redirect: "false"
    nginx.ingress.kubernetes.io/proxy-body-size: "20m"
spec:
  tls:
  - hosts:
    - git.boge.com
    secretName: mytls
  rules:
  - host: git.boge.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: gitlab
            port:
              number: 80

---

大家可以对照博哥帐号下面的同名视频课程一起学习,效果更好。

第15关k8s架构师课程之基于gitlab的CICD自动化四

原创2021-04-08 21:20·博哥爱运维

大家好,我是博哥爱运维。这节课我们来讲gitlab里面的runner,gitlab的CI/CD自动化,都是由gitlab下发指令,依靠runner这个组件去执行的,我们这里也是把runner运行在k8s上面。

runner按字面意思就是奔跑者的意思,它在整个自动化流程里面的角色也相当于一个外卖小哥,它接收gitlab下发的自动化指令,来去做相应的操作,从而实现整个CI/CD的效果。

部署gitlab-runner

docker

#  mkdir -p /nfs_dir/{gitlab-runner1-ver130806-docker,gitlab-runner2-ver130806-share}

# gitlab-ci-multi-runner register

#                   Active  √ Paused Runners don't accept new jobs
#                Protected     This runner will only run on pipelines triggered on protected branches
#        Run untagged jobs     Indicates whether this runner can pick jobs without tags
# Lock to current projects     When a runner is locked, it cannot be assigned to other projects

# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-runner1-ver130806-docker
  labels:
    type: gitlab-runner1-ver130806-docker
spec:
  capacity:
    storage: 0.1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /nfs_dir/gitlab-runner1-ver130806-docker
    server: 10.0.1.201

# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gitlab-runner1-ver130806-docker
  namespace: gitlab-ver130806
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 0.1Gi
  storageClassName: nfs
  selector:
    matchLabels:
      type: gitlab-runner1-ver130806-docker


---
# https://docs.gitlab.com/runner/executors

#concurrent = 30
#check_interval = 0

#[session_server]
#  session_timeout = 1800

#[[runners]]
#  name = "gitlab-runner1-ver130806-docker"
#  url = "http://git.boge.com"
#  token = "xxxxxxxxxxxxxxxxxxxxxx"
#  executor = "kubernetes"
#  [runners.kubernetes]
#    namespace = "gitlab-ver130806"
#    image = "docker:stable"
#    helper_image = "gitlab/gitlab-runner-helper:x86_64-9fc34d48-pwsh"
#    privileged = true
#    [[runners.kubernetes.volumes.pvc]]
#      name = "gitlab-runner1-ver130806-docker"
#      mount_path = "/mnt"

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner1-ver130806-docker
  namespace: gitlab-ver130806
spec:
  replicas: 1
  selector:
    matchLabels:
      name: gitlab-runner1-ver130806-docker
  template:
    metadata:
      labels:
        name: gitlab-runner1-ver130806-docker
    spec:
      hostAliases:
      - ip: "10.68.140.109"
        hostnames:
        - "git.boge.com"
      serviceAccountName: gitlab
      containers:
      - args:
        - run
        image: gitlab/gitlab-runner:v13.10.0
        name: gitlab-runner1-ver130806-docker
        volumeMounts:
        - mountPath: /etc/gitlab-runner
          name: config
        - mountPath: /etc/ssl/certs
          name: cacerts
          readOnly: true
      restartPolicy: Always
      volumes:
      - persistentVolumeClaim:
          claimName: gitlab-runner1-ver130806-docker
        name: config
      - hostPath:
          path: /usr/share/ca-certificates/mozilla
        name: cacerts

share

# gitlab-ci-multi-runner register

#                   Active  √ Paused Runners don't accept new jobs
#                Protected     This runner will only run on pipelines triggered on protected branches
#        Run untagged jobs  √ Indicates whether this runner can pick jobs without tags
# Lock to current projects     When a runner is locked, it cannot be assigned to other projects

# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gitlab-runner2-ver130806-share
  labels:
    type: gitlab-runner2-ver130806-share
spec:
  capacity:
    storage: 0.1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  nfs:
    path: /nfs_dir/gitlab-runner2-ver130806-share
    server: 10.0.1.201

# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gitlab-runner2-ver130806-share
  namespace: gitlab-ver130806
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 0.1Gi
  storageClassName: nfs
  selector:
    matchLabels:
      type: gitlab-runner2-ver130806-share


---
# https://docs.gitlab.com/runner/executors

#concurrent = 30
#check_interval = 0

#[session_server]
#  session_timeout = 1800

#[[runners]]
#  name = "gitlab-runner2-ver130806-share"
#  url = "http://git.boge.com"
#  token = "xxxxxxxxxxxxxxxx"
#  executor = "kubernetes"
#  [runners.kubernetes]
#    namespace = "gitlab-ver130806"
#    image = "registry.cn-beijing.aliyuncs.com/acs/busybox/busybox:v1.29.2"
#    helper_image = "gitlab/gitlab-runner-helper:x86_64-9fc34d48-pwsh"
#    privileged = false
#    [[runners.kubernetes.volumes.pvc]]
#      name = "gitlab-runner2-ver130806-share"
#      mount_path = "/mnt"

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner2-ver130806-share
  namespace: gitlab-ver130806
spec:
  replicas: 1
  selector:
    matchLabels:
      name: gitlab-runner2-ver130806-share
  template:
    metadata:
      labels:
        name: gitlab-runner2-ver130806-share
    spec:
      hostAliases:
      - ip: "10.68.140.109"
        hostnames:
        - "git.boge.com"
      serviceAccountName: gitlab
      containers:
      - args:
        - run
        image: gitlab/gitlab-runner:v13.10.0
        name: gitlab-runner2-ver130806-share
        volumeMounts:
        - mountPath: /etc/gitlab-runner
          name: config
        - mountPath: /etc/ssl/certs
          name: cacerts
          readOnly: true
      restartPolicy: Always
      volumes:
      - persistentVolumeClaim:
          claimName: gitlab-runner2-ver130806-share
        name: config
      - hostPath:
          path: /usr/share/ca-certificates/mozilla
        name: cacerts

大家请参照博哥帐号下面的同名视频课程对照学习操作,来保证学习效果。

第15关k8s架构师课程之基于gitlab的CICD自动化五

原创2021-04-09 22:47·博哥爱运维

大家好,我是博哥爱运维。这节课我们继续来配置gitlab相关的服务。

增加gitlab在k8s的内部解析

为什么这么做呢,博哥这里总结了两点原因:

  1. 优化gitlab网络通信,对于runner要调用gitlab服务来说,直接走内部地址速度更快
  2. 如果是在用阿里云的同学,采用在k8s上部署gitlab的话,那么k8s内部服务比如runner是不能通过同集群前面的公网入口SLB来请求访问的,这里阿里云自身网络架构原因,这个时候我们只需要做如下配置即可完美解决
# kubectl -n kube-system get configmaps coredns  -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        log
        rewrite stop {
          name regex git.boge.com gitlab.gitlab-ver130806.svc.cluster.local
          answer name gitlab.gitlab-ver130806.svc.cluster.local git.boge.com
        }

        kubernetes cluster.local in-addr.arpa ip6.arpa {

          pods verified
          fallthrough in-addr.arpa ip6.arpa
        }
        autopath @kubernetes
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
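
改完这个ConfigMap后,coredns会自动热加载配置(上面配置里启用了reload插件),稍等片刻就可以起一个临时Pod验证下集群内部对git.boge.com的解析是否已经指向gitlab的Service了(示意):

kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup git.boge.com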

增加ssh端口转发

我们要保持所有开发人员能使用默认的22端口来通过ssh拉取代码,那么就需要做如下端口转发配置

# 注意配置此转发前,需要将对应NODE的本身ssh连接端口作一下修改,以防后面登陆不了该机器
iptables -t nat -A PREROUTING -d 10.0.1.204 -p tcp --dport 22 -j DNAT --to-destination 10.0.1.204:31755

#↑ 删除上面创建的这一条规则,将-A换成-D即可

iptables -t nat  -nvL PREROUTING
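
上面注释里说的"先修改NODE自身的ssh端口",大致操作如下(示意,这里假设把sshd改成监听2222端口;改完后务必另开一个终端用新端口确认能正常登陆,再继续做转发):

sed -ri 's/^#?Port 22$/Port 2222/' /etc/ssh/sshd_config
systemctl restart sshd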

接着我们找一台机器,这里我们选取10.0.1.201这台机器,加一条本地hosts 10.0.1.204 git.boge.com,来试下推送gitlab代码仓库有无问题,详细操作见本节同名视频课程,希望大家能对着视频自己动手操作一遍,理解上面这些配置的含义,后面可以举一反三,在k8s的其他服务也可以这么来做,达到访问更优的效果。

第15关k8s架构师课程之基于gitlab的CICD自动化六

原创2021-04-11 18:52·博哥爱运维

部署dind(docker in docker)

大家好,我是博哥爱运维。我们现在在k8s来部署dind服务,提供整个CI(持续集成)的功能。

我们看看docker version列出的结果:Docker采取的是C/S架构。Docker进程默认不监听任何端口,它会生成一个socket(/var/run/docker.sock)文件来进行本地进程通信。Docker C/S之间采取Rest API作为通信协议,我们可以让Docker daemon进程监听一个端口,这就为我们用docker client远程调用docker daemon进程执行镜像构建提供了可行性


docker in docker

# dind pip install status: if the build gets killed with exit code 137 (128+9), the resource limits (cpu, memory) may need to be increased

# only have docker client ,use dind can be use normal
#dindSvc=$(kubectl -n kube-system get svc dind |awk 'NR==2{print $3}')
#export DOCKER_HOST="tcp://${dindSvc}:2375/"
#export DOCKER_DRIVER=overlay2
#export DOCKER_TLS_CERTDIR=""


---
# SVC
kind: Service
apiVersion: v1
metadata:
  name: dind
  namespace: kube-system
spec:
  selector:
    app: dind
  ports:
    - name: tcp-port
      port: 2375
      protocol: TCP
      targetPort: 2375

---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dind
  namespace: kube-system
  labels:
    app: dind
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dind
  template:
    metadata:
      labels:
        app: dind
    spec:
      hostNetwork: true
      containers:
      - name: dind
        #image: docker:19-dind
        image: harbor.boge.com/library/docker:19-dind
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "docker login harbor.boge.com -u 'admin' -p 'boge666'"]
           # when deleting this pod, sleep a few seconds first so kube-proxy has time to flush the rules
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]
        ports:
        - containerPort: 2375
#        resources:
#          requests:
#            cpu: 200m
#            memory: 256Mi
#          limits:
#            cpu: 0.5
#            memory: 1Gi
        readinessProbe:
          tcpSocket:
            port: 2375
          initialDelaySeconds: 10
          periodSeconds: 30
        livenessProbe:
          tcpSocket:
            port: 2375
          initialDelaySeconds: 10
          periodSeconds: 30
        securityContext: 
            privileged: true
        env: 
          - name: DOCKER_HOST 
            value: tcp://localhost:2375
          - name: DOCKER_DRIVER 
            value: overlay2
          - name: DOCKER_TLS_CERTDIR 
            value: ''
        volumeMounts: 
          - name: docker-graph-storage
            mountPath: /var/lib/docker
          - name: tz-config
            mountPath: /etc/localtime
           # kubectl -n kube-system create secret generic harbor-ca --from-file=harbor-ca=/data/harbor/ssl/tls.cert
          - name: harbor-ca
            mountPath: /etc/docker/certs.d/harbor.boge.com/ca.crt
            subPath: harbor-ca
       # kubectl create secret docker-registry boge-secret --docker-server=harbor.boge.com --docker-username=admin --docker-password=boge666 [email protected]
      hostAliases:
      - hostnames:
        - harbor.boge.com
        ip: 10.0.1.204
      imagePullSecrets:
      - name: bogeharbor
      volumes:
#      - emptyDir:
#          medium: ""
#          sizeLimit: 10Gi
      - hostPath:
          path: /var/lib/container/docker
        name: docker-graph-storage
      - hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
        name: tz-config
      - name: harbor-ca
        secret:
          secretName: harbor-ca
          defaultMode: 0600
#
#        kubectl taint node 10.0.1.201 Ingress=:NoExecute
#        kubectl describe node 10.0.1.201 |grep -i taint
#        kubectl taint node 10.0.1.201 Ingress:NoExecute-
      nodeSelector:
        kubernetes.io/hostname: "10.0.1.201"
      tolerations:
      - operator: Exists

第15关 k8s架构师课程之CICD自动化devops大结局

原创2021-04-12 20:12·博哥爱运维

CI/CD生产实战项目

大家好,我是博哥爱运维。这节课我们开始最终CI/CD自动化流程实战,终于要到打大BOSS大结局了,博哥自从2021年3月1日开始分享这套K8S架构师课程以来,坚持每天整理文档录制视频,一直坚持到今天,在这期间,博哥认识了不少喜欢K8S的朋友,也收到了很多朋友的鼓励和建议,这对博哥都是宝贵的财富。有些人可能会想,在现如今这个社会,免费的东西还存在嘛?免费的东西就是最贵的东西,诚然,这些博哥也认同,但也不能排除网上也有很多热爱技术,执着分享的人,像国内外很多大牛开源出来很多优化的代码项目,像优秀的操作系统LINUX,像谷歌开源的这套K8S系统等等,博哥虽然做不到这么优秀,但也想把自己工作中的一些踩坑经验积累分享给大家,要说私心嘛,就是博哥想锻炼下自己的讲课经验,拓宽下自己的职业发展路线,但这个和我分享给大家的内容不相冲突,反而我认为它们是有利的,相辅相成的,博哥分享的所有东西都是实实在在工作中拿下来的生产经验,再精心整理来作分享。

大家一定要仔细观看,多多操作,把整个流程都掌握透彻。这里我会采用目前企业较常见的编程语言python的flask模块来实施完整的项目自动化流程步骤,其他语言都可以参照这个项目来实施自动化流程。

先把k8s的二进制命令行工具kubectl容器化备用

FROM harbor.boge.com/library/alpine:3.13

MAINTAINER boge

ENV TZ "Asia/Shanghai"

RUN sed -ri 's+dl-cdn.alpinelinux.org+mirrors.aliyun.com+g' /etc/apk/repositories \
 && apk add --no-cache curl tzdata ca-certificates \
 && cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
 && apk upgrade \
 && rm -rf /var/cache/apk/*

COPY kubectl /usr/local/bin/
RUN chmod +x /usr/local/bin/kubectl

ENTRYPOINT ["kubectl"]
CMD ["help"]

python的flask模块

准备好flask相关的代码文件上传到gitlab代码仓库

app.py

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, boge! 21.04.11.01'

@app.route('/gg/<username>')
def hello(username):
    return 'welcome' + ': ' + username + '!'

Dockerfile

FROM harbor.boge.com/library/python:3.5-slim-stretch
MAINTAINER boge

WORKDIR /kae/app

COPY requirements.txt .

RUN  sed -i 's/deb.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
  && sed -i 's/security.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
  && apt-get update -y \
  && apt-get install -y wget gcc libsm6 libxext6 libglib2.0-0 libxrender1 make \
  && apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple -r requirements.txt \
    && rm requirements.txt

COPY . .

EXPOSE 5000
HEALTHCHECK CMD curl --fail http://localhost:5000 || exit 1

ENTRYPOINT ["gunicorn", "app:app", "-c", "gunicorn_config.py"]

gunicorn_config.py

bind = '0.0.0.0:5000'
graceful_timeout = 3600
timeout = 1200
max_requests = 1200
workers = 1
worker_class = 'gevent'

requirements.txt

flask
gevent
gunicorn

在代码仓库变量配置里面配置如下变量值

Type           Key                      Value                    State        Masked
Variable   DOCKER_USER                 admin                   下面都关闭   下面都关闭
Variable   DOCKER_PASS                 boge666
Variable   REGISTRY_URL                harbor.boge.com
Variable   REGISTRY_NS                 product
File       KUBE_CONFIG_TEST            k8s相关config配置文件内容

准备项目自动化配置文件.gitlab-ci.yml

stages:
  - build
  - deploy
  - rollback

# tag name need: 20.11.21.01
variables:
  namecb: "flask-test"
  svcport: "5000"
  replicanum: "2"
  ingress: "flask-test.boge.com"
  certname: "mytls"
  CanarylIngressNum: "20"

.deploy_k8s: &deploy_k8s |
  if [ $CANARY_CB -eq 1 ];then cp -arf .project-name-canary.yaml ${namecb}-${CI_COMMIT_TAG}.yaml; sed -ri "s+CanarylIngressNum+${CanarylIngressNum}+g" ${namecb}-${CI_COMMIT_TAG}.yaml; sed -ri "s+NomalIngressNum+$(expr 100 - ${CanarylIngressNum})+g" ${namecb}-${CI_COMMIT_TAG}.yaml ;else cp -arf .project-name.yaml ${namecb}-${CI_COMMIT_TAG}.yaml;fi
  sed -ri "s+projectnamecb.boge.com+${ingress}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
  sed -ri "s+projectnamecb+${namecb}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
  sed -ri "s+5000+${svcport}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
  sed -ri "s+replicanum+${replicanum}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
  sed -ri "s+mytls+${certname}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
  sed -ri "s+mytagcb+${CI_COMMIT_TAG}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
  sed -ri "s+harbor.boge.com/library+${IMG_URL}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
  cat ${namecb}-${CI_COMMIT_TAG}.yaml
  [ -d ~/.kube ] || mkdir ~/.kube
  echo "$KUBE_CONFIG" > ~/.kube/config
  if [ $NORMAL_CB -eq 1 ];then if kubectl get deployments.|grep -w ${namecb}-canary &>/dev/null;then kubectl delete deployments.,svc ${namecb}-canary ;fi;fi
  kubectl apply -f ${namecb}-${CI_COMMIT_TAG}.yaml --record
  echo
  echo
  echo "============================================================="
  echo "                    Rollback Indx List"
  echo "============================================================="
  kubectl rollout history deployment ${namecb}|tail -5|awk -F"[ =]+" '{print $1"\t"$5}'|sed '$d'|sed '$d'|sort -r|awk '{print $NF}'|awk '$0=""NR".   "$0'

.rollback_k8s: &rollback_k8s |
  [ -d ~/.kube ] || mkdir ~/.kube
  echo "$KUBE_CONFIG" > ~/.kube/config
  last_version_command=$( kubectl rollout history deployment ${namecb}|tail -5|awk -F"[ =]+" '{print $1"\t"$5}'|sed '$d'|sed '$d'|tail -${ROLL_NUM}|head -1 )
  last_version_num=$( echo ${last_version_command}|awk '{print $1}' )
  last_version_name=$( echo ${last_version_command}|awk '{print $2}' )
  kubectl rollout undo deployment ${namecb} --to-revision=$last_version_num
  echo $last_version_num
  echo $last_version_name
  kubectl rollout history deployment ${namecb}


build:
  stage: build
  retry: 2
  variables:
    # use dind.yaml to deploy the dind service on k8s
    DOCKER_HOST: tcp://10.68.86.33:2375/
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""
  ##services:
    ##- docker:dind
  before_script:
    - docker login ${REGISTRY_URL} -u "$DOCKER_USER" -p "$DOCKER_PASS"
  script:
    - docker pull ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest || true
    - docker build --network host --cache-from ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest --tag ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:$CI_COMMIT_TAG --tag ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest .
    - docker push ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:$CI_COMMIT_TAG
    - docker push ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest
  after_script:
    - docker logout ${REGISTRY_URL}
  tags:
    - "docker"
  only:
    - tags





#--------------------------K8S DEPLOY--------------------------------------------------

BOGE-deploy:
  stage: deploy
  image: harbor.boge.com/library/kubectl:v1.19.9
  variables:
    KUBE_CONFIG: "$KUBE_CONFIG_TEST"
    IMG_URL: "${REGISTRY_URL}/${REGISTRY_NS}"
    NORMAL_CB: 1
  script:
    - *deploy_k8s
  when: manual
  only:
    - tags

# canary start
BOGE-canary-deploy:
  stage: deploy
  image: harbor.boge.com/library/kubectl:v1.19.9
  variables:
    KUBE_CONFIG: "$KUBE_CONFIG_TEST"
    IMG_URL: "${REGISTRY_URL}/${REGISTRY_NS}"
    CANARY_CB: 1
  script:
    - *deploy_k8s
  when: manual
  only:
    - tags
# canary end

BOGE-rollback-1:
  stage: rollback
  image: harbor.boge.com/library/kubectl:v1.19.9
  variables:
    KUBE_CONFIG: "$KUBE_CONFIG_TEST"
    ROLL_NUM: 1
  script:
    - *rollback_k8s
  when: manual
  only:
    - tags


BOGE-rollback-2:
  stage: rollback
  image: harbor.boge.com/library/kubectl:v1.19.9
  variables:
    KUBE_CONFIG: "$KUBE_CONFIG_TEST"
    ROLL_NUM: 2
  script:
    - *rollback_k8s
  when: manual
  only:
    - tags


BOGE-rollback-3:
  stage: rollback
  image: harbor.boge.com/library/kubectl:v1.19.9
  variables:
    KUBE_CONFIG: "$KUBE_CONFIG_TEST"
    ROLL_NUM: 3
  script:
    - *rollback_k8s
  when: manual
  only:
    - tags

Prepare the k8s Deployment template file .project-name.yaml

Note that the Harbor image-pull secret must be created in the cluster ahead of time; the command is:

kubectl -n test create secret docker-registry boge-secret --docker-server=harbor.boge.com --docker-username=admin --docker-password=boge666 --docker-email=<your-email>
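
Before running the pipeline it can be reassuring to confirm the secret exists and carries the expected registry credentials; a quick check, nothing project-specific:

# Verify the pull secret exists in the test namespace
kubectl -n test get secret boge-secret
# Decode the registry auth it carries (.dockerconfigjson is base64-encoded)
kubectl -n test get secret boge-secret -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d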

---
# SVC
kind: Service
apiVersion: v1
metadata:
  labels:
    kae: "true"
    kae-app-name: projectnamecb
    kae-type: app
  name: projectnamecb
spec:
  selector:
    kae: "true"
    kae-app-name: projectnamecb
    kae-type: app
  ports:
    - name: http-port
      port: 80
      protocol: TCP
      targetPort: 5000
#      nodePort: 12345
#  type: NodePort

---
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:
    kae: "true"
    kae-app-name: projectnamecb
    kae-type: app
  name: projectnamecb
spec:
  tls:
  - hosts:
    - projectnamecb.boge.com
    secretName: mytls
  rules:
  - host: projectnamecb.boge.com
    http:
      paths:
      - path: /
        backend:
          serviceName: projectnamecb
          servicePort: 80

---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: projectnamecb
  labels:
    kae: "true"
    kae-app-name: projectnamecb
    kae-type: app
spec:
  replicas: replicanum
  selector:
    matchLabels:
      kae-app-name: projectnamecb
  template:
    metadata:
      labels:
        kae: "true"
        kae-app-name: projectnamecb
        kae-type: app
    spec:
      containers:
      - name: projectnamecb
        image: harbor.boge.com/library/projectnamecb:mytagcb
        env:
          - name: TZ
            value: Asia/Shanghai
        ports:
        - containerPort: 5000
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /
            port: 5000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            scheme: HTTP
            path: /
            port: 5000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        resources:
          requests:
            cpu: 0.3
            memory: 0.5Gi
          limits:
            cpu: 0.3
            memory: 0.5Gi
      imagePullSecrets:
      - name: boge-secret

Prepare the canary deployment template file for K8s: .project-name-canary.yaml

---
# SVC
kind: Service
apiVersion: v1
metadata:
  labels:
    kae: "true"
    kae-app-name: projectnamecb-canary
    kae-type: app
  name: projectnamecb-canary
spec:
  selector:
    kae: "true"
    kae-app-name: projectnamecb-canary
    kae-type: app
  ports:
    - name: http-port
      port: 80
      protocol: TCP
      targetPort: 5000
#      nodePort: 12345
#  type: NodePort

---
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:
    kae: "true"
    kae-app-name: projectnamecb-canary
    kae-type: app
  name: projectnamecb
  annotations:
    nginx.ingress.kubernetes.io/service-weight: |
        projectnamecb: NomalIngressNum, projectnamecb-canary: CanarylIngressNum
spec:
  tls:
  - hosts:
    - projectnamecb.boge.com
    secretName: mytls
  rules:
  - host: projectnamecb.boge.com
    http:
      paths:
      - path: /
        backend:
          serviceName: projectnamecb
          servicePort: 80
      - path: /
        backend:
          serviceName: projectnamecb-canary
          servicePort: 80

---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: projectnamecb-canary
  labels:
    kae: "true"
    kae-app-name: projectnamecb-canary
    kae-type: app
spec:
  replicas: replicanum
  selector:
    matchLabels:
      kae-app-name: projectnamecb-canary
  template:
    metadata:
      labels:
        kae: "true"
        kae-app-name: projectnamecb-canary
        kae-type: app
    spec:
      containers:
      - name: projectnamecb-canary
        image: harbor.boge.com/library/projectnamecb:mytagcb
        env:
          - name: TZ
            value: Asia/Shanghai
        ports:
        - containerPort: 5000
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /
            port: 5000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            scheme: HTTP
            path: /
            port: 5000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        resources:
          requests:
            cpu: 0.3
            memory: 0.5Gi
          limits:
            cpu: 0.3
            memory: 0.5Gi
      imagePullSecrets:
      - name: boge-secret
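
Once the canary job has run, the weight split between the normal and canary Services can be observed from outside the cluster. A rough sketch, assuming flask-test.boge.com resolves to the ingress controller and the two versions return distinguishable responses:

# Fire a batch of requests and count how many land on each version
for i in $(seq 1 100); do
  curl -sk https://flask-test.boge.com/
  echo
done | sort | uniq -c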

Finally, once the code changes are committed and a version tag is pushed, the CI/CD pipeline is triggered automatically; the detailed walkthrough is in the video tutorial of the same name.

And that wraps up this K8s architect course, which has run for more than 40 days. I hope it has been helpful; if you found the video tutorials useful, please share them with your friends so more people can master K8s and CI/CD automation.

Four ways to quickly generate Kubernetes (k8s) YAML configs


Hi everyone, I'm 博哥爱运维. If you have worked through the full K8s architect course I shared earlier (the course notes and videos can be found in the articles and videos sections under this account), you will have noticed that mastering YAML is essential for working with k8s. Like Python, YAML is strictly indentation-sensitive, which is not very friendly to newcomers: a single careless indent can produce all kinds of errors. So here I'm sharing a few methods I used when I first started writing YAML, in the hope that they help you get up to speed quickly.

Four ways to quickly generate k8s YAML configs

1. Use the kubectl command line to quickly generate standard YAML for a Deployment and a Service

# This command should look familiar: it is the same one used earlier in the course to create a deployment, with `--dry-run -o yaml` appended. --dry-run means the command is not actually executed against the cluster, and -o yaml prints the dry-run result in YAML format, giving us a ready-made config. (On newer kubectl versions, use `--dry-run=client` instead of the bare `--dry-run`.)

# kubectl create deployment nginx --image=nginx --dry-run -o yaml       
apiVersion: apps/v1     # <---  apiVersion is the version of the config schema
kind: Deployment     #<--- kind is the type of resource to create, here a Deployment
metadata:        #<--- metadata holds the resource's metadata; name is a required item
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:        #<---    spec is the specification of this Deployment
  replicas: 1        #<---  replicas sets the number of replicas, default 1
  selector:
    matchLabels:
      app: nginx
  strategy: {}
  template:        #<---   template defines the Pod template, the most important part of the config
    metadata:        #<---     metadata defines the Pod's metadata; at least one label is required, and label keys/values are arbitrary
      creationTimestamp: null
      labels:
        app: nginx
    spec:           #<---  spec describes the Pod: the attributes of each container; name and image are required
      containers:
      - image: nginx
        name: nginx
        resources: {}
status: {}


# Generate the Service YAML based on the Deployment above
# kubectl expose deployment nginx --port=80 --target-port=80 --dry-run -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
status:
  loadBalancer: {}
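
In practice the dry-run output is usually redirected to a file, edited, and then applied. A minimal sketch of that workflow (file names are arbitrary; kubectl 1.18+ prefers --dry-run=client):

kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > nginx-deploy.yaml
kubectl apply -f nginx-deploy.yaml
# kubectl expose reads the live Deployment's labels, so run it after the Deployment exists
kubectl expose deployment nginx --port=80 --target-port=80 --dry-run=client -o yaml > nginx-svc.yaml
kubectl apply -f nginx-svc.yaml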

2. Use helm to inspect official charts' standard (and often complex) YAML for reference

# Example: inspect the config used to install a rabbitmq cluster
# First add the chart repository
helm repo add aliyun-apphub https://apphub.aliyuncs.com
helm repo update

# Appending --dry-run --debug simulates the install and prints all the rendered YAML
helm install -n rq rabbitmq-ha aliyun-apphub/rabbitmq-ha --dry-run --debug 
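
A closely related option, standard in Helm 3 though not used in the original text, is helm template, which renders all of a chart's manifests locally without needing a reachable cluster:

# Render the chart's YAML locally and save it for study or adaptation
helm template rabbitmq-ha aliyun-apphub/rabbitmq-ha > rabbitmq-ha-all.yaml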

3. Convert a docker-compose file into k8s YAML

# Download the kompose binary
# https://github.com/kubernetes/kompose/releases

# Convert the compose file into k8s YAML configs
./kompose-linux-amd64 -f docker-compose.yml convert
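
If you do not have a compose file at hand, a tiny hypothetical docker-compose.yml can be created and converted like this (the service name and image are placeholders):

# Write a minimal compose file to feed into kompose (contents are illustrative only)
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
EOF

# kompose emits a Deployment and, because ports are declared, a Service for the web service
./kompose-linux-amd64 -f docker-compose.yml convert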

4. Example: turning a docker run command into the corresponding YAML file

Here we use Prometheus Node Exporter as an example of running your own DaemonSet.
Prometheus is a popular monitoring solution; Node Exporter is its agent, which runs as a daemon on every monitored node.
Running the Node Exporter container directly in Docker would look like this:

docker run -d \
    -v "/proc:/host/proc" \
    -v "/sys:/host/sys" \
    -v "/:/rootfs" \
    --net=host \
    prom/node-exporter \
    --path.procfs /host/proc \
    --path.sysfs /host/sys \
    --collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"



Converted into a DaemonSet YAML config file, node_exporter.yml:

apiVersion: apps/v1       # DaemonSet was removed from extensions/v1beta1 in k8s v1.16; use apps/v1 on the v1.20 cluster built in this course
kind: DaemonSet
metadata:
  name: node-exporter-daemonset
spec:
  selector:               # apps/v1 requires an explicit selector matching the Pod template labels below
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      hostNetwork: true      # <<<<<<<< 1 use the host's network directly
      containers:
      - name: node-exporter
        image: prom/node-exporter
        imagePullPolicy: IfNotPresent
        command:             # <<<<<<<< 2 set the container's startup command
        - /bin/node_exporter
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - ^/(sys|proc|dev|host|etc)($|/)
        volumeMounts:        # <<<<<<<< 3 map the host paths /proc, /sys and / into the container via volumes
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: root
          mountPath: /rootfs
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
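
To deploy the DaemonSet and sanity-check it, a generic sequence (the node IP is a placeholder for any node in your cluster):

kubectl apply -f node_exporter.yml
# One node-exporter Pod should be scheduled onto every schedulable node
kubectl get daemonset node-exporter-daemonset -o wide
kubectl get pods -l app=prometheus -o wide
# node-exporter listens on port 9100 of the host network
curl -s http://<any-node-ip>:9100/metrics | head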

Supplementary notes on K8s service health-check methods


Three ways to use Liveness and Readiness probes

          readinessProbe:  # only when an HTTP GET on container port 6222 returns a 200-399 status is the Pod considered ready to receive requests from the Service web-svc below
            httpGet:
              scheme: HTTP
              path: /check
              port: 6222
            initialDelaySeconds: 10   # start probing 10 seconds after the container starts; note how long g1 takes to become ready
            periodSeconds: 5          # probe again every 5 seconds
            timeoutSeconds: 5         # timeout for the HTTP probe request
            successThreshold: 1       # 1 success marks the service as `ready`
            failureThreshold: 3       # 3 failures mark the service as `not ready`
          livenessProbe:  # only when an HTTP GET on container port 6222 returns a 200-399 status is the container considered alive; otherwise the Pod is restarted
            httpGet:
              scheme: HTTP
              path: /check
              port: 6222
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3

#-----------------------------------------
            
          readinessProbe:
            exec:
              command:
              - sh
              - -c
              - "redis-cli ping"
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3
          livenessProbe:
            exec:
              command:
              - sh
              - -c
              - "redis-cli ping"
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3

#-----------------------------------------

        readinessProbe:
          tcpSocket:
            port: 9092
          initialDelaySeconds: 15
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 9092
          initialDelaySeconds: 15
          periodSeconds: 10
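
The three snippets above are fragments meant to sit under a container definition. For reference, here is a minimal self-contained Pod that shows where such probe blocks live in a full manifest; the image and port are illustrative and not tied to any example above:

# Apply a throwaway Pod demonstrating probe placement in a complete manifest
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: web
    image: nginx:alpine
    ports:
    - containerPort: 80
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 15
      periodSeconds: 10
EOF

# Watch the Pod turn Ready once the readiness probe starts succeeding
kubectl get pod probe-demo -w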
