## Learning Sources —
Level 3 binary high availability installation k8s production-level cluster*
1. Install ansible-----centos on each node
yum install ansible -y
yum install -y git
2.node01 download and install script:
如果在这里面不好复杂的话,可以直接到我的github仓库里面下载这个脚本,地址:
git clone https://github.com/bogeit/LearnK8s.git
3. Enter the directory and execute the script format
sh k8s_install_new.sh Tb123456 172.24.75 149\ 155\ 152\ 154\ 150\ 151\ 148\ 153 docker calico test
4. If you encounter an error, don't be discouraged, and execute the following commands several times, because network timeouts are very common in github
./ezdown -D
5. After the download is complete, continue to execute
sh k8s_install_new.sh Tb123456 172.24.75 119\ 118\ 117\ 115\ 116\ 114 docker calico test
6. After the installation is complete, reload the environment variables to complete the kubectl command
# 安装完成后重新加载下环境变量以实现kubectl命令补齐
. ~/.bashrc
7. Check the status of all pods:
kubectl get pods -A
Docker, the most proud little brother of K8s in level 4
Dockerfile:
1.python--dockerfile格式
FROM python:3.7-slim-stretch
MAINTAINER boge <[email protected]>
WORKDIR /app
COPY requirements.txt .
RUN sed -i 's/deb.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
&& sed -i 's/security.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
&& apt-get update -y \
&& apt-get install -y wget gcc libsm6 libxext6 libglib2.0-0 libxrender1 git vim \
&& apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple -r requirements.txt \
&& rm requirements.txt
COPY . .
EXPOSE 5000
HEALTHCHECK CMD curl --fail http://localhost:5000 || exit 1
ENTRYPOINT ["gunicorn", "app:app", "-c", "gunicorn_config.py"]
2.golang--dockerfile格式
# stage 1: build src code to binary
FROM golang:1.13-alpine3.10 as builder
MAINTAINER boge <[email protected]>
ENV GOPROXY https://goproxy.cn
# ENV GO111MODULE on
COPY *.go /app/
RUN cd /app && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags "-s -w" -o hellogo .
# stage 2: use alpine as base image
FROM alpine:3.10
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories && \
apk update && \
apk --no-cache add tzdata ca-certificates && \
cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
# apk del tzdata && \
rm -rf /var/cache/apk/*
COPY --from=builder /app/hellogo /hellogo
CMD ["/hellogo"]
3.nodejs格式
FROM node:12.6.0-alpine
MAINTAINER boge <[email protected]>
WORKDIR /app
COPY package.json .
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories && \
apk update && \
yarn config set registry https://registry.npm.taobao.org && \
yarn install
RUN yarn build
COPY . .
EXPOSE 6868
ENTRYPOINT ["yarn", "start"]
4.java--dockerfile格式
FROM maven:3.6.3-adoptopenjdk-8 as target
ENV MAVEN_HOME /usr/share/maven
ENV PATH $MAVEN_HOME/bin:$PATH
COPY settings.xml /usr/share/maven/conf/
WORKDIR /build
COPY pom.xml .
RUN mvn dependency:go-offline # use docker cache
COPY src/ /build/src/
RUN mvn clean package -Dmaven.test.skip=true
FROM java:8
WORKDIR /app
RUN rm /etc/localtime && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
COPY --from=target /build/target/*.jar /app/app.jar
EXPOSE 8080
ENTRYPOINT ["java","-Xmx768m","-Xms256m","-Djava.security.egd=file:/dev/./urandom","-jar","/app/app.jar"]
5.docker的整个生命周期展示
docker login "仓库地址" -u "仓库用户名" -p "仓库密码"
docker pull "仓库地址"/"仓库命名空间"/"镜像名称":latest || true
docker build --network host --build-arg PYPI_IP="xx.xx.xx.xx" --cache-from "仓库地址"/"仓库命名空间"/"镜像名称":latest --tag "仓库地址"/"仓库命名空间"/"镜像名称":"镜像版本号" --tag "仓库地址"/"仓库命名空间"/"镜像名称":latest .
docker push "仓库地址"/"仓库命名空间"/"镜像名称":"镜像版本号"
docker push "仓库地址"/"仓库命名空间"/"镜像名称":latest
# 基于redis的镜像运行一个docker实例
docker run --name myredis --net host -d redis:6.0.2-alpine3.11 redis-server --requirepass boGe666
6.dockerfile实战
我这里将上面的flask和golang项目上传到了网盘,地址如下
https://cloud.189.cn/t/M36fYrIrEbui (访问码:hy47)
# 解压
unzip docker-file.zip
# 先打包python项目的镜像并运行测试
cd python
docker build -t python/flask:v0.0.1 .
docker run -d -p 80:5000 python/flask:v0.0.1
# 再打包golang项目的镜像并运行测试
docker build -t boge/golang:v0.0.1 .
docker run -d -p80:3000 boge/golang:v0.0.1
One of the strategies for K8s to overcome the fifth level
Make good use of the -h command! ! !
In level 3, we have deployed a set of K8S clusters in binary form, now let’s meet K8S
1.生产中的小技巧:k8s删除namespaces状态一直为terminating问题处理、执行下面脚本
2.执行格式bash 1.sh kubevirt(namespace名字)
#!/bin/bash
set -eo pipefail
die() {
echo "$*" 1>&2 ; exit 1; }
need() {
which "$1" &>/dev/null || die "Binary '$1' is missing but required"
}
# checking pre-reqs
need "jq"
need "curl"
need "kubectl"
PROJECT="$1"
shift
test -n "$PROJECT" || die "Missing arguments: kill-ns <namespace>"
kubectl proxy &>/dev/null &
PROXY_PID=$!
killproxy () {
kill $PROXY_PID
}
trap killproxy EXIT
sleep 1 # give the proxy a second
kubectl get namespace "$PROJECT" -o json | jq 'del(.spec.finalizers[] | select("kubernetes"))' | curl -s -k -H "Content-Type: application/json" -X PUT -o /dev/null --data-binary @- http://localhost:8001/api/v1/namespaces/$PROJECT/finalize && echo "Killed namespace: $PROJECT"
Pod
1.擅用-h 帮助参数
kubectl run -h
kubectl get pod -w -o wide 查看pod详细信息
2.进入容器内部
kubectl -it exec nginx -- sh
3.查看pod详细信息、可以用来排错
kubectl describe pod nginx
4.查看可用镜像版本:生产小技巧
来源cat /opt/kube/bin/docker-tag
先source加载一下,docker+双按tab+tab
source ~/.bashrc
用法:docker-tag tags nginx
Level 5 K8s Conquest Strategy 2-Deployment
Deployment expansion and rollback
Make good use of -h
1.善用-h
kubectl create deployment -h
kubectl create deployment nginx --image=nginx
2.查看创建结果:
kubectl get deployments.apps
3.看下自动关联创建的副本集replicaset--- kubectl get rs
4.扩容pod的数量、善用-h
kubectl scale -h
kubectl scale deployment nginx --replicas=2
5.查看pod在哪里运行、pod状态等
kubectl get pod -o wide
6.善用--record、对版本回滚很重要!
kubectl set image deployment/nginx nginx=nginx:1.9.9 --record
# 升级nginx的版本
# kubectl set image deployments/nginx nginx=nginx:1.19.5 --record
7.查看历史版本、回归依据
# kubectl rollout history deployment nginx
deployment.apps/nginx
REVISION CHANGE-CAUSE
1 <none>
2 kubectl set image deployments/nginx nginx=nginx:1.9.9 --record=true
3 kubectl set image deployments/nginx nginx=nginx:1.19.5 --record=true
8.回滚到特定版本
kubectl rollout undo deployment nginx --to-revision=2
查看版本历史
kubectl rollout history deployment nginx
补充
1.禁止pod调度到该节点上
kubectl cordon
2.驱逐该节点上的所有pod
kubectl drain
生成yaml文件的方法:
kubectl create deployment nginx --image=nginx --dry-run -o yaml > nginx.yaml
Level 5 K8s Overcoming Battle Strategy 3 - Health Check of Service Pod
Hello everyone, I am Boge Ai Operation and Maintenance. The content of this lesson will explain to you how to perform health checks on our business services on K8S.
Liveness
#继续接受文件尝试重新创建---外部访问可能回404
#例子:
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness
spec:
restartPolicy: OnFailure
containers:
- name: liveness
image: busybox
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
livenessProbe:
#readinessProbe: # 这里将livenessProbe换成readinessProbe即可,其它配置都一样
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 10 # 容器启动 10 秒之后开始检测
periodSeconds: 5 # 每隔 5 秒再检测一次
Readiness
一般用这个探针比较多,**Readiness**将容器设置为不可用
配置和上面一样见上面注释:在containers:后面加存活探针readinessProbe:
**Readiness**的特点不在接受服务,不在导入流量到pod内
两者可以同时使用:可以单独使用,也可以同时使用
接下来我们详细分析下滚动更新的原理,为什么上面服务新版本创建的pod数量是7个,同时只销毁了3个旧版本的pod呢?
我们不显式配置这段的话,默认值均是25%
strategy:
rollingUpdate:
maxSurge: 35%
maxUnavailable: 35%
#按正常的生产处理流程,在获取足够的新版本错误信息提交给开发分析后,我们可以通过kubectl rollout undo 来回滚到上一个正常的服务版本:
#查看历史版本:
kubectl rollout history deployment mytest
#回滚到特定版本
kubectl rollout undo deployment mytest --to-revision=1
Level 5 K8s Architect Course Overcome Battle Strategy 4-Service
This lesson will explain to you how to use service to balance the traffic load of internal service pods on K8S.
The IP of the pod will change every restart, so we should not expect the pod of K8s to be robust
And the traffic entrance uses a service that can fix IP to act as an abstract internal load balancer to provide pod access
1.创建一个svc来暴露deploy---nginx端口,会生成svc
kubectl expose deployment nginx --port=80 --target-port=80 --dry-run=client -o yaml
2.查看svc
kubectl get svc
看下自动关联生成的endpoint
kubectl get endpoints nginx
3.测试下svc的负载均衡效果
分别进入几个容器内部修改nginx主页文件---如果没有多个nginx自己扩容下deploy善用-h指令
# kubectl exec -it nginx-6799fc88d8-2kgn8 -- bash
root@nginx-f89759699-bzwd2:/# echo nginx-6799fc88d8-2kgn8 > /usr/share/nginx/html/index.html
4.测试svc的负载均衡效果
查看SVC
kubectl get svc
访问svc发现每次返回都不一样,达到负载均衡
curl 10.68.121.237
5.提供外网访问,外部访问:
kubectl patch svc nginx -p '{"spec":{"type":"NodePort"}}'
查看NodePort端口
kubectl get svc
尝试访问访问
nginx NodePort 10.68.121.237 <none> 80:32012/TCP 12m app=nginx
查看看下kube-proxy的配置
cat /etc/systemd/system/kube-proxy.service
cat /var/lib/kube-proxy/kube-proxy-config.yaml
查看ipvs的虚拟网卡:ip addr |grep ipv
查看lvs都武器列表:ipvsadm -ln 可以看见ipvs的轮询策略
[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/opt/kube/bin/kube-proxy \
--bind-address=10.0.1.202 \
--cluster-cidr=172.20.0.0/16 \
--hostname-override=10.0.1.202 \
--kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig \
--logtostderr=true \
--proxy-mode=ipvs
还可以用K8s的DNS来访问、除了直接用cluster ip,以及上面说到的NodePort模式来访问Service,我们还可以用K8s的DNS来访问
我们前面装好的CoreDNS,来提供K8s集群的内部DNS访问
查看一下coreDNS
kubectl -n kube-system get deployment,pod|grep dns
serviceName.namespaceName
coredns是一个DNS服务器,每当有新的Service被创建的时候,coredns就会添加该Service的DNS记录,然后我们通过serviceName.namespaceName就可以来访问到对应的pod了,下面来演示下:
它可以从容器内部ping外部的svc
[root@node01 LearnK8s]# kubectl run -it --rm busybox --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ # ping nginx.default
##default可以省略
PING nginx.default (10.68.121.237): 56 data bytes
64 bytes from 10.68.121.237: seq=0 ttl=64 time=0.041 ms
64 bytes from 10.68.121.237: seq=1 ttl=64 time=0.074 ms
Service production tips to access services on non-K8s through svc
创建service后,会自动创建对应的endpoint\\\也可以手动创建endpoint
在生产中,我们可以通过创建不带selector的Service,然后创建同样名称的endpoint,来关联K8s集群以外的服务
我们可以直接复用K8s上的ingress(这个后面会讲到,现在我们就当它是一个nginx代理),来访问K8s集群以外的服务,省去了自己搭建前面Nginx代理服务器的麻烦
开始实践测试
这里我们挑选node-2节点,用python运行一个简易web服务器
[root@node-2 mnt]# python -m SimpleHTTPServer 9999
Serving HTTP on 0.0.0.0 port 9999 ...
在node01上配置svc和endpoints
注意Service和Endpoints的名称必须一致
# 注意我这里把两个资源的yaml写在一个文件内,在实际生产中,我们经常会这么做,方便对一个服务的所有资源进行统一管理,不同资源之间用"---"来分隔
apiVersion: v1
kind: Service
metadata:
name: mysvc
namespace: default
spec:
type: ClusterIP
ports:
- port: 80
protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
name: mysvc
namespace: default
subsets:
- addresses:
#启动测试服务的节点,我这里是node2
- ip: 172.24.75.118
nodeName: 172.24.75.118
ports:
- port: 9999
protocol: TCP
开始创建并测试:
# kubectl apply -f mysvc.yaml
service/mysvc created
endpoints/mysvc created
# kubectl get svc,endpoints |grep mysvc
service/mysvc ClusterIP 10.68.71.166 <none> 80/TCP 14s
endpoints/mysvc 10.0.1.202:9999 14s
# curl 10.68.71.166
mysvc
# 我们回到node-2节点上,可以看到有一条刚才的访问日志打印出来了
10.0.1.201 - - [25/Nov/2020 14:42:45] "GET / HTTP/1.1" 200 -
#也可以修改yaml文件外网访问加端口查看结果
查看nodeport端口范围
cat /etc/systemd/system/kube-apiserver.service|grep service
#--service-node-port-range=30000-32767 \
service生产调优:
kubectl patch svc nginx -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service/nginx patched
现在通过非pod所在node节点的IP来访问是不通了
通过所在node的IP发起请求正常
可以看到日志显示的来源IP就是201,这才是我们想要的结果
# 去掉这个优化配置也很简单
# kubectl patch svc nginx -p '{"spec":{"externalTrafficPolicy":""}}'
The upper part of the traffic entrance Ingress of the 6th k8s architect course
Hello everyone, this lesson brings k8s traffic entrance ingres\This course brings you all based on actual combat experience in production, so the ingress-nginx we use here is not an ordinary community version, but has passed the test of super-large production traffic , the largest cloud platform in China, Alibaba Cloud, is branched out based on the community version, and has been modified to be more suitable for production. It is basically ready to use out of the box. The following is the introduction of aliyun-ingress-controller:
The following introduction only intercepts the latest part, and more document resources can be found in the official file:
https://developer.aliyun.com/article/598075
If configuration dynamic update is not supported, frequent reloading of Nginx will bring obvious request access problems in scenarios with high frequency changes:
- Cause certain QPS jitter and access failure
- For long-term connection services, they will be frequently disconnected
- Cause a large number of Nginx Worker processes that are shutting down, causing memory expansion
- For detailed principle analysis, see this article: https://developer.aliyun.com/article/692732
The yaml configuration used in production:
aliyun-ingress-nginx.yaml
kubectl apply -f aliyun-ingress-nginx.yaml
#配置见下一个文件
我们查看下pod,会发现空空如也,为什么会这样呢?
kubectl -n ingress-nginx get pod
kind: DaemonSet
注意yaml配置里面DaemonSet的配置,我使用了节点选择配置,只有打了我指定lable标签的node节点,也会被允许调度pod上去运行、需要的标签是 boge/ingress-controller-ready=true
打标签的方法: 在哪个节点打标签、就会在哪个节点运行ds(Daemonset)、
Daemonset的特点是在每个Node上只能运行一个:
kubectl label node 172.24.75.119 boge/ingress-controller-ready=true
kubectl label node 172.24.75.118 boge/ingress-controller-ready=true
...你可以把每个node都打上标签、不想用标签你可以注释掉自己的nodeSelctor
打完标签、接着可以看到pod就被调到这两台node上启动了
尝试注释掉aliyun-ingress-nginx.yaml的节点选择、会发现每个节点就会自动生成一个ds、
#nodeSelector:
# boge/ingress-controller-ready: "true"
查看ds
kubectl get ds -n ingress-nginx
如果是自建机房,我们通常会在至少2台node节点上运行有ingress-nginx的pod\\\
也就是说会生成多个nginx-ingress-controller
不妨注释掉nodeSelector:
让每个节点自动生成nginx-ingress-controller
kubectl get ds -n ingress-nginx
nginx-ingress-controller 6
#我们准备来部署aliyun-ingress-controller,下面直接是生产中在用的yaml配置,我们保存了aliyun-ingress-nginx.yaml准备开始部署:详细讲解下面yaml配置的每个部分
apiVersion: v1
kind: Namespace
metadata:
name: ingress-nginx
labels:
app: ingress-nginx
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nginx-ingress-controller
namespace: ingress-nginx
labels:
app: ingress-nginx
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: nginx-ingress-controller
labels:
app: ingress-nginx
rules:
- apiGroups:
- ""
resources:
- configmaps
- endpoints
- nodes
- pods
- secrets
- namespaces
- services
verbs:
- get
- list
- watch
- apiGroups:
- "extensions"
- "networking.k8s.io"
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- "extensions"
- "networking.k8s.io"
resources:
- ingresses/status
verbs:
- update
- apiGroups:
- ""
resources:
- configmaps
verbs:
- create
- apiGroups:
- ""
resources:
- configmaps
resourceNames:
- "ingress-controller-leader-nginx"
verbs:
- get
- update
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: nginx-ingress-controller
labels:
app: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: nginx-ingress-controller
subjects:
- kind: ServiceAccount
name: nginx-ingress-controller
namespace: ingress-nginx
---
apiVersion: v1
kind: Service
metadata:
labels:
app: ingress-nginx
name: nginx-ingress-lb
namespace: ingress-nginx
spec:
# DaemonSet need:
# ----------------
type: ClusterIP
# ----------------
# Deployment need:
# ----------------
# type: NodePort
# ----------------
ports:
- name: http
port: 80
targetPort: 80
protocol: TCP
- name: https
port: 443
targetPort: 443
protocol: TCP
- name: metrics
port: 10254
protocol: TCP
targetPort: 10254
selector:
app: ingress-nginx
---
kind: ConfigMap
apiVersion: v1
metadata:
name: nginx-configuration
namespace: ingress-nginx
labels:
app: ingress-nginx
data:
keep-alive: "75"
keep-alive-requests: "100"
upstream-keepalive-connections: "10000"
upstream-keepalive-requests: "100"
upstream-keepalive-timeout: "60"
allow-backend-server-header: "true"
enable-underscores-in-headers: "true"
generate-request-id: "true"
http-redirect-code: "301"
ignore-invalid-headers: "true"
log-format-upstream: '{"@timestamp": "$time_iso8601","remote_addr": "$remote_addr","x-forward-for": "$proxy_add_x_forwarded_for","request_id": "$req_id","remote_user": "$remote_user","bytes_sent": $bytes_sent,"request_time": $request_time,"status": $status,"vhost": "$host","request_proto": "$server_protocol","path": "$uri","request_query": "$args","request_length": $request_length,"duration": $request_time,"method": "$request_method","http_referrer": "$http_referer","http_user_agent": "$http_user_agent","upstream-sever":"$proxy_upstream_name","proxy_alternative_upstream_name":"$proxy_alternative_upstream_name","upstream_addr":"$upstream_addr","upstream_response_length":$upstream_response_length,"upstream_response_time":$upstream_response_time,"upstream_status":$upstream_status}'
max-worker-connections: "65536"
worker-processes: "2"
proxy-body-size: 20m
proxy-connect-timeout: "10"
proxy_next_upstream: error timeout http_502
reuse-port: "true"
server-tokens: "false"
ssl-ciphers: ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
ssl-protocols: TLSv1 TLSv1.1 TLSv1.2
ssl-redirect: "false"
worker-cpu-affinity: auto
---
kind: ConfigMap
apiVersion: v1
metadata:
name: tcp-services
namespace: ingress-nginx
labels:
app: ingress-nginx
---
kind: ConfigMap
apiVersion: v1
metadata:
name: udp-services
namespace: ingress-nginx
labels:
app: ingress-nginx
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: nginx-ingress-controller
namespace: ingress-nginx
labels:
app: ingress-nginx
annotations:
component.version: "v0.30.0"
component.revision: "v1"
spec:
# Deployment need:
# ----------------
# replicas: 1
# ----------------
selector:
matchLabels:
app: ingress-nginx
template:
metadata:
labels:
app: ingress-nginx
annotations:
prometheus.io/port: "10254"
prometheus.io/scrape: "true"
scheduler.alpha.kubernetes.io/critical-pod: ""
spec:
# DaemonSet need:
# ----------------
hostNetwork: true
# ----------------
serviceAccountName: nginx-ingress-controller
priorityClassName: system-node-critical
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- ingress-nginx
topologyKey: kubernetes.io/hostname
weight: 100
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: type
operator: NotIn
values:
- virtual-kubelet
containers:
- name: nginx-ingress-controller
image: registry.cn-beijing.aliyuncs.com/acs/aliyun-ingress-controller:v0.30.0.2-9597b3685-aliyun
args:
- /nginx-ingress-controller
- --configmap=$(POD_NAMESPACE)/nginx-configuration
- --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
- --udp-services-configmap=$(POD_NAMESPACE)/udp-services
- --publish-service=$(POD_NAMESPACE)/nginx-ingress-lb
- --annotations-prefix=nginx.ingress.kubernetes.io
- --enable-dynamic-certificates=true
- --v=2
securityContext:
allowPrivilegeEscalation: true
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
runAsUser: 101
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
ports:
- name: http
containerPort: 80
- name: https
containerPort: 443
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
# resources:
# limits:
# cpu: "1"
# memory: 2Gi
# requests:
# cpu: "1"
# memory: 2Gi
volumeMounts:
- mountPath: /etc/localtime
name: localtime
readOnly: true
volumes:
- name: localtime
hostPath:
path: /etc/localtime
type: File
nodeSelector:
boge/ingress-controller-ready: "true"
tolerations:
- operator: Exists
initContainers:
- command:
- /bin/sh
- -c
- |
mount -o remount rw /proc/sys
sysctl -w net.core.somaxconn=65535
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w fs.file-max=1048576
sysctl -w fs.inotify.max_user_instances=16384
sysctl -w fs.inotify.max_user_watches=524288
sysctl -w fs.inotify.max_queued_events=16384
image: registry.cn-beijing.aliyuncs.com/acs/busybox:v1.29.2
imagePullPolicy: Always
name: init-sysctl
securityContext:
privileged: true
procMount: Default
---
## Deployment need for aliyun'k8s:
#apiVersion: v1
#kind: Service
#metadata:
# annotations:
# service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: "lb-xxxxxxxxxxxxxxxxxxx"
# service.beta.kubernetes.io/alibaba-cloud-loadbalancer-force-override-listeners: "true"
# labels:
# app: nginx-ingress-lb
# name: nginx-ingress-lb-local
# namespace: ingress-nginx
#spec:
# externalTrafficPolicy: Local
# ports:
# - name: http
# port: 80
# protocol: TCP
# targetPort: 80
# - name: https
# port: 443
# protocol: TCP
# targetPort: 443
# selector:
# app: ingress-nginx
# type: LoadBalancer
先部署aliyun-ingress-controller才能部署下面的一对测试
我们基于前面学到的deployment和service,来创建一个nginx的相应服务资源,保存为nginx.yaml:
注意:记得把前面测试的资源删除掉,以防冲突
生成文件后执行kubectl apply -f nginx.yaml文件内容如下:
apiVersion: v1
kind: Service
metadata:
labels:
app: nginx
name: nginx
spec:
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: nginx
name: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx # Deployment 如何跟 Service 关联的????
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx
name: nginx
然后准备nginx的ingress配置,保留为nginx-ingress.yaml,并执行它:
kubectl apply -f nginx-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: nginx-ingress
spec:
rules:
- host: nginx.boge.com
http:
paths:
- backend:
serviceName: nginx # 与service做关联????
servicePort: 80
path: /
我们在其它节点上,加下本地hosts,来测试下效果
echo "172.24.75.119 nginx.boge.com" >> /etc/hosts
可以多在几个节点上添加尝试通过ingress访问
curl nginx.boge.com
回到主节点上,看下ingress-nginx的日志、这里要注意你pod运行的node来找pod
kubectl get pods -n ingress-nginx -o wide
多节点通过curl访问以后、可以看到访问日志
kubectl -n ingress-nginx logs --tail=2 nginx-ingress-controller-5lntz -o wide
如果是自建机房,我们通常会在至少2台node节点上运行有ingress-nginx的pod\
一般两台也够了
那么有必要在这两台node上面部署负载均衡软件做调度,来起到高可用的作用,这里我们用haproxy+keepalived,如果你的生产环境是在云上,假设是阿里云,那么你只需要购买一个负载均衡器SLB\
将运行有ingress-nginx的pod的节点服务器加到这个SLB的后端来,然后将请求域名和这个SLB的公网IP做好解析即可,
注意在每台node节点上有已经部署有了个haproxy软件,来转发apiserver的请求的,那么,我们只需要选取两台节点,部署keepalived软件并重新配置haproxy,来生成**VIP**达到ha的效果,这里我们选择在其中两台node节点(172.24.75.118、117)上部署
查看node上现在已有的haproxy配置:
这个配置在worker节点node3\4\5才有。在master是没有的
cat /etc/haproxy/haproxy.cfg
开始新增ingress端口的转发,修改haproxy配置如下(注意两台节点都记得修改):
vim /etc/haproxy/haproxy.cfg
在最后加
listen ingress-http
bind 0.0.0.0:80
mode tcp
option tcplog
option dontlognull
option dontlog-normal
balance roundrobin
server 172.24.75.117 172.24.75.117:80 check inter 2000 fall 2 rise 2 weight 1
server 172.24.75.118 172.24.75.118:80 check inter 2000 fall 2 rise 2 weight 1
server 172.24.75.119 172.24.75.119:80 check inter 2000 fall 2 rise 2 weight 1
listen ingress-https
bind 0.0.0.0:443
mode tcp
option tcplog
option dontlognull
option dontlog-normal
balance roundrobin
server 172.24.75.117 172.24.75.117:443 check inter 2000 fall 2 rise 2 weight 1
server 172.24.75.118 172.24.75.118:443 check inter 2000 fall 2 rise 2 weight 1
server 172.24.75.119 172.24.75.119:443 check inter 2000 fall 2 rise 2 weight 1
配置完后重启服务
systemctl restart haproxy.service
然后在两台node上分别安装keepalived并进行配置:
yum install -y keepalived
ip r 查看网卡接口 可能是eth0也可能是ens32,是哪个就写哪个
生成vip漂移
修改keepalived配置文件
vi /etc/keepalived/keepalived.conf
全部删除、在重新加上下面的、注意修改***eth0或者ens32*** ip r 查看网卡
注意改成自己的node IP
三个node的keepalive都要改、IP记得轮流换、在哪个机上unicast_src_ip就写哪个机的内网、交换
所以下面的配置三个机上都要换unicast_src_ip和unicast_peer
配置完成后重启keepalive
systemctl restart haproxy.service
systemctl restart keepalived.service
配置完成后查看VIP正在哪个node上:
ip a |grep 222
看到这次VIP 222在node04上
换一个node节点(worker)测试
将VIP和域名写入hosts
echo "172.24.75.222 nginx.boge.com" >> /etc/hosts
global_defs {
router_id lb-master
}
vrrp_script check-haproxy {
script "killall -0 haproxy"
interval 5
weight -60
}
vrrp_instance VI-kube-master {
state MASTER
priority 120
unicast_src_ip 172.24.75.115
unicast_peer {
172.24.75.116
172.24.75.114
}
dont_track_primary
interface eth0 # 注意这里的网卡名称修改成你机器真实的内网网卡名称,可用命令ip addr查看
virtual_router_id 111
advert_int 3
track_script {
check-haproxy
}
virtual_ipaddress {
172.24.75.222
}
}
因为http属于是明文传输数据不安全,在生产中我们通常会配置https加密通信,现在实战下Ingress的tls配置
# 这里我先自签一个https的证书
#1. 先生成私钥key
openssl genrsa -out tls.key 2048
#2.再基于key生成tls证书(注意:这里我用的*.boge.com,这是生成泛域名的证书,后面所有新增加的三级域名都是可以用这个证书的)
openssl req -new -x509 -key tls.key -out tls.cert -days 360 -subj /CN=*.boge.com
# 在K8s上创建tls的secret(注意默认ns是default)
kubectl create secret tls mytls --cert=tls.cert --key=tls.key
# 然后修改先的ingress的yaml配置
# cat nginx-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
annotations:
nginx.ingress.kubernetes.io/rewrite-target: / # 注意这里需要把进来到服务的请求重定向到/,这个和传统的nginx配置是一样的,不配会404
name: nginx-ingress
spec:
rules:
- host: nginx.boge.com
http:
paths:
- backend:
serviceName: nginx
servicePort: 80
path: /nginx # 注意这里的路由名称要是唯一的
- backend: # 从这里开始是新增加的
serviceName: web
servicePort: 80
path: /web # 注意这里的路由名称要是唯一的
tls: # 增加下面这段,注意缩进格式
- hosts:
- nginx.boge.com # 这里域名和上面的对应
secretName: mytls # 这是我先生成的secret
# 进行更新
# kubectl apply -f nginx-ingress.yaml
Level 7 k8s architect course HPA automatic horizontal scaling pod
HPA
What we usually use the most in production is to automatically expand the number of pods based on the cpu usage metrics of the service pod. Let’s use the production standard to test in practice (Note: Before using HPA, we must ensure the dns service and metrics of the K8s cluster The service is running normally, and the service we created needs to configure the distribution of metrics)
# pod内资源分配的配置格式如下:
# 默认可以只配置requests,但根据生产中的经验,建议把limits资源限制也加上,因为对K8s来说,只有这两个都配置了且配置的值都要一样,这个pod资源的优先级才是最高的,在node资源不够的情况下,首先是把没有任何资源分配配置的pod资源给干掉,其次是只配置了requests的,最后才是两个都配置的情况,仔细品品
resources:
limits: # 限制单个pod最多能使用1核(1000m 毫核)cpu以及2G内存
cpu: "1"
memory: 2Gi
requests: # 保证这个pod初始就能分配这么多资源
cpu: "1"
memory: 2Gi
我们先不做上面配置的改动,看看直接创建hpa会产生什么情况:
为deployment资源web创建hpa,pod数量上限3个,最低1个,在pod平均CPU达到50%后开始扩容
kubectl autoscale deployment web --max=3 --min=1 --cpu-percent=50
编辑配置vi web.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: web
name: web
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: web
template:
metadata:
labels:
app: web
spec:
containers:
- image: nginx
name: nginx
resources:
limits: # 因为我这里是测试环境,所以这里CPU只分配50毫核(0.05核CPU)和20M的内存
cpu: "50m"
memory: 20Mi
requests: # 保证这个pod初始就能分配这么多资源
cpu: "50m"
memory: 20Mi
自动扩容
kubectl autoscale deployment web --max=3 --min=1 --cpu-percent=50
# 等待一会,可以看到相关的hpa信息(K8s上metrics服务收集所有pod资源的时间间隔大概在60s的时间)
# kubectl get hpa -w
# 我们启动一个临时pod,来模拟大量请求
# kubectl run -it --rm busybox --image=busybox -- sh
/ # while :;do wget -q -O- http://web;done
# kubectl get hpa -w
看到自动扩容了
The first section of the persistent storage of the 8th k8s architect course
Hello everyone, I am Bo Ge Ai O&M, how does K8s manage storage resources? Follow Boge to meet them!
Volume
emptyDir
Let's talk about emptyDir first. Its most practical use in production is to provide volume data sharing among multiple containers in a Pod.
Example usage of emptyDir
# 我们继续用上面的web服务的配置,在里面新增volume配置
# cat web.yaml
kubectl apply -f web.yaml
kubectl get svc
# 可以看到每次访问都是被写入当前最新时间的页面内容
[root@node-1 ~]# curl 10.68.229.231
# cat web.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: web
name: web
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: web
template:
metadata:
labels:
app: web
spec:
containers:
- image: nginx
name: nginx
resources:
limits:
cpu: "50m"
memory: 20Mi
requests:
cpu: "50m"
memory: 20Mi
volumeMounts: # 准备将pod的目录进行卷挂载
- name: html-files # 自定个名称,容器内可以类似这样挂载多个卷
mountPath: "/usr/share/nginx/html"
- name: busybox # 在pod内再跑一个容器,每秒把当时时间写到nginx默认页面上
image: busybox
args:
- /bin/sh
- -c
- >
while :; do
if [ -f /html/index.html ];then
echo "[$(date +%F\ %T)] hello" > /html/index.html
sleep 1
else
touch /html/index.html
fi
done
volumeMounts:
- name: html-files # 注意这里的名称和上面nginx容器保持一样,这样才能相互进行访问
mountPath: "/html" # 将数据挂载到当前这个容器的这个目录下
volumes:
- name: html-files # 最后定义这个卷的名称也保持和上面一样
emptyDir: # 这就是使用emptyDir卷类型了
medium: Memory # 这里将文件写入内存中保存,这样速度会很快,配置为medium: "" 就是代表默认的使用本地磁盘空间来进行存储
sizeLimit: 10Mi # 因为内存比较珍贵,注意限制使用大小
Persistence Storage Section 8 k8s Architect Course Section 2 PV and PVC
Hello everyone, I am Boge Aiyunwei\\With pvc, we only need to consider how much capacity we need for volume mounting on K8s, and we don’t need to care about the underlying details such as what storage system is used for the real space , pv These only storage administrators should care about it.
K8s supports multiple types of pv. Here we use NFS commonly used in production as a demonstration (use NAS in the cloud). If the storage requirements are not too high in production, it is recommended to use NFS. It is relatively easy to solve. If you have performance requirements, you can look at rook's ceph and Rancher's Longhorn.
Start deploying NFS-SERVER
# 我们这里在10.0.1.201上安装(在生产中,大家要提供作好NFS-SERVER环境的规划)
yum -y install nfs-utils
# 创建NFS挂载目录
mkdir /nfs_dir
chown nobody.nobody /nfs_dir
# 修改NFS-SERVER配置
echo '/nfs_dir *(rw,sync,no_root_squash)' > /etc/exports
# 重启服务
systemctl restart rpcbind.service
systemctl restart nfs-utils.service
systemctl restart nfs-server.service
# 增加NFS-SERVER开机自启动
systemctl enable nfs-server.service
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
# 验证NFS-SERVER是否能正常访问
showmount -e 10.0.1.201
Export list for 10.0.1.201:
/nfs_dir *
创建基于NFS的PV
首先在NFS-SERVER的挂载目录里面创建一个目录
mkdir /nfs_dir/pv1
接着准备好pv的yaml配置,保存为pv1.yaml
cat pv1.yaml
查看pv状态
kubectl get pv
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv1
labels:
type: test-claim # 这里建议打上一个独有的标签,方便在多个pv的时候方便提供pvc选择挂载
spec:
capacity:
storage: 1Gi # <---------- 1
accessModes:
- ReadWriteOnce # <---------- 2
persistentVolumeReclaimPolicy: Recycle # <---------- 3
storageClassName: nfs # <---------- 4
nfs:
path: /nfs_dir/pv1 # <---------- 5
server: 10.0.1.201
接着准备PVC的yaml,保存为pvc1.yaml
# cat pvc1.yaml
kubectl apply -f pvc1.yaml
查看pvc状态、已经bound
kubectl get pvc
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc1
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs
selector:
matchLabels:
type: test-claim
下面我们准备pod服务来挂载这个pvc
这里就以上面最开始演示用的nginx的deployment的yaml配置来作修改
cat nginx.yaml
kubectl apply -f nginx.yaml
kubectl get pod
kubectl get svc
curl nginxip
卷挂载后会把当前已经存在这个目录的文件给覆盖掉,这个和传统机器上的磁盘目录挂载道理是一样的
# 我们来自己创建一个index.html页面
# echo 'hello, world!' > /nfs_dir/pv1/index.html
# 再请求下看看,已经正常了
# curl 10.68.238.54
# 我们来手动删除这个nginx的pod,看下容器内的修改是否是持久的呢?
# kubectl delete pod nginx-569546db98-4nmmg
# 等待一会,等新的pod被创建好
# 再测试一下,可以看到,容器内的修改现在已经被持久化了
# 后面我们再想修改有两种方式,一个是exec进到pod内进行修改,还有一个是直接修改挂载在NFS目录下的文件
# echo 111 > /nfs_dir/pv1/index.html
apiVersion: v1
kind: Service
metadata:
labels:
app: nginx
name: nginx
spec:
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: nginx
name: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx
name: nginx
volumeMounts: # 我们这里将nginx容器默认的页面目录挂载
- name: html-files
mountPath: "/usr/share/nginx/html"
volumes:
- name: html-files
persistentVolumeClaim: # 卷类型使用pvc,同时下面名称处填先创建好的pvc1
claimName: pvc1
***注意及时备份数据,注意及时备份数据,注意及时备份数据***
PVC以及PV的回收
# 这里删除时会一直卡着,我们按ctrl+c看看怎么回事
# kubectl delete pvc pvc1
persistentvolumeclaim "pvc1" deleted
^C
# 看下pvc发现STATUS是Terminating删除中的状态,我分析是因为服务pod还在占用这个pvc使用中
# kubectl get pvc
# 先删除这个pod
# kubectl delete pod nginx-569546db98-99qpq
# 再看先删除的pvc已经没有了
# kubectl get pvc
Persistent storage StorageClass of the 8th k8s architect course
Hello everyone, I am Boge Ai Operation and Maintenance, the third section of k8s persistent storage, to bring you an explanation of StorageClass dynamic storage.
我这是直接拿生产中用的实例来作演示,利用**********nfs-client-provisioner*******来生成一个基于nfs的StorageClass,部署配置yaml配置如下,保持为nfs-sc.yaml:
开始创建这个StorageClass:
kubectl apply -f nfs-sc.yaml
# 注意这个是在放kube-system的namespace下面,这里面放置一些偏系统类的服务
kubectl -n kube-system get pod -w
kubectl get sc
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["list", "watch", "create", "update", "patch"]
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
namespace: kube-system
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
---
kind: Deployment
apiVersion: apps/v1
metadata:
name: nfs-provisioner-01
namespace: kube-system
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-provisioner-01
template:
metadata:
labels:
app: nfs-provisioner-01
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: jmgao1983/nfs-client-provisioner:latest
imagePullPolicy: IfNotPresent
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: nfs-provisioner-01 # 此处供应者名字供storageclass调用
- name: NFS_SERVER
value: 172.24.75.119 # 填入NFS的地址
- name: NFS_PATH
value: /nfs_dir # 填入NFS挂载的目录
volumes:
- name: nfs-client-root
nfs:
server: 172.24.75.119 # 填入NFS的地址
path: /nfs_dir # 填入NFS挂载的目录
---
###这里才是storageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-boge
provisioner: nfs-provisioner-01
# Supported policies: Delete、 Retain , default is Delete
reclaimPolicy: Retain
我们来基于StorageClass创建一个pvc,看看动态生成的pv是什么效果:
vim pvc-sc.yaml
kubectl apply -f pvc-sc.yaml
kubectl get pvc
kubectl get pv
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc-sc
spec:
storageClassName: nfs-boge
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Mi
我们修改下nginx的yaml配置,将pvc的名称换成上面的pvc-sc:
修改最后一行就行
kubectl apply -f nginx.yaml
# 这里注意下,因为是动态生成的pv,所以它的目录基于是一串随机字符串生成的,这时我们直接进到pod内来创建访问页面
kubectl get pods
# kubectl exec -it nginx-57cdc6d9b4-n497g -- bash
root@nginx-57cdc6d9b4-n497g:/# echo 'storageClass used' > /usr/share/nginx/html/index.html
root@nginx-57cdc6d9b4-n497g:/# exit
# curl 10.68.238.54
storageClass used
# 我们看下NFS挂载的目录、发现已经出现
# ll /nfs_dir/
Stateful Service StatefulSet of Level 9 k8s Architect Course
Hello everyone, I am Boge Ai Operation and Maintenance. How does K8s manage stateful services? Follow Boge to meet them!
1、创建pv
-------------------------------------------
root@node1:~# cat web-pv.yaml
# mkdir -p /nfs_dir/{web-pv0,web-pv1}
apiVersion: v1
kind: PersistentVolume
metadata:
name: web-pv0
labels:
type: web-pv0
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: my-storage-class
nfs:
path: /nfs_dir/web-pv0
server: 10.0.1.201
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: web-pv1
labels:
type: web-pv1
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: my-storage-class
nfs:
path: /nfs_dir/web-pv1
server: 10.0.1.201
2、创建pvc(这一步可以省去让其自动创建,这里手动创建是为了让大家能更清楚在sts里面pvc的创建过程)
-------------------------------------------
这一步非常非常的关键,因为如果创建的PVC的名称和StatefulSet中的名称没有对应上,
那么StatefulSet中的Pod就肯定创建不成功.
3、创建Service 和 StatefulSet
-------------------------------------------
在上一步中我们已经创建了名为www-web-0的PVC了,接下来创建一个service和statefulset,
service的名称可以随意取,但是statefulset的名称已经定死了,为web,
并且statefulset中的volumeClaimTemplates_name必须为www,volumeMounts_name也必须为www。
只有这样,statefulset中的pod才能通过命名来匹配到PVC,否则会创建失败。
root@node1:~# cat web.yaml
apiVersion: v1
kind: Service
metadata:
name: web-headless
labels:
app: nginx
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
---
apiVersion: v1
kind: Service
metadata:
name: web
labels:
app: nginx
spec:
ports:
- port: 80
name: web
selector:
app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # has to match .spec.template.metadata.labels
serviceName: "nginx"
replicas: 2 # by default is 1
template:
metadata:
labels:
app: nginx # has to match .spec.selector.matchLabels
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "my-storage-class"
resources:
requests:
storage: 1Gi
One-time and timed tasks of the 10th k8s architect course
Original 2021-03-21 20:14 Boge loves operation and maintenance
Job, CronJob
cronjob
The above job is a one-time task,
apiVersion: batch/v1beta1 # <--------- 当前 CronJob 的 apiVersion
kind: CronJob # <--------- 当前资源的类型
metadata:
name: hello
spec:
schedule: "* * * * *" # <--------- schedule 指定什么时候运行 Job,其格式与 Linux crontab 一致,这里 * * * * * 的含义是每一分钟启动一次
jobTemplate: # <--------- 定义 Job 的模板,格式与前面 Job 一致
spec:
template:
spec:
containers:
- name: hello
image: busybox
command: ["echo","boge like cronjob."]
restartPolicy: OnFailure
RBAC role access control of the 11th k8s architect course
Original 2021-03-22 21:30 Boge loves operation and maintenance
RBAC role-based access control
control access
The first type of actual combat here is temporarily explained with the early helm2
Titler is equivalent to the server of helm. It needs permission to create various resources in K8s. During the initial installation and use, if RBAC permission is not configured, we will see the following error:
root@node1:~# helm install stable/mysql
Error: no available release name found
管理员权限查看
cat /root/.kube/config
At this time, we can quickly solve this problem by creating a ClusterRole with the highest authority that comes with K8s associated with sa (it is not recommended to do this in production, as the authority is too high and there are security risks. This is the same as the root management account of linux, which is generally It is recommended to control account permissions through sudo)
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
In the second category, I will directly use the complete script I implemented in production to explain and practice. I believe it will bring you a brand new learning experience and you will be able to master them quickly.
***这是超大的福利***
0.用法、生成后权限查看
1.某个namespace所有权限的脚本(新建namespace)
2.创建已经存在的namespace的权限的脚本
3.创建只读权限
4.创建多集群融合配置(把脚本拿到每个集群跑生成配置)
0.用法 sh 1.sh
Use 1.sh NAMESPACE ENDPOINT
SH 1.SH 命名空间名称 访问地址
[root@node01 mnt]# bash 1.sh
All namespaces is here:
default
ingress-nginx
kube-node-lease
kube-public
kube-system
endpoint server if local network you can use https://172.24.75.119:6443
Use 1.sh NAMESPACE ENDPOINT
结果:
[root@node01 mnt]# sh 1.sh webbobo https://172.24.75.119:6443
All namespaces is here:
default
ingress-nginx
kube-node-lease
kube-public
kube-system
webbobo
endpoint server if local network you can use https://172.24.75.119:6443
webbobo
namespace: webbobo was exist.
serviceaccount/webbobo-user created
Warning: rbac.authorization.k8s.io/v1beta1 Role is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 Role
role.rbac.authorization.k8s.io/webbobo-user-full-access created
Warning: rbac.authorization.k8s.io/v1beta1 RoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 RoleBinding
rolebinding.rbac.authorization.k8s.io/webbobo-user-view created
resourcequota/webbobo-compute-resources created
Name: webbobo-compute-resources
Namespace: webbobo
Resource Used Hard
-------- ---- ----
limits.cpu 0 2
limits.memory 0 4Gi
persistentvolumeclaims 0 5
pods 0 10
requests.cpu 0 1
requests.memory 0 2Gi
services 0 10
查看访问配置权限
cat ./kube_config/webbobo.kube.conf
使用:
kubectl --kubeconfig=./kube_config/web.kube.conf get nodes
#Forbidden代表权限不足
kubectl --kubeconfig=./kube_config/web.kube.conf get pods
#可用
kubectl --kubeconfig=./kube_config/webbobo.kube.conf get pods
kubectl get sa -n webbobo
kubectl get role -n webbobo
kubectl get rolebings -n webbobo
kubectl get delete -n webbobo
1.创建对指定(新建)namespace有***所有权限***的kube-config(新建)
#!/bin/bash
#
# This Script based on https://jeremievallee.com/2018/05/28/kubernetes-rbac-namespace-user.html
# K8s'RBAC doc: https://kubernetes.io/docs/reference/access-authn-authz/rbac
# Gitlab'CI/CD doc: hhttps://docs.gitlab.com/ee/user/permissions.html#running-pipelines-on-protected-branches
#
# In honor of the remarkable Windson
BASEDIR="$(dirname "$0")"
folder="$BASEDIR/kube_config"
echo -e "All namespaces is here: \n$(kubectl get ns|awk 'NR!=1{print $1}')"
echo "endpoint server if local network you can use $(kubectl cluster-info |awk '/Kubernetes/{print $NF}')"
#打印api的地址
namespace=$1
endpoint=$(echo "$2" | sed -e 's,https\?://,,g')
#能够访问到的api地址
if [[ -z "$endpoint" || -z "$namespace" ]]; then
echo "Use "$(basename "$0")" NAMESPACE ENDPOINT";
exit 1;
fi
#缺少参数则退出
if ! kubectl get ns|awk 'NR!=1{print $1}'|grep -w "$namespace";then kubectl create ns "$namespace";else echo "namespace: $namespace was exist.";exit 1 ;fi
#命名空间是否存在
#1.创建账号
echo "---
apiVersion: v1
kind: ServiceAccount
metadata:
name: $namespace-user
namespace: $namespace
---
#2.创建角色Role对指定的namespace生效、CLU role对生效
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: $namespace-user-full-access
namespace: $namespace
rules:
#所有权限用*代替
- apiGroups: ['', 'extensions', 'apps', 'metrics.k8s.io']
resources: ['*']
verbs: ['*']
- apiGroups: ['batch']
resources:
- jobs
- cronjobs
verbs: ['*']
---
#3.角色绑定
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: $namespace-user-view
namespace: $namespace
subjects:
- kind: ServiceAccount
name: $namespace-user
namespace: $namespace
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: $namespace-user-full-access
---
# https://kubernetes.io/zh/docs/concepts/policy/resource-quotas/
apiVersion: v1
kind: ResourceQuota
metadata:
name: $namespace-compute-resources
namespace: $namespace
spec:
#做资源限制
hard:
pods: "10"
services: "10"
persistentvolumeclaims: "5"
requests.cpu: "1"
requests.memory: 2Gi
limits.cpu: "2"
limits.memory: 4Gi" | kubectl apply -f -
kubectl -n $namespace describe quota $namespace-compute-resources
mkdir -p $folder
#基于账号提取密钥名称、Token、证书
tokenName=$(kubectl get sa $namespace-user -n $namespace -o "jsonpath={.secrets[0].name}")
token=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data['ca\.crt']}")
echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
certificate-authority-data: $certificate
server: https://$endpoint
name: $namespace-cluster
users:
- name: $namespace-user
user:
as-user-extra: {}
client-key-data: $certificate
token: $token
contexts:
- context:
cluster: $namespace-cluster
namespace: $namespace
user: $namespace-user
name: $namespace
current-context: $namespace" > $folder/$namespace.kube.conf
2.创建对指定namespace有所有权限的kube-config(在已有的namespace中创建)
#!/bin/bash
BASEDIR="$(dirname "$0")"
folder="$BASEDIR/kube_config"
echo -e "All namespaces is here: \n$(kubectl get ns|awk 'NR!=1{print $1}')"
echo "endpoint server if local network you can use $(kubectl cluster-info |awk '/Kubernetes/{print $NF}')"
namespace=$1
endpoint=$(echo "$2" | sed -e 's,https\?://,,g')
if [[ -z "$endpoint" || -z "$namespace" ]]; then
echo "Use "$(basename "$0")" NAMESPACE ENDPOINT";
exit 1;
fi
echo "---
apiVersion: v1
kind: ServiceAccount
metadata:
name: $namespace-user
namespace: $namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: $namespace-user-full-access
namespace: $namespace
rules:
- apiGroups: ['', 'extensions', 'apps', 'metrics.k8s.io']
resources: ['*']
verbs: ['*']
- apiGroups: ['batch']
resources:
- jobs
- cronjobs
verbs: ['*']
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: $namespace-user-view
namespace: $namespace
subjects:
- kind: ServiceAccount
name: $namespace-user
namespace: $namespace
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: $namespace-user-full-access" | kubectl apply -f -
mkdir -p $folder
tokenName=$(kubectl get sa $namespace-user -n $namespace -o "jsonpath={.secrets[0].name}")
token=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data['ca\.crt']}")
echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
certificate-authority-data: $certificate
server: https://$endpoint
name: $namespace-cluster
users:
- name: $namespace-user
user:
as-user-extra: {}
client-key-data: $certificate
token: $token
contexts:
- context:
cluster: $namespace-cluster
namespace: $namespace
user: $namespace-user
name: $namespace
current-context: $namespace" > $folder/$namespace.kube.conf
3.创建只读权限的、主要就是改了verbs
#!/bin/bash
BASEDIR="$(dirname "$0")"
folder="$BASEDIR/kube_config"
echo -e "All namespaces is here: \n$(kubectl get ns|awk 'NR!=1{print $1}')"
echo "endpoint server if local network you can use $(kubectl cluster-info |awk '/Kubernetes/{print $NF}')"
namespace=$1
endpoint=$(echo "$2" | sed -e 's,https\?://,,g')
if [[ -z "$endpoint" || -z "$namespace" ]]; then
echo "Use "$(basename "$0")" NAMESPACE ENDPOINT";
exit 1;
fi
echo "---
apiVersion: v1
kind: ServiceAccount
metadata:
name: $namespace-user-readonly
namespace: $namespace
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: $namespace-user-readonly-access
namespace: $namespace
rules:
- apiGroups: ['', 'extensions', 'apps', 'metrics.k8s.io']
resources: ['pods', 'pods/log']
verbs: ['get', 'list', 'watch']
- apiGroups: ['batch']
resources: ['jobs', 'cronjobs']
verbs: ['get', 'list', 'watch']
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: $namespace-user-view-readonly
namespace: $namespace
subjects:
- kind: ServiceAccount
name: $namespace-user-readonly
namespace: $namespace
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: $namespace-user-readonly-access" | kubectl apply -f -
mkdir -p $folder
tokenName=$(kubectl get sa $namespace-user-readonly -n $namespace -o "jsonpath={.secrets[0].name}")
token=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl get secret $tokenName -n $namespace -o "jsonpath={.data['ca\.crt']}")
echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
certificate-authority-data: $certificate
server: https://$endpoint
name: $namespace-cluster-readonly
users:
- name: $namespace-user-readonly
user:
as-user-extra: {}
client-key-data: $certificate
token: $token
contexts:
- context:
cluster: $namespace-cluster-readonly
namespace: $namespace
user: $namespace-user-readonly
name: $namespace
current-context: $namespace" > $folder/$namespace-readonly.kube.conf
4.大招---多集群融合配置
#!/bin/bash
# describe: create k8s cluster all namespaces resources with readonly clusterrole, no exec 、delete ...
# look system default to study:
# kubectl describe clusterrole view
# restore all change:
#kubectl -n kube-system delete sa all-readonly-${clustername}
#kubectl delete clusterrolebinding all-readonly-${clustername}
#kubectl delete clusterrole all-readonly-${clustername}
clustername=$1
Help(){
echo "Use "$(basename "$0")" ClusterName(example: k8s1|k8s2|k8s3|delk8s1|delk8s2|delk8s3|3in1)";
exit 1;
}
if [[ -z "${clustername}" ]]; then
Help
fi
case ${clustername} in
#二进制安装时指定名称
#查看命令kubectl config get-contexts
k8s1)
endpoint="https://x.x.x.x:123456"
;;
k8s2)
endpoint="https://x.x.x.x:123456"
;;
k8s3)
endpoint="https://x.x.x.x:123456"
;;
delk8s1)
kubectl -n kube-system delete sa all-readonly-k8s1
kubectl delete clusterrolebinding all-readonly-k8s1
kubectl delete clusterrole all-readonly-k8s1
echo "${clustername} successful."
exit 0
;;
delk8s2)
kubectl -n kube-system delete sa all-readonly-k8s2
kubectl delete clusterrolebinding all-readonly-k8s2
kubectl delete clusterrole all-readonly-k8s2
echo "${clustername} successful."
exit 0
;;
delk8s3)
kubectl -n kube-system delete sa all-readonly-k8s3
kubectl delete clusterrolebinding all-readonly-k8s3
kubectl delete clusterrole all-readonly-k8s3
echo "${clustername} successful."
exit 0
;;
3in1)
KUBECONFIG=./all-readonly-k8s1.conf:all-readonly-k8s2.conf:all-readonly-k8s3.conf kubectl config view --flatten > ./all-readonly-3in1.conf
kubectl --kubeconfig=./all-readonly-3in1.conf config use-context "k8s3"
kubectl --kubeconfig=./all-readonly-3in1.conf config set-context "k8s3" --namespace="default"
kubectl --kubeconfig=./all-readonly-3in1.conf config get-contexts
echo -e "\n\n\n"
cat ./all-readonly-3in1.conf |base64 -w 0
exit 0
;;
*)
Help
esac
echo "---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: all-readonly-${clustername}
rules:
- apiGroups:
- ''
resources:
- configmaps
- endpoints
- persistentvolumes
- persistentvolumeclaims
- pods
- replicationcontrollers
- replicationcontrollers/scale
- serviceaccounts
- services
- nodes
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- bindings
- events
- limitranges
- namespaces/status
- pods/log
- pods/status
- replicationcontrollers/status
- resourcequotas
- resourcequotas/status
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- namespaces
verbs:
- get
- list
- watch
- apiGroups:
- apps
resources:
- controllerrevisions
- daemonsets
- deployments
- deployments/scale
- replicasets
- replicasets/scale
- statefulsets
- statefulsets/scale
verbs:
- get
- list
- watch
- apiGroups:
- autoscaling
resources:
- horizontalpodautoscalers
verbs:
- get
- list
- watch
- apiGroups:
- batch
resources:
- cronjobs
- jobs
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- daemonsets
- deployments
- deployments/scale
- ingresses
- networkpolicies
- replicasets
- replicasets/scale
- replicationcontrollers/scale
verbs:
- get
- list
- watch
- apiGroups:
- policy
resources:
- poddisruptionbudgets
verbs:
- get
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
- networkpolicies
verbs:
- get
- list
- watch
- apiGroups:
- metrics.k8s.io
resources:
- pods
verbs:
- get
- list
- watch" | kubectl apply -f -
kubectl -n kube-system create sa all-readonly-${clustername}
kubectl create clusterrolebinding all-readonly-${clustername} --clusterrole=all-readonly-${clustername} --serviceaccount=kube-system:all-readonly-${clustername}
tokenName=$(kubectl -n kube-system get sa all-readonly-${clustername} -o "jsonpath={.secrets[0].name}")
token=$(kubectl -n kube-system get secret $tokenName -o "jsonpath={.data.token}" | base64 --decode)
certificate=$(kubectl -n kube-system get secret $tokenName -o "jsonpath={.data['ca\.crt']}")
echo "apiVersion: v1
kind: Config
preferences: {}
clusters:
- cluster:
certificate-authority-data: $certificate
server: $endpoint
name: all-readonly-${clustername}-cluster
users:
- name: all-readonly-${clustername}
user:
as-user-extra: {}
client-key-data: $certificate
token: $token
contexts:
- context:
cluster: all-readonly-${clustername}-cluster
user: all-readonly-${clustername}
name: ${clustername}
current-context: ${clustername}" > ./all-readonly-${clustername}.conf
The business log collection of the 12th level k8s architect course is introduced in the previous section and practical in the next section
business log
Original 2021-03-24 21:30 Boge loves operation and maintenance
elasticsearch log-pilot kibana 、 prometheus-operator grafana webhook
log collection
Most of the courses on the market now use EFK as the log solution for the K8s project, which includes three components: Elasticsearch, Fluentd (filebeat), Kibana; Elasticsearch is a log storage and log search engine, and Fluentd is responsible for integrating the k8s cluster The logs are sent to Elasticsearch, and Kibana is a visual interface to view and retrieve data stored in Elasticsearch.
However, according to the actual usage in production, it has the following disadvantages:
1. The log collection system EFK starts a fluentd pod in the form of daemonset on each kubernetes NODE node to collect logs on the NODE node, such as container logs (/var/log/containers/*.log), but inside It cannot be subdivided, and both wanted and unwanted things are collected, which will lead to greater disk IO pressure and troublesome log filtering.
2. It is impossible to collect the business logs in the corresponding POD. The first point above can only collect the stdout (printed logs, not business logs) logs of the pod, but if there are business logs that need to be collected in the pod, such as /tmp/ in the pod datalog/*.log, then EFK is powerless. It can only start multiple containers (filebeat) in the pod to collect logs in the container, but this will bring about the loss of pod multi-container performance. This will be done next Talk about it in detail.
3. The acquisition rate performance of fluentd is low, only less than 1/10 of the performance of filebeat.
Based on this, I discovered through research that Alibaba's open source smart container collection tool Log-Pilot, github address:
https://github.com/AliyunContainerService/log-pilot
The following is a detailed comparison of the log collection methods of the sidecar mode and log-pilot:
The sidecar mode requires us to attach a logging container to each Pod to collect the logs of the internal container of the Pod. Generally, a shared volume is used, but for this mode, an obvious problem is that it takes up There are many resources, especially when the cluster size is relatively large,
Another mode is the Node mode. In this mode, we only need to deploy one logging container on each Node node to collect logs from all containers of this Node. Compared with the previous mode, the most obvious advantage is that it occupies less resources. It also shows more obvious advantages when the cluster size is relatively large. At the same time, this is also a mode recommended by the community.
After various tests, log-pilot is very intrusive to existing business pods. It only needs to pass in a few lines of env environment variables in the original pod to collect logs related to this pod. The backend reception has been tested The tools used include logstash, elasticsearch, kafka, redis, and file, all of which are OK. Next, the entire log collection environment will be deployed.
1.我们这里用一个tomcat服务来模拟业务服务,用log-pilot分别收集它的stdout以及容器内的业务数据日志文件到指定后端存储(这里分别以elasticsearch、kafka的这两种企业常用的接收工具来做示例)
1.只传个环境变量就能收集日志了
准备好相应的yaml配置
vim tomcat-test.yaml
kubectl apply -f tomcat-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: tomcat
name: tomcat
spec:
replicas: 1
selector:
matchLabels:
app: tomcat
template:
metadata:
labels:
app: tomcat
spec:
tolerations:
- key: "node-role.kubernetes.io/master"
effect: "NoSchedule"
containers:
- name: tomcat
image: "tomcat:7.0"
env: # 注意点一,添加相应的环境变量(下面收集了两块日志1、stdout 2、/usr/local/tomcat/logs/catalina.*.log)
- name: aliyun_logs_tomcat-syslog # 如日志发送到es,那index名称为 tomcat-syslog
value: "stdout"
- name: aliyun_logs_tomcat-access # 如日志发送到es,那index名称为 tomcat-access
value: "/usr/local/tomcat/logs/catalina.*.log"
volumeMounts:
# 注意点二,对pod内要收集的业务日志目录需要进行共享,可以收集多个目录下的日志文件
#挂载
- name: tomcat-log
mountPath: /usr/local/tomcat/logs
volumes:
- name: tomcat-log
emptyDir: {
}
2.创建es6.yaml es和kibana 是一起的
2.3.4是一起的
接收工具ES、es和kibana的版本号要一样
vim elasticsearch.6.8.13-statefulset.yaml
kubectl apply -f elasticsearch.6.8.13-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
addonmanager.kubernetes.io/mode: Reconcile
k8s-app: elasticsearch-logging
version: v6.8.13
name: elasticsearch-logging
# namespace: logging
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: elasticsearch-logging
version: v6.8.13
serviceName: elasticsearch-logging
template:
metadata:
labels:
k8s-app: elasticsearch-logging
version: v6.8.13
spec:
#nodeSelector: #测试不用这个节点选择选项
# esnode: "true" ## 注意给想要运行到的node打上相应labels
containers:
- env:
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: cluster.name
value: elasticsearch-logging-0
- name: ES_JAVA_OPTS
value: "-Xms256m -Xmx256m"
image: elastic/elasticsearch:6.8.13
name: elasticsearch-logging
ports:
- containerPort: 9200
name: db
protocol: TCP
- containerPort: 9300
name: transport
protocol: TCP
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-logging
dnsConfig:
options:
- name: single-request-reopen
initContainers:
- command:
- /bin/sysctl
- -w
- vm.max_map_count=262144
image: busybox
imagePullPolicy: IfNotPresent
name: elasticsearch-logging-init
resources: {
}
securityContext:
privileged: true
- name: fix-permissions
image: busybox
command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
securityContext:
privileged: true
volumeMounts:
- name: elasticsearch-logging
mountPath: /usr/share/elasticsearch/data
volumes:
- name: elasticsearch-logging
hostPath:
path: /esdata
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: elasticsearch-logging
name: elasticsearch
# namespace: logging
spec:
ports:
- port: 9200
protocol: TCP
targetPort: db
selector:
k8s-app: elasticsearch-logging
type: ClusterIP
3.创建kibana、2.3.4是一起的
vim kibana.6.8.13.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kibana
# namespace: logging
labels:
app: kibana
spec:
selector:
matchLabels:
app: kibana
template:
metadata:
labels:
app: kibana
spec:
containers:
- name: kibana
image: elastic/kibana:6.8.13
resources:
limits:
cpu: 1000m
requests:
cpu: 100m
env:
- name: ELASTICSEARCH_URL
value: http://elasticsearch:9200
#elasticsearch的地址就是elasticsearch的svc地址
ports:
- containerPort: 5601
---
apiVersion: v1
kind: Service
metadata:
name: kibana
# namespace: logging
labels:
app: kibana
spec:
ports:
- port: 5601
protocol: TCP
targetPort: 5601
type: ClusterIP
selector:
app: kibana
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: kibana
# namespace: logging
spec:
rules:
- host: kibana.boge.com
###域名自己换
http:
paths:
- path: /
backend:
serviceName: kibana
servicePort: 5601
![mag20220814191841638](C:\Users\123\AppData\Roaming\Typora\typora-user-images\image-20220814191841638.png
4.部署log-pilot.yml\\\\Daemonset---每个node都会跑一个日志代理、2.3.4是一起的
收集日志标准tomcat-test里传入的env环境变量
配置本地域名解析kubectl get ingress
echo "172.24.75.119 kibana.boge.com" >> /etc/hosts
查看kibana的svc是内网ClusterIP,要把他变成NodePort才能访问修改方法见上面网址
kubectl edit svc kibana
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP 10.68.10.41 <none> 9200/TCP 45m
kibana NodePort 10.68.127.151 <none> 5601:30752/TCP 43m
vim log-pilot.yml # 后端输出的elasticsearch
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: log-pilot
labels:
app: log-pilot
# 设置期望部署的namespace
# namespace: ns-elastic
spec:
selector:
matchLabels:
app: log-pilot
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app: log-pilot
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
# 是否允许部署到Master节点上
#tolerations:
#- key: node-role.kubernetes.io/master
# effect: NoSchedule
containers:
- name: log-pilot
# 版本请参考https://github.com/AliyunContainerService/log-pilot/releases
image: registry.cn-hangzhou.aliyuncs.com/acs/log-pilot:0.9.7-filebeat
resources:
limits:
memory: 500Mi
requests:
cpu: 200m
memory: 200Mi
env:
- name: "NODE_NAME"
valueFrom:
fieldRef:
fieldPath: spec.nodeName
##--------------------------------
# - name: "LOGGING_OUTPUT"
# value: "logstash"
# - name: "LOGSTASH_HOST"
# value: "logstash-g1"
# - name: "LOGSTASH_PORT"
# value: "5044"
##--------------------------------
- name: "LOGGING_OUTPUT"
value: "elasticsearch"
## 请确保集群到ES网络可达
- name: "ELASTICSEARCH_HOSTS"
value: "elasticsearch:9200"
## 配置ES访问权限
#- name: "ELASTICSEARCH_USER"
# value: "{es_username}"
#- name: "ELASTICSEARCH_PASSWORD"
# value: "{es_password}"
##--------------------------------
## https://github.com/AliyunContainerService/log-pilot/blob/master/docs/filebeat/docs.md
## to file need configure 1
# - name: LOGGING_OUTPUT
# value: file
# - name: FILE_PATH
# value: /tmp
# - name: FILE_NAME
# value: filebeat.log
volumeMounts:
- name: sock
mountPath: /var/run/docker.sock
- name: root
mountPath: /host
readOnly: true
- name: varlib
mountPath: /var/lib/filebeat
- name: varlog
mountPath: /var/log/filebeat
- name: localtime
mountPath: /etc/localtime
readOnly: true
## to file need configure 2
# - mountPath: /tmp
# name: mylog
livenessProbe:
failureThreshold: 3
exec:
command:
- /pilot/healthz
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 2
securityContext:
capabilities:
add:
- SYS_ADMIN
terminationGracePeriodSeconds: 30
volumes:
- name: sock
hostPath:
path: /var/run/docker.sock
- name: root
hostPath:
path: /
- name: varlib
hostPath:
path: /var/lib/filebeat
type: DirectoryOrCreate
- name: varlog
hostPath:
path: /var/log/filebeat
type: DirectoryOrCreate
- name: localtime
hostPath:
path: /etc/localtime
## to file need configure 3
# - hostPath:
# path: /tmp/mylog
# type: ""
# name: mylog
5.6 are together
5.独立的、、、部署log-pilot2-kafka.yaml----后端输出带哦kafka---node01
资源不足可以把es和kibana删掉
记得该ip
vi log-pilot2-kafka.yaml #后端输出到kafka
:%s+configMapKeyRee+configMapKeyRef+g批量替换
---
apiVersion: v1
kind: ConfigMap
metadata:
name: log-pilot2-configuration
#namespace: ns-elastic
data:
logging_output: "kafka"
kafka_brokers: "172.24.75.115:9092"
kafka_version: "0.10.0"
# configure all valid topics in kafka
# when disable auto-create topic
kafka_topics: "tomcat-syslog,tomcat-access"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: log-pilot2
#namespace: ns-elastic
labels:
k8s-app: log-pilot2
spec:
selector:
matchLabels:
k8s-app: log-pilot2
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
k8s-app: log-pilot2
spec:
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: log-pilot2
#
# wget https://github.com/AliyunContainerService/log-pilot/archive/v0.9.7.zip
# unzip log-pilot-0.9.7.zip
# vim ./log-pilot-0.9.7/assets/filebeat/config.filebeat
# ...
# output.kafka:
# hosts: [$KAFKA_BROKERS]
# topic: '%{[topic]}'
# codec.format:
# string: '%{[message]}'
# ...
image: registry.cn-hangzhou.aliyuncs.com/acs/log-pilot:0.9.7-filebeat
env:
- name: "LOGGING_OUTPUT"
valueFrom:
configMapKeyRef:
name: log-pilot2-configuration
key: logging_output
- name: "KAFKA_BROKERS"
valueFrom:
configMapKeyRef:
name: log-pilot2-configuration
key: kafka_brokers
- name: "KAFKA_VERSION"
valueFrom:
configMapKeyRef:
name: log-pilot2-configuration
key: kafka_version
- name: "NODE_NAME"
valueFrom:
fieldRef:
fieldPath: spec.nodeName
volumeMounts:
- name: sock
mountPath: /var/run/docker.sock
- name: logs
mountPath: /var/log/filebeat
- name: state
mountPath: /var/lib/filebeat
- name: root
mountPath: /host
readOnly: true
- name: localtime
mountPath: /etc/localtime
# configure all valid topics in kafka
# when disable auto-create topic
- name: config-volume
mountPath: /etc/filebeat/config
securityContext:
capabilities:
add:
- SYS_ADMIN
terminationGracePeriodSeconds: 30
volumes:
- name: sock
hostPath:
path: /var/run/docker.sock
type: Socket
- name: logs
hostPath:
path: /var/log/filebeat
type: DirectoryOrCreate
- name: state
hostPath:
path: /var/lib/filebeat
type: DirectoryOrCreate
- name: root
hostPath:
path: /
type: Directory
- name: localtime
hostPath:
path: /etc/localtime
type: File
# kubelet sync period
- name: config-volume
configMap:
name: log-pilot2-configuration
items:
- key: kafka_topics
path: kafka_topics
6.部署kafka集群---node04
把docker-copose装好
scp /etc/kubeasz/bin/docker-compose 172.24.75.115:/bin
cd /tmp
mkdir /kafka
# 部署前准备
# 0. 先把代码pull到本地
# https://github.com/wurstmeister/kafka-docker
# 修改docker-compose.yml为:
#——------------------------------
version: '2'
services:
zookeeper:
image: wurstmeister/zookeeper
ports:
- "2181:2181"
kafka:
#build: .
image: wurstmeister/kafka
ports:
- "9092:9092"
environment:
KAFKA_ADVERTISED_HOST_NAME: 172.24.75.114 # docker运行的机器IP
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /nfs_storageclass/kafka:/kafka
#——------------------------------
6.补充创建kafka后的操作--node04
-----运行docker-compose
# 1. docker-compose setup:
# docker-compose up -d
Recreating kafka-docker-compose_kafka_1 ... done
Starting kafka-docker-compose_zookeeper_1 ... done
# 2. result look:---看到zook和kafka都起来
# docker-compose ps
Name Command State Ports
--------------------------------------------------------------------------------------------------------------------
kafka-docker-compose_kafka_1 start-kafka.sh Up 0.0.0.0:9092->9092/tcp
kafka-docker-compose_zookeeper_1 /bin/sh -c /usr/sbin/sshd ... Up 0.0.0.0:2181->2181/tcp, 22/tcp,
2888/tcp, 3888/tcp
# 3. run test-docker进入kafka
bash-4.4# docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -e HOST_IP=172.24.75.115 -e ZK=172.24.75.115:2181 -i -t wurstmeister/kafka /bin/bash
# 4. list topic---列出kafka---topic
bash-4.4# kafka-topics.sh --zookeeper 172.24.75.115:2181 --list
tomcat-access
tomcat-syslog
# 5. consumer topic data:
bash-4.4# kafka-console-consumer.sh --bootstrap-server 172.24.75.115:9092 --topic tomcat-access --from-beginning
Private Mirror Warehouse-Harbor of the 13th k8s architect course
Original 2021-03-26 17:36 Boge loves operation and maintenance
Hello everyone, I am Boge, who loves operation and maintenance. The suggestion is not to deploy harbor on k8s. Here, Boge directly installs harbor in the first offline way.
# 离线形式安装harbor私有镜像仓库---node04
1.创建目录及下载harbor离线包----node04
mkdir /data && cd /data
wget https://github.com/goharbor/harbor/releases/download/v2.2.0/harbor-offline-installer-v2.2.0.tgz---------离线包自己下载了保存好
tar xf harbor-offline-installer-v2.2.0.tgz && rm harbor-offline-installer-v2.2.0.tgz
## 修改harbor配置
cd /data/harbor
cp harbor.yml.tmpl harbor.yml
5 hostname: harbor.boge.com
17 certificate: /data/harbor/ssl/tls.cert
18 private_key: /data/harbor/ssl/tls.key
34 harbor_admin_password: boge666
2.创建harbor访问域名证书
mkdir /data/harbor/ssl && cd /data/harbor/ssl
openssl genrsa -out tls.key 2048
openssl req -new -x509 -key tls.key -out tls.cert -days 360 -subj /CN=*.boge.com
3.准备好单机编排工具`docker-compose`
> 从二进制安装k8s项目的bin目录拷贝过来-----从node01拷---->node04
scp /etc/kubeasz/bin/docker-compose 172.24.75.115:/usr/bin/
> 也可以在docker官方进行下载
https://docs.docker.com/compose/install/
4. 开始安装
./install.sh
5. 推送镜像到harbor
每个节点都要加hosts
##node04的ip(跑harbor的ip)
echo '172.24.75.141 harbor.boge.com' >> /etc/hosts
先打标签、在登陆push
docker login harbor.boge.com
docker tag nginx:latest harbor.boge.com/library/nginx:latest
docker push harbor.boge.com/library/nginx:1.18.0-alpine
###浏览器进入harbor控制台:---node04的ip+/harbor
47.103.13.111/harbor
部署harbor公网地址/harbor
6.在其他节点上面拉取harbor镜像
> 在集群每个 node 节点进行如下配置
> ssh to 10.0.1.201(centos7)
7.所有节点都要做
每个节点都要加hosts
##node04的ip(跑harbor的ip)
echo '172.24.75.141 harbor.boge.com' >> /etc/hosts
mkdir -p /etc/docker/certs.d/harbor.boge.com
# node04的ip--跑harbor的ip
scp 172.24.75.141:/data/harbor/ssl/tls.cert /etc/docker/certs.d/harbor.boge.com/ca.crt
docker pull harbor.boge.com/library/nginx:latest
8 看看harbor正常不---不正常就重启harbor---node04
docker-compose ps
docker-compose down -v
docker-compose up -d
docker ps|grep harbor
## 附(引用自 https://github.com/easzlab/kubeasz):
containerd配置信任harbor证书
在集群每个 node 节点进行如下配置(假设ca.pem为自建harbor的CA证书)
ubuntu 1604:
cp ca.pem /usr/share/ca-certificates/harbor-ca.crt
echo harbor-ca.crt >> /etc/ca-certificates.conf
update-ca-certificates
CentOS 7:
cp ca.pem /etc/pki/ca-trust/source/anchors/harbor-ca.crt
update-ca-trust
上述配置完成后,重启 containerd 即可 systemctl restart containerd
8.harbor的备份:
/data挂载独立SSD
The 14th level k8s architect course business Prometheus monitoring practice one
Original 2021-03-28 20:30 Boge loves operation and maintenance
service monitoring
In traditional services, we usually go to zabbix, open-falcon, and netdata to monitor services, but for the current mainstream K8s platform, since the service pod will be scheduled to run on any machine, and the pod will be It is automatically restarted, and we also need a better automatic service discovery function to realize automatic access to service alarms and more efficient operation and maintenance alarms. Here we need to use K8s monitoring to implement Prometheus, which is based on Google's internal monitoring Open source implementation of the system.
How Prometheus Operator works
In Kubernetes, we use Deployment, DamenSet, and StatefulSet to manage application workload, use Service and Ingress to manage application access methods, and use ConfigMap and Secret to manage application configuration. Our creation, update, and deletion of these resources in the cluster will be converted into events (Event), and the Controller Manager of Kubernetes is responsible for listening to these events and triggering corresponding tasks to meet user expectations. In this way, we become declarative. Users only need to care about the final state of the application, and Kubernetes will help us complete the rest. In this way, the complexity of configuration management of the application can be greatly simplified.
Practical operation part one
Deploy Prometheus Operator in K8s cluster
Here we use prometheus-operator to install the entire set of prometheus services. It is recommended to use the master branch directly, which is also the official recommendation
https://github.com/prometheus-operator/kube-prometheus
start installation
Installation package and offline image package download
https://cloud.189.cn/t/bM7f2aANnMVb (access code: 0nsj)
1. 解压下载的代码包---node1
cd /mnt/
unzip kube-prometheus-master.zip
rm -f kube-prometheus-master.zip && cd kube-prometheus-master
2. 每个节点导入镜像包离线镜像包都导入
docker load -i xxx.tar
这里建议先看下有哪些镜像,便于在下载镜像快的节点上先收集好所有需要的离线docker镜像
# find ./ -type f |xargs grep 'image: '|sort|uniq|awk '{print $3}'|grep ^[a-zA-Z]|grep -Evw 'error|kubeRbacProxy'|sort -rn|uniq
quay.io/prometheus/prometheus:v2.22.1
quay.io/prometheus-operator/prometheus-operator:v0.43.2
quay.io/prometheus/node-exporter:v1.0.1
quay.io/prometheus/alertmanager:v0.21.0
quay.io/fabxc/prometheus_demo_service
quay.io/coreos/kube-state-metrics:v1.9.7
quay.io/brancz/kube-rbac-proxy:v0.8.0
grafana/grafana:7.3.4
gcr.io/google_containers/metrics-server-amd64:v0.2.01
directxman12/k8s-prometheus-adapter:v0.8.2
在测试的几个node上把这些离线镜像包都导入 docker load -i xxx.tar
3. 开始创建所有服务--node01
cd /mnt/kube-prometheus-master
kubectl create -f manifests/setup
kubectl create -f manifests/
过一会查看创建结果:
kubectl -n monitoring get all
确认每一个都正常
# 附:清空上面部署的prometheus所有服务:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
2.访问Prometheus的界面
生产小技巧,强制删除删不掉的pods
kubectl delete pod node-exporter-nglzp --grace-period=0 --force -n monitoring
2.1修改下prometheus UI的service模式,便于我们外网访问
kubectl -n monitoring patch svc prometheus-k8s -p '{"spec":{"type":"NodePort"}}'
2.2查看svc和端口、通过浏览器访问外网端口
kubectl -n monitoring get svc prometheus-k8s
2.3点击上方菜单栏Status — Targets ,我们发现kube-controller-manager和kube-scheduler未发现
opreator部署都会有这个问题
注:只要不全不是【::】就得改、如果发现下面不是监控的127.0.0.1,并且通过下面地址可以获取metric指标输出,那么这个改IP这一步可以不用操作
curl 10.0.1.201:10251/metrics curl 10.0.1.201:10252/metrics
[External link image transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the image and upload it directly (img-NYtX6xSb-1661331702304) (C:\Users\123\AppData\Roaming\Typora\typora-user-images\ image-20220814235414912.png)]
3.解决kube-controller-manager和kube-scheduler未发现的问题
查看两服务监听的IP
ss -tlnp|egrep 'controller|schedule'
4.有几个master就改几个、如果大家前面是按我设计的4台NODE节点,其中2台作master的话,那就在这2台master上把systemcd配置改一下
所有master分别改ip
sed -ri 's+172.24.75.117+0.0.0.0+g' /etc/systemd/system/kube-controller-manager.service
sed -ri 's+172.24.75.117+0.0.0.0+g' /etc/systemd/system/kube-scheduler.service
systemctl daemon-reload
systemctl restart kube-controller-manager.service
systemctl restart kube-scheduler.service
5.xiu该完毕再次尝试访问\\OK 通了、、但是还没完全通、看第6步
curl 172.24.75.119:10251/metrics
curl 172.24.75.119:10252/metrics
[External link picture transfer failed, the source site may have an anti-theft link mechanism, it is recommended to save the picture and upload it directly (img-C96YoGZN-1661331702305) (C:\Users\123\AppData\Roaming\Typora\typora-user-images\ image-20220814235830159.png)]
6.6.然后因为K8s的这两上核心组件我们是以二进制形式部署的,为了能让K8s上的prometheus能发现,我们还需要来创建相应的service和endpoints来将其关联起来
注意:我们需要将endpoints里面的NODE IP换成我们实际情况的
将上面的yaml配置保存为repair-prometheus.yaml,然后创建它---node01
vi repair-prometheus.yaml
kubectl apply -f repair-prometheus.yaml
创建完确认下
kubectl -n kube-system get svc |egrep 'controller|scheduler'
apiVersion: v1
kind: Service
metadata:
namespace: kube-system
name: kube-controller-manager
labels:
k8s-app: kube-controller-manager
spec:
type: ClusterIP
clusterIP: None
ports:
- name: http-metrics
port: 10252
targetPort: 10252
protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
labels:
k8s-app: kube-controller-manager
name: kube-controller-manager
namespace: kube-system
subsets:
- addresses:
- ip: 172.24.75.119
- ip: 172.24.75.118
- ip: 172.24.75.117
ports:
- name: http-metrics
port: 10252
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
namespace: kube-system
name: kube-scheduler
labels:
k8s-app: kube-scheduler
spec:
type: ClusterIP
clusterIP: None
ports:
- name: http-metrics
port: 10251
targetPort: 10251
protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
labels:
k8s-app: kube-scheduler
name: kube-scheduler
namespace: kube-system
subsets:
- addresses:
- ip: 172.24.75.119
- ip: 172.24.75.118
- ip: 172.24.75.117
ports:
- name: http-metrics
port: 10251
protocol: TCP
7.记得还要修改一个地方
kubectl -n monitoring edit servicemonitors.monitoring.coreos.com kube-scheduler
# 将下面两个地方的https换成http
port: https-metrics
scheme: https
kubectl -n monitoring edit servicemonitors.monitoring.coreos.com kube-controller-manager
# 将下面两个地方的https换成http
port: https-metrics
scheme: https
8.然后再返回prometheus UI处,耐心等待几分钟,就能看到已经被发现了、、、完美解决
kubectl get svc -n monitoring
http://101.133.227.28:30838/targets
Successful screenshot:
[External link image transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the image and upload it directly (img-HqQZnCji-1661331702306) (C:\Users\123\AppData\Roaming\Typora\typora-user-images\ image-20220815002453048.png)]
The 14th level k8s architect course business Prometheus monitoring practice II
Original 2021-03-29 23:59 Boge loves operation and maintenance
Use prometheus to monitor ingress-nginx
Hello everyone, I am Bo Ge Ai Operation and Maintenance. We have deployed ingress-nginx before, which is the key traffic entry component of all services on the entire K8s, so it is very important to collect its metrics indicators to prometheus for related monitoring, because the previous ingress-nginx service is in the form of daemonset Deployed, and mapped its own port to the host, then I can directly use the pod to run the IP on NODE to see the metrics
curl 10.0.1.201:10254/metrics
1.创建 servicemonitor配置让prometheus能发现ingress-nginx的metrics
# vi servicemonitor.yaml
kubectl apply -f servicemonitor.yaml
kubectl -n ingress-nginx get servicemonitors.monitoring.coreos.com
如果指标没收集上来、看下报错、指标一直没收集上来,看看proemtheus服务的日志,发现报错如下:
kubectl -n monitoring logs prometheus-k8s-0 -c prometheus |grep error
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app: ingress-nginx
name: nginx-ingress-scraping
namespace: ingress-nginx
spec:
endpoints:
- interval: 30s
path: /metrics
port: metrics
jobLabel: app
namespaceSelector:
matchNames:
- ingress-nginx
selector:
matchLabels:
app: ingress-nginx
2.指标一直没收集上来,看看proemtheus服务的日志,发现报错如下:一堆error
kubectl -n monitoring logs prometheus-k8s-0 -c prometheus |grep error
3.需要修改prometheus的clusterrole#
###从rules开始复制!!
kubectl edit clusterrole prometheus-k8s
4.再到prometheus UI上看下,发现已经有了
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus-k8s
####从rules开始复制!!!
rules:
- apiGroups:
- ""
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
Use Prometheus to monitor the etcd cluster of binary deployment
As the key service ETCD for all resource storage of K8s, we also need to monitor it. Just take this opportunity to fully demonstrate the steps of using Prometheus to monitor non-K8s cluster services
When deploying the K8s cluster earlier, we deployed the ETCD cluster in a binary manner, and used self-signed certificates to configure and access ETCD. As mentioned earlier, the key services now basically have metrics interfaces to support prometheus monitoring. Using the following command, we can see which monitoring indicators are exposed by ETCD
1.查看ETCD是默认暴露---node1
curl --cacert /etc/kubernetes/ssl/ca.pem --cert /etc/kubeasz/clusters/testbobo/ssl/etcd.pem --key /etc/kubeasz/clusters/testbobo/ssl/etcd-key.pem https://172.24.75.117:2379/metrics
1
##改版之后的位置在这:
ll /etc/kubeasz/clusters/testbobo/ssl/etcd.pem
2.进行配置使ETCD能被prometheus发现并监控
2.1首先把ETCD的证书创建为secret----记得修改为自己的集群名字
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubeasz/clusters/test/ssl/etcd.pem --from-file=//etc/kubeasz/clusters/test/ssl/etcd-key.pem --from-file=/etc/kubernetes/ssl/ca.pem
2.2 接着在prometheus里面引用这个secrets
kubectl -n monitoring edit prometheus k8s
spec:
##加中间两个在最后,version:v2.22.1后面
...
secrets:
- etcd-certs
2.3保存退出后,prometheus会自动重启服务pod以加载这个secret配置,过一会,我们进pod来查看下是不是已经加载到ETCD的证书了
kubectl -n monitoring exec -it prometheus-k8s-0 -c prometheus -- sh
###修改配置自动重启,所以多进去几次,这里没问题
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem etcd-key.pem etcd.pem
3.接下来准备创建service、endpoints以及ServiceMonitor的yaml配置
注意替换下面的NODE节点IP为实际ETCD所在NODE内网IP
vi prometheus-etcd.yaml
kubectl apply -f prometheus-etcd.yaml
4.过一会,就可以在prometheus UI上面看到ETCD集群被监控了
apiVersion: v1
kind: Service
metadata:
name: etcd-k8s
namespace: monitoring
labels:
k8s-app: etcd
spec:
type: ClusterIP
clusterIP: None
ports:
- name: api
port: 2379
protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
name: etcd-k8s
namespace: monitoring
labels:
k8s-app: etcd
subsets:
- addresses:
- ip: 172.24.75.119
- ip: 172.24.75.118
- ip: 172.24.75.117
ports:
- name: api
port: 2379
protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: etcd-k8s
namespace: monitoring
labels:
k8s-app: etcd-k8s
spec:
jobLabel: k8s-app
endpoints:
- port: api
interval: 30s
scheme: https
tlsConfig:
caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
#use insecureSkipVerify only if you cannot use a Subject Alternative Name
insecureSkipVerify: true
selector:
matchLabels:
k8s-app: etcd
namespaceSelector:
matchNames:
- monitoring
[External link image transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the image and upload it directly (img-bef5UZXC-1661331702307) (C:\Users\123\AppData\Roaming\Typora\typora-user-images\ image-20220815005649254.png)]
5.接下来我们用grafana来展示被监控的ETCD指标接下来我们用grafana来展示被监控的ETCD指标
5.1. 在grafana官网模板中心搜索etcd,下载这个json格式的模板文件
https://grafana.com/dashboards/3070
5.2.然后打开自己先部署的grafana首页,
kubectl get svc -n monitoring |grep gra
##初始密码admin
点击左边菜单栏四个小正方形方块HOME --- Manage
再点击右边 Import dashboard ---
点击Upload .json File 按钮,上传上面下载好的json文件 etcd_rev3.json,
然后在prometheus选择数据来源
点击Import,即可显示etcd集群的图形监控信息
Successful screenshot:
The 14th level k8s architect course `course business Prometheus monitoring actual combat three
Original 2021-03-31 21:11 Boge loves operation and maintenance
prometheus monitoring data and grafana configuration persistent storage configuration
Hello everyone, I am Bo Ge Ai Operation and Maintenance. This practical lesson will explain how to configure the data persistence of prometheus and grafana.
prometheus data persistence configuration
1.注意这下面的statefulset服务就是我们需要做数据持久化的地方
kubectl -n monitoring get statefulset,pod|grep prometheus-k8s
2.看下我们之前准备的StorageClass动态存储
kubectl get sc
root@node01 mnt]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-boge nfs-provisioner-01 Retain Immediate false 11h
3.准备prometheus持久化的pvc配置---加到version最后
kubectl -n monitoring edit prometheus k8s
spec:
......
#从storage开始加
storage:
volumeClaimTemplate:
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "nfs-boge"
#下面三行去掉不然新版本报错
#selector:
# matchLabels:
# app: my-example-prometheus
resources:
requests:
storage: 1Gi
4.# 上面修改保存退出后,过一会我们查看下pvc创建情况,以及pod内的数据挂载情况
kubectl -n monitoring get pvc
[root@node01 mnt]# kubectl -n monitoring get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-71e2655a-a1dc-487d-8cfe-f29d02d3035e 1Gi RWO nfs-boge 101s
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-4326dac9-385b-4aa8-91b0-60a77b79e625 1Gi RWO nfs-boge 101s
5.# kubectl -n monitoring exec -it prometheus-k8s-0 -c prometheus -- sh
/prometheus $ df -Th
......
10.0.1.201:/nfs_dir/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-055e6b11-31b7-4503-ba2b-4f292ba7bd06/prometheus-db
nfs4 97.7G 9.4G 88.2G 10% /prometheus
[External link picture transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the picture and upload it directly (img-PXa6Shjz-1661331702309) (C:\Users\123\AppData\Roaming\Typora\typora-user-images\ image-20220815011507763.png)]
Grafana configures persistent storage configuration
1.保存pvc为grafana-pvc.yaml
vi grafana-pvc.yaml
#vi grafana-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: grafana
namespace: monitoring
spec:
storageClassName: nfs-boge
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
# 开始创建pvc
kubectl apply -f grafana-pvc.yaml
# 看下创建的pvc
kubectl -n monitoring get pvc
# 编辑grafana的deployment资源配置
kubectl -n monitoring edit deployments.apps grafana
# 旧的配置
volumes:
- emptyDir: {
}
name: grafana-storage
# 替换成新的配置---131行
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana
# 同时加入下面的env环境变量,将登陆密码进行固定修改
spec:
containers:
......
#env和port对齐---插到47行
env:
- name: GF_SECURITY_ADMIN_USER
value: admin
- name: GF_SECURITY_ADMIN_PASSWORD
value: admin321
# 过一会,等grafana重启完成后,用上面的新密码进行登陆
kubectl -n monitoring get pod -w|grep grafana
grafana-5698bf94f4-prbr2 0/1 Running 0 3s
grafana-5698bf94f4-prbr2 1/1 Running 0 4s
# 因为先前的数据并未持久化,所以会发现先导入的ETCD模板已消失,这时重新再导入一次,后面重启也不会丢了
The 14th level k8s architect course business Prometheus monitoring practice four
Original 2021-04-01 20:35 Boge loves operation and maintenance
prometheus sends alarm
Hello everyone, I am Boge Ai Operation and Maintenance. So currently, webhooks are more commonly used in production to forward alarm content to chat tools used in enterprises, such as DingTalk, Enterprise WeChat, Feishu, etc.
The alarm component of prometheus is Alertmanager, which supports custom webhooks to accept the alarms it sends out. The log json fields it sends out have many fields. We need to clean and forward the corresponding logs according to the app that needs to be received.
Here Boge will use golang combined with the Gin network framework to write a log cleaning and forwarding tool, and give detailed explanations and actual combat of these commonly used alarm methods
Download boge-webhook.zip
https://cloud.189.cn/t/B3EFZvnuMvuu (access code: h1wx)
First look at what the alarm rules and alarm sending configuration look like
The rules of prometheus-operator are very complete, and basically belong to the out-of-the-box type. You can make targeted adjustments to the rules and alarm rules in it according to the alarms you receive daily, such as shortening the alarm observation time, etc.
1.第一步上传,上传完成后1.1看看就行,直接看第二步---node1
上传webhook到/mnt
ll /mnt/boge-webhook
1.2注意事先在配置文件 alertmanager.yaml 里面编辑好收件人等信息 ,再执行下面的命令
注意配置收件人的命名空间
##钉钉、飞书、微信
receivers:
- name: 'webhook'
webhook_configs:
# namespace(如果在default就删掉后面的)
# 修改url后要当重新删除alert pods
- url: 'http://alertmanaer-dingtalk-svc.kube-system//b01bdc063/boge/getjson'
# svc.default.svc
send_resolved: true
##看一下log有没有报错##看一下log有没有报错
kubectl logs alertmanager-main-0 -c alertmanager -n monitoring
kubectl logs alertmanaer-dingtalk-dp-84768f8464-vnpg8 -n monitoring
[GIN-debug] GET /status --> mycli/libs.MyWebServer.func1 (3 handlers)
[GIN-debug] POST /b01bdc063/boge/getjson --> mycli/libs.MyWebServer.func2 (3 handlers)
[GIN-debug] POST /7332f19/prometheus/dingtalk --> mycli/libs.MyWebServer.func3 (3 handlers)
[GIN-debug] POST /1bdc0637/prometheus/feishu --> mycli/libs.MyWebServer.func4 (3 handlers)
[GIN-debug] POST /5e00fc1a/prometheus/weixin --> mycli/libs.MyWebServer.func5 (3 handlers
1.1监控报警规划修改文件位置(看一下,现在不用真的改)
find / -name prometheus-rules.yaml
/mnt/kube-prometheus-master/manifests/prometheus-rules.yaml
修改完成记得更新 kubectl apply -f ./manifests/prometheus/prometheus-rules.yaml
# 通过这里可以获取需要创建的报警配置secret名称
# kubectl -n monitoring edit statefulsets.apps alertmanager-main
...
volumes:
- name: config-volume
secret:
defaultMode: 420
secretName: alertmanager-main
...
###
2.1删除原来的---node1-----每更新一次alertmanager.yaml,就要重新执行2.1----2.1、、或者重启pods
kubectl delete secrets alertmanager-main -n monitoring
2.2执行新的---在webhook文件夹
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
secret/alertmanager-main created
##看一下log有没有报错##看一下log有没有报错
kubectl logs alertmanager-main-0 -c alertmanager -n monitoring
3.构建根据DockerFile镜像----记得改仓库名字、
docker build -t harbor.boge.com/product/alertmanaer-webhook:1.0 .
4.推送镜像
4.1先登录
docker login
4.2去服务器创建哥私有镜像仓库
4.3如果这里报错,去node04看看nginx的端口是不是被haproxy占用了ss -luntp|grep 443
如果被占用systemctl stop haproxy 然后在node04 cd /data/harbor ./install.sh重装harbor不会覆盖原来的数据-----------解决问题的感觉太爽了然后就可以访问了---提示查看svc
47.103.13.111/harbor/
回到node01
cd /mnt/boge-webhook
docker push harbor.boge.com/product/alertmanaer-webhook:1.0
5.vim alertmanaer-webhook.yaml-----也可以配置微信钉钉token
如果要拉私有仓库先在imagePullSecret建立密钥,在他的上一行有指定命令
配置钉钉地址/微信信息token
报警过滤把
imagePullSecrets:
- name: boge-secret
放到最后
6.所有节点做域名解析
echo "172.24.75.115 harbor.boge.com" >> /etc/hosts
7.创建secret命令格式
kubectl create secret docker-registry boge-secret --docker-server=harbor.boge.com --docker-username=admin --docker-password=boge666 --docker-email=[email protected]
kubectl apply -f alertmanaer-webhook.yamls
8.在转发查看日志
kubectl logs alertmanaer-dingtalk-dp-84768f8464-ktzx7
##看一下log有没有报错##看一下log有没有报错
kubectl logs alertmanager-main-0 -c alertmanager -n monitoring
每次修改alertmanger文件,都要重新创建sercert,删除main-pod重新生成
Successful screenshot, no error
[External link picture transfer failed, the source site may have an anti-theft link mechanism, it is recommended to save the picture and upload it directly (img-JA7gtSL4-1661331702310) (C:\Users\qingtian\AppData\Roaming\Typora\typora-user-images\ image-20220815163618860.png)]
Attachment: Monitoring prometheus rule configuration of other services
https://github.com/samber/awesome-prometheus-alerts
https://github.com/samber/awesome-prometheus-alerts
The 15th level k8s architect course CICD automation based on gitlab
Original 2021-04-06 20:17 Boge loves operation and maintenance
1. Binary installation K8S
2. K8S increase node, reduce node, upgrade K8S version
./ezctl del-node test nodeip
./ezctl add-node test nodeip
3. Install ingress-nginx-controller
4. Install harbor–pg—gitlib—runner—dind
5. Prepare the production project py go
6. Design automated CI/CDl process
7. Start implementing CICD automation
The 15th level k8s architect course CICD automation based on gitlab II
Original 2021-04-06 20:17 Boge loves operation and maintenance
Upgrade version, add and delete nodes, version upgrade https://www.toutiao.com/video/6947312140566921736/?from_scene=all&log_from=5a724f0a3dcb2_1660324551795 [External link image transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the image Upload directly (img-hSZ6nbZt-1661331702311)(C:\Users\123\AppData\Roaming\Typora\typora-user-images\image-20220813084537187.png)]
[External link picture transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the picture and upload it directly (img-EB0CC4V9-1661331702312) (C:\Users\123\AppData\Roaming\Typora\typora-user-images\ image-20220813084735944.png)]
Hello everyone, I am Bo Ge Ai Operation and Maintenance. In this lesson, we will first deploy the database postgresql and redis required by gitlab's private code warehouse.
It should be noted that if the address and mounting directory of your nfs-server are not defined according to the previous course of Boge, then you need to remember to replace them in the following yaml configuration.
Deployment Deployment infrastructure postgresql
# ------------------------------------------------
# mkdir -p /nfs_dir/{gitlab_etc_ver130806,gitlab_log_ver130806,gitlab_opt_ver130806,gitlab_postgresql_data_ver130806}
# kubectl create namespace gitlab-ver130806
# kubectl -n gitlab-ver130806 apply -f 3postgres.yaml
# kubectl -n gitlab-ver130806 apply -f 4redis.yaml
# kubectl -n gitlab-ver130806 apply -f 5gitlab.yaml
# kubectl -n gitlab-ver130806 apply -f 6gitlab-tls.yaml
# ------------------------------------------------
# pv----持久卷目录名字想自己管理就手动建PV----node01----nfs目录
mkdir -p /nfs_dir/{
gitlab_etc_ver130806,gitlab_log_ver130806,gitlab_opt_ver130806,gitlab_postgresql_data_ver130806}
1.kubectl create namespace gitlab-ver130806
2.kubectl -n gitlab-ver130806 apply -f 3postgres.yaml
3.进入node4
cd /data/harbor/
查看docker-compose状态:
docker-compose ps
如果状态有exit
4.重启docker-compose
docker-compose down -v
docker-compose up -d
# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: gitlab-postgresql-data-ver130806
labels:
type: gitlab-postgresql-data-ver130806
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
nfs:
path: /nfs_dir/gitlab_postgresql_data_ver130806
server: 10.0.1.201
# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: gitlab-postgresql-data-ver130806-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: nfs
selector:
matchLabels:
type: gitlab-postgresql-data-ver130806
---
apiVersion: v1
kind: Service
metadata:
name: postgresql
labels:
app: gitlab
tier: postgreSQL
spec:
ports:
- port: 5432
selector:
app: gitlab
tier: postgreSQL
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgresql
labels:
app: gitlab
tier: postgreSQL
spec:
replicas: 1
selector:
matchLabels:
app: gitlab
tier: postgreSQL
strategy:
type: Recreate
template:
metadata:
labels:
app: gitlab
tier: postgreSQL
spec:
#nodeSelector:
# gee/disk: "500g"
containers:
- image: postgres:12.6-alpine
#- image: harbor.boge.com/library/postgres:12.6-alpine
name: postgresql
env:
- name: POSTGRES_USER
value: gitlab
- name: POSTGRES_DB
value: gitlabhq_production
- name: POSTGRES_PASSWORD
value: bogeusepg
- name: TZ
value: Asia/Shanghai
ports:
- containerPort: 5432
name: postgresql
livenessProbe:
exec:
command:
- sh
- -c
- exec pg_isready -U gitlab -h 127.0.0.1 -p 5432 -d gitlabhq_production
initialDelaySeconds: 110
timeoutSeconds: 5
failureThreshold: 6
readinessProbe:
exec:
command:
- sh
- -c
- exec pg_isready -U gitlab -h 127.0.0.1 -p 5432 -d gitlabhq_production
initialDelaySeconds: 20
timeoutSeconds: 3
periodSeconds: 5
# resources:
# requests:
# cpu: 100m
# memory: 512Mi
# limits:
# cpu: "1"
# memory: 1Gi
volumeMounts:
- name: postgresql
mountPath: /var/lib/postgresql/data
volumes:
- name: postgresql
persistentVolumeClaim:
claimName: gitlab-postgresql-data-ver130806-pvc
Basic configuration deployment redis
1.----节点node01
vi 4redis.yaml
2.kubectl -n gitlab-ver130806 apply -f 4redis.yaml
---
apiVersion: v1
kind: Service
metadata:
name: redis
labels:
app: gitlab
tier: backend
spec:
ports:
- port: 6379
targetPort: 6379
selector:
app: gitlab
tier: backend
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
labels:
app: gitlab
tier: backend
spec:
replicas: 1
selector:
matchLabels:
app: gitlab
tier: backend
strategy:
type: Recreate
template:
metadata:
labels:
app: gitlab
tier: backend
spec:
#nodeSelector:
# gee/disk: "500g"
containers:
- image: redis:6.2.0-alpine3.13
#- image: harbor.boge.com/library/redis:6.2.0-alpine3.13
name: redis
command:
- "redis-server"
args:
- "--requirepass"
- "bogeuseredis"
# resources:
# requests:
# cpu: "1"
# memory: 2Gi
# limits:
# cpu: "1"
# memory: 2Gi
ports:
- containerPort: 6379
name: redis
livenessProbe:
exec:
command:
- sh
- -c
- "redis-cli ping"
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
readinessProbe:
exec:
command:
- sh
- -c
- "redis-cli ping"
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 3
initContainers:
- command:
- /bin/sh
- -c
- |
ulimit -n 65536
mount -o remount rw /sys
echo never > /sys/kernel/mm/transparent_hugepage/enabled
mount -o remount rw /proc/sys
echo 2000 > /proc/sys/net/core/somaxconn
echo 1 > /proc/sys/vm/overcommit_memory
image: registry.cn-beijing.aliyuncs.com/acs/busybox:v1.29.2
imagePullPolicy: IfNotPresent
name: init-redis
resources: {
}
securityContext:
privileged: true
procMount: Default
The 15th level k8s architect course CICD automation based on gitlab three
Original 2021-04-07 21:36 Boge loves operation and maintenance
Hello everyone, I am Bo Ge Ai Operation and Maintenance. In this lesson, let's start deploying the gitlab service.
Configure and deploy gitlab main service
1.自己定制镜像----在生产中直接用---------先修改suource.list文件
如果博哥跑的dockerfile一直报错gpnp下载不了,就用这个自己修改的
FROM gitlab/gitlab-ce:13.8.6-ce.0
RUN rm /etc/apt/sources.list \
&& echo 'deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main' > /etc/apt/sources.list.d/pgdg.list
RUN wget https://www.postgresql.org/media/keys/ACCC4CF8.asc --no-check-certificate
RUN apt-key add ACCC4CF8.asc
COPY sources.list /etc/apt/sources.list
RUN apt-get update -yq && \
apt-get install -y vim iproute2 net-tools iputils-ping curl wget software-properties-common unzip postgresql-client-12 && \
rm -rf /var/cache/apt/archives/*
RUN ln -svf /usr/bin/pg_dump /opt/gitlab/embedded/bin/pg_dump
2.推送打包好的镜像到私有仓库harbor--------------格式一定要对不然永远传不上去
docker tag gitlab/gitlab-ce:13.8.6-ce.1 harbor.boge.com/library/gitlab-ce:13.8.6-ce.1
下面是博哥给的dockfile#Dockerfile
# 打包命令docker build -t gitlab/gitlab-ce:13.8.6-ce.1 .
FROM gitlab/gitlab-ce:13.8.6-ce.0
RUN rm /etc/apt/sources.list \
&& echo 'deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main' > /etc/apt/sources.list.d/pgdg.list \
&& wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
COPY sources.list /etc/apt/sources.list
RUN apt-get update -yq && \
apt-get install -y vim iproute2 net-tools iputils-ping curl wget software-properties-common unzip postgresql-client-12 && \
rm -rf /var/cache/apt/archives/*
RUN ln -svf /usr/bin/pg_dump /opt/gitlab/embedded/bin/pg_dump
#---------------------------------------------------------------
# docker build -t gitlab/gitlab-ce:13.8.6-ce.1 .
2.配置阿里云加速
保存为sources.list文件、放到dockerfile文件同级别目录
deb http://mirrors.aliyun.com/ubuntu/ xenial main
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main
deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb-src http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb http://mirrors.aliyun.com/ubuntu xenial-security main
deb-src http://mirrors.aliyun.com/ubuntu xenial-security main
deb http://mirrors.aliyun.com/ubuntu xenial-security universe
deb-src http://mirrors.aliyun.com/ubuntu xenial-security universe
#docker build -t gitlab/gitlab-ce:13.8.6-ce.1 .
3.开始部署-----最佳资源配置-----生产积累
# restore gitlab data command example:
1.gitlab数据库全备份命令
kubectl -n gitlab-ver130806 exec -it $(kubectl -n gitlab-ver130806 get pod|grep -v runner|grep gitlab|awk '{
print $1}') -- gitlab-rake gitlab:backup:restore BACKUP=1602889879_2020_10_17_12.9.2
2.重新加载配置
kubectl -n gitlab-ver130806 exec -it $(kubectl -n gitlab-ver130806 get pod|grep -v runner|grep gitlab|awk '{
print $1}') -- gitlab-ctl reconfigure
查看组件状态
3.kubectl -n gitlab-ver130806 exec -it $(kubectl -n gitlab-ver130806 get pod|grep -v runner|grep gitlab|awk '{
print $1}') -- gitlab-ctl status
4.kubectl -n gitlab-ver130806 apply -f 5gitlab.yaml
# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: gitlab-etc-ver130806
labels:
type: gitlab-etc-ver130806
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
nfs:
path: /nfs_dir/gitlab_etc_ver130806
server: 172.24.75.119
# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: gitlab-etc-ver130806-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs
selector:
matchLabels:
type: gitlab-etc-ver130806
# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: gitlab-log-ver130806
labels:
type: gitlab-log-ver130806
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
nfs:
path: /nfs_dir/gitlab_log_ver130806
server: 172.24.75.119
# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: gitlab-log-ver130806-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs
selector:
matchLabels:
type: gitlab-log-ver130806
# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: gitlab-opt-ver130806
labels:
type: gitlab-opt-ver130806
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
nfs:
path: /nfs_dir/gitlab_opt_ver130806
server: 172.24.75.119
# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: gitlab-opt-ver130806-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: nfs
selector:
matchLabels:
type: gitlab-opt-ver130806
---
apiVersion: v1
kind: Service
metadata:
name: gitlab
labels:
app: gitlab
tier: frontend
spec:
ports:
- name: gitlab-ui
port: 80
protocol: TCP
targetPort: 80
- name: gitlab-ssh
port: 22
protocol: TCP
targetPort: 22
selector:
app: gitlab
tier: frontend
type: NodePort
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: gitlab
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: gitlab-cb-ver130806
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: gitlab
namespace: gitlab-ver130806
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: gitlab
labels:
app: gitlab
tier: frontend
spec:
replicas: 1
selector:
matchLabels:
app: gitlab
tier: frontend
strategy:
type: Recreate
template:
metadata:
labels:
app: gitlab
tier: frontend
spec:
serviceAccountName: gitlab
containers:
#私有仓库地址tag格式别错了!!
#前面Dockerfile的打包推送image是这样
#docker tag gitlab/gitlab-ce:13.8.6-ce.1 harbor.boge.com/library/gitlab-ce:13.8.6-ce.1
- image: harbor.boge.com/library/gitlab-ce:13.8.6-ce.1
#不成功就换成公有的比如 15927561940/gitlab-ce:13.8.6-ce.1
#自己推到github或者dockerhub
name: gitlab
# resources:
# requests:
# cpu: 400m
# memory: 4Gi
# limits:
# cpu: "800m"
# memory: 8Gi
securityContext:
privileged: true
env:
- name: TZ
value: Asia/Shanghai
- name: GITLAB_OMNIBUS_CONFIG
value: |
postgresql['enable'] = false
gitlab_rails['db_username'] = "gitlab"
gitlab_rails['db_password'] = "bogeusepg"
gitlab_rails['db_host'] = "postgresql"
gitlab_rails['db_port'] = "5432"
gitlab_rails['db_database'] = "gitlabhq_production"
gitlab_rails['db_adapter'] = 'postgresql'
gitlab_rails['db_encoding'] = 'utf8'
redis['enable'] = false
gitlab_rails['redis_host'] = 'redis'
gitlab_rails['redis_port'] = '6379'
gitlab_rails['redis_password'] = 'bogeuseredis'
gitlab_rails['gitlab_shell_ssh_port'] = 22
external_url 'http://git.boge.com/'
nginx['listen_port'] = 80
nginx['listen_https'] = false
#-------------------------------------------
gitlab_rails['gitlab_email_enabled'] = true
gitlab_rails['gitlab_email_from'] = '[email protected]'
gitlab_rails['gitlab_email_display_name'] = 'boge'
gitlab_rails['gitlab_email_reply_to'] = '[email protected]'
gitlab_rails['gitlab_default_can_create_group'] = true
gitlab_rails['gitlab_username_changing_enabled'] = true
gitlab_rails['smtp_enable'] = true
gitlab_rails['smtp_address'] = "smtp.exmail.qq.com"
gitlab_rails['smtp_port'] = 465
gitlab_rails['smtp_user_name'] = "[email protected]"
gitlab_rails['smtp_password'] = "bogesendmail"
gitlab_rails['smtp_domain'] = "exmail.qq.com"
gitlab_rails['smtp_authentication'] = "login"
gitlab_rails['smtp_enable_starttls_auto'] = true
gitlab_rails['smtp_tls'] = true
#-------------------------------------------
# 关闭 promethues
prometheus['enable'] = false
# 关闭 grafana
grafana['enable'] = false
# 减少内存占用
unicorn['worker_memory_limit_min'] = "200 * 1 << 20"
unicorn['worker_memory_limit_max'] = "300 * 1 << 20"
# 减少 sidekiq 的并发数
sidekiq['concurrency'] = 16
# 减少 postgresql 数据库缓存
postgresql['shared_buffers'] = "256MB"
# 减少 postgresql 数据库并发数量
postgresql['max_connections'] = 8
# 减少进程数 worker=CPU核数+1
unicorn['worker_processes'] = 2
nginx['worker_processes'] = 2
puma['worker_processes'] = 2
# puma['per_worker_max_memory_mb'] = 850
# 保留3天备份的数据文件
gitlab_rails['backup_keep_time'] = 259200
#-------------------------------------------
ports:
- containerPort: 80
name: gitlab
livenessProbe:
exec:
command:
- sh
- -c
- "curl -s http://127.0.0.1/-/health|grep -w 'GitLab OK'"
initialDelaySeconds: 120
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
readinessProbe:
exec:
command:
- sh
- -c
- "curl -s http://127.0.0.1/-/health|grep -w 'GitLab OK'"
initialDelaySeconds: 120
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
volumeMounts:
- mountPath: /etc/gitlab
name: gitlab1
- mountPath: /var/log/gitlab
name: gitlab2
- mountPath: /var/opt/gitlab
name: gitlab3
- mountPath: /etc/localtime
name: tz-config
volumes:
- name: gitlab1
persistentVolumeClaim:
claimName: gitlab-etc-ver130806-pvc
- name: gitlab2
persistentVolumeClaim:
claimName: gitlab-log-ver130806-pvc
- name: gitlab3
persistentVolumeClaim:
claimName: gitlab-opt-ver130806-pvc
- name: tz-config
hostPath:
path: /usr/share/zoneinfo/Asia/Shanghai
securityContext:
runAsUser: 0
fsGroup: 0
#4.kubectl -n gitlab-ver130806 apply -f 5gitlab.yaml
5.配置好后确定pods都跑起来了-----注意namespace
6.kubectl get svc -n gitlab-ver130806
通过svc外网访问
CICD automation based on gitlab of the 15th k8s architect course
Original 2021-04-08 21:20 Boge loves operation and maintenance
Hello everyone, I am Bo Ge Ai Operation and Maintenance. In this lesson, we will talk about the runner in gitlab, and the CI/CD automation of gitlab, all of which are issued by gitlab and executed by the runner component. We also run the runner on k8s here.
Runner literally means a runner. Its role in the entire automation process is also equivalent to a takeaway brother. It receives the automation instructions issued by gitlab to perform corresponding operations, so as to achieve the effect of the entire CI/CD .
Automated instructions to do corresponding operations, so as to achieve the effect of the entire CI/CD.
deploy gitlab-tls
1.先创建证书:---node01---生产是买域名
openssl genrsa -out tls.key 2048
openssl req -new -x509 -key tls.key -out tls.cert -days 360 -subj /CN=*.boge.com
kubectl -n gitlab-ver130806 create secret tls mytls --cert=tls.cert --key=tls.key
[root@node01 ~]# kubectl -n gitlab-ver130806 create secret tls mytls --cert=tls.cert --key=tls.key
secret/mytls created
2.开始部署ingress:
vi 6gitlab-tls.yaml
kubectl -n gitlab-ver130806 apply -f 6gitlab-tls.yaml
## https://kubernetes.io/docs/concepts/services-networking/ingress/
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: gitlab
annotations:
#是否强制跳转到https 308重定向----上传文件大小
nginx.ingress.kubernetes.io/force-ssl-redirect: "false"
nginx.ingress.kubernetes.io/proxy-body-size: "20m"
spec:
tls:
- hosts:
- git.boge.com
secretName: mytls
rules:
#这里就是ingress
- host: git.boge.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: gitlab
port:
number: 80
---
3。自定义域名绑定本地hosts----所有节点
echo "172.24.75.145 git.boge.com" >> /etc/hosts
Automated instructions to do corresponding operations, so as to achieve the effect of the entire CI/CD.
deploy gitlab-runner
1.docker
1.mkdir -p /nfs_dir/{
gitlab-runner1-ver130806-docker,gitlab-runner2-ver130806-share}
2. vi 7gitlab-runner.yaml
3.kubectl apply -f 7gitlab-runner.yaml -n gitlab-ver130806
kubectl get pods -n gitlab-ver130806
4.kubectl exec -it gitlab-runner1-ver130806-docker-7596bf755c-ttvhj -n gitlab-ver130806 -- bash
2.# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: gitlab-runner1-ver130806-docker
labels:
type: gitlab-runner1-ver130806-docker
spec:
capacity:
storage: 0.1Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
nfs:
path: /nfs_dir/gitlab-runner1-ver130806-docker
server: 172.24.75.117
# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: gitlab-runner1-ver130806-docker
namespace: gitlab-ver130806
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 0.1Gi
storageClassName: nfs
selector:
matchLabels:
type: gitlab-runner1-ver130806-docker
---
# https://docs.gitlab.com/runner/executors
#可接受多少个
#concurrent = 30
#check_interval = 0
#[session_server]
# session_timeout = 1800
#[[runners]]
# name = "gitlab-runner1-ver130806-docker"
# url = "http://git.boge.com"
# token = "xxxxxxxxxxxxxxxxxxxxxx"
# executor = "kubernetes"
# [runners.kubernetes]
# namespace = "gitlab-ver130806"
# image = "docker:stable"
# helper_image = "gitlab/gitlab-runner-helper:x86_64-9fc34d48-pwsh"
# privileged = true
# [[runners.kubernetes.volumes.pvc]]
# name = "gitlab-runner1-ver130806-docker"
# mount_path = "/mnt"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: gitlab-runner1-ver130806-docker
namespace: gitlab-ver130806
spec:
replicas: 1
selector:
matchLabels:
name: gitlab-runner1-ver130806-docker
template:
metadata:
labels:
name: gitlab-runner1-ver130806-docker
spec:
hostAliases:
#内部地址,填gitlab的svc地址
#在这找kubectl get svc -n gitlab-ver130806 ,下面share还有个地方也要改成这个
- ip: "10.68.140.109"
hostnames:
- "git.boge.com"
serviceAccountName: gitlab
containers:
- args:
- run
image: gitlab/gitlab-runner:v13.10.0
name: gitlab-runner1-ver130806-docker
volumeMounts:
- mountPath: /etc/gitlab-runner
name: config
- mountPath: /etc/ssl/certs
name: cacerts
readOnly: true
restartPolicy: Always
volumes:
- persistentVolumeClaim:
claimName: gitlab-runner1-ver130806-docker
name: config
- hostPath:
path: /usr/share/ca-certificates/mozilla
name: cacerts
333.把runner注册到gitlib
# gitlab-ci-multi-runner register
#进入runner里面去然后注册
1.kubectl exec -it gitlab-runner1-ver130806-docker-7596bf755c-ttvhj -n gitlab-ver130806 -- bash
2.容器内#输入命令
gitlab-ci-multi-runner register
复制仓库地址
设置,扳手、管理员、runner\token
按照下面格式修改以后,删除pods让他自动重启
kubectl delete gitlab-runner1-ver130806-docker-7596bf755c-ttvhj -n gitlab-ver130806
3.配置好后进入gitlab网页刷新runner\\\发现加载进去了
容器内部也可以看cat /etc/gitlab-runner/config.toml
4.新开窗口、找到gitlab的svc,外网访问
kubectl get svc -n gitlab-ver130806
5.容器内编辑配置发现没命令
去容器外挂载目录编辑---看到已经被挂载
cd /nfs_dir/gitlab-runner1-ver130806-docker
vim config.toml
格式:
# https://docs.gitlab.com/runner/executors
#可接受多少个
#concurrent = 30
#check_interval = 0
#格式:
#[session_server]
# session_timeout = 1800
#格式:
#[[runners]]
# name = "gitlab-runner1-ver130806-docker"
# url = "http://git.boge.com"
# token = "xxxxxxxxxxxxxxxxxxxxxx"
# executor = "kubernetes"
# [runners.kubernetes]
namespace = "gitlab-ver130806" #补到左边
image = "docker:stable"
helper_image = "gitlab/gitlab-runner-helper:x86_64-9fc34d48-pwsh"
privileged = true
6.改完以后重启pod--runner
kubectl delete pods gitlab-runner1-ver130806-docker-665bddc4d4-8bdtz -n gitlab-ver130806
7.看一下日志:
kubectl logs gitlab-runner1-ver130806-docker-665bddc4d4-jdglv -n gitlab-ver130806
没报错就行
8.回到gitlab网页点击编辑一下、只勾选第一个、保存、runner变成shared可用绿色
[External link image transfer failed, the source site may have an anti-leeching mechanism, it is recommended to save the image and upload it directly (img-TfETNqHI-1661331702313) (C:\Users\123\AppData\Roaming\Typora\typora-user-images\ image-20220816163356450.png)]
1.运行runner2=----share共享型share共享型,先执行下面的yaml文件、记得改ip
vi 8runner2.yaml
\\也要进到容器注册
2.kubectl exec -it gitlab-runner2-ver130806-share-bf884b794-gwqs6 -n gitlab-ver130806 -- /bin/bash
2.运行gitlab-ci-multi-runner register
同样进入gitlab,进入runner拿到token、输入标签
3.修改配置:
cd /nfs_dir/gitlab-runner2-ver130806-docker
vim config.toml
也是和runner1一样的改
4.改完之后重启runner2的pods\\重启pods
kubectl delete pods gitlab-runner2-ver130806-share-bf884b794-gwqs6 -n gitlab-ver130806
3.刷线gitlab网页,看到runner2出来
钩1、3
# Active √ Paused Runners don't accept new jobs
# Protected This runner will only run on pipelines triggered on protected branches
# Run untagged jobs √ Indicates whether this runner can pick jobs without tags
# Lock to current projects When a runner is locked, it cannot be assigned to other projects
# pv
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: gitlab-runner2-ver130806-share
labels:
type: gitlab-runner2-ver130806-share
spec:
capacity:
storage: 0.1Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: nfs
nfs:
path: /nfs_dir/gitlab-runner2-ver130806-share
server: 172.24.75.117
# pvc
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: gitlab-runner2-ver130806-share
namespace: gitlab-ver130806
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 0.1Gi
storageClassName: nfs
selector:
matchLabels:
type: gitlab-runner2-ver130806-share
---
# https://docs.gitlab.com/runner/executors
#concurrent = 30
#check_interval = 0
#[session_server]
# session_timeout = 1800
#[[runners]]
# name = "gitlab-runner2-ver130806-share"
# url = "http://git.boge.com"
# token = "xxxxxxxxxxxxxxxx"
# executor = "kubernetes"
# [runners.kubernetes]
namespace = "gitlab-ver130806"
image = "registry.cn-beijing.aliyuncs.com/acs/busybox/busybox:v1.29.2"
helper_image = "gitlab/gitlab-runner-helper:x86_64-9fc34d48-pwsh"
privileged = false
[[runners.kubernetes.volumes.pvc]]
name = "gitlab-runner2-v1230-share"
mount_path = "/mnt"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: gitlab-runner2-ver130806-share
namespace: gitlab-ver130806
spec:
replicas: 1
selector:
matchLabels:
name: gitlab-runner2-ver130806-share
template:
metadata:
labels:
name: gitlab-runner2-ver130806-share
spec:
hostAliases:
#改这里
#内部地址,填gitlab的svc地址
#在这找kubectl get svc -n gitlab-ver130806 ,下面share还有个地方也要改成这个
- ip: "10.68.140.109"
hostnames:
- "git.boge.com"
serviceAccountName: gitlab
containers:
- args:
- run
image: gitlab/gitlab-runner:v13.10.0
name: gitlab-runner2-ver130806-share
volumeMounts:
- mountPath: /etc/gitlab-runner
name: config
- mountPath: /etc/ssl/certs
name: cacerts
readOnly: true
restartPolicy: Always
volumes:
- persistentVolumeClaim:
claimName: gitlab-runner2-ver130806-share
name: config
- hostPath:
path: /usr/share/ca-certificates/mozilla
name: cacerts
CICD automation based on gitlab for the 15th k8s architect course
Original 2021-04-09 22:47 Boge loves operation and maintenance
Hello everyone, I am Bo Ge Ai Operation and Maintenance. In this lesson, we continue to configure gitlab-related services.
Increase the internal analysis of gitlab in k8s
Cloud platform must do
Why do you do this, Boge summed up two reasons here:
- Optimize gitlab network communication. For the runner to call gitlab service, it is faster to go directly to the internal address
- If you are a student using Alibaba Cloud and deploy gitlab on k8s, then k8s internal services such as the runner cannot request access through the public network entrance SLB in front of the same cluster. Here is the reason for Alibaba Cloud’s own network architecture. At this time we You only need to do the following configuration to solve it perfectly
- Add a coreDNS to force resolution and force the internal address to go first
1.增加gitlab在k8s的内部解析
# kubectl -n kube-system get configmaps coredns -o yaml
kubectl -n kube-system edit configmaps coredns
改完后重启pods
kubectl get pods -n kube-system |grep coredns
kubectl delete pods coredns-5787695b7f-l7v2b -n kube-system
进入容器\\\
可ping git.boge.com 通(gitlab service ip)
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health
ready
#从log开始加进去
log
rewrite stop {
# 域名 server名称.namespace名称.后缀
name regex git.boge.com gitlab.gitlab-ver130806.svc.cluster.local
answer name gitlab.gitlab-ver130806.svc.cluster.local git.boge.com
}
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods verified
fallthrough in-addr.arpa ip6.arpa
}
autopath @kubernetes
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
2.增加ssh端口转发----node-4
我们要保持所有开发人员能使用默认的22端口来通过ssh拉取代码,那么就需要做如下端口转发配置
# 注意配置此转发前,需要将对应NODE的本身ssh连接端口作一下修改,以防后面登陆不了该机器----node-4
2.1查看22端口
[root@node01 gitlab-runner2-ver130806-share]# kubectl get svc -n gitlab-ver130806
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
gitlab NodePort 10.68.156.103 <none> 80:31543/TCP,22:32693/TCP 3h17m
2.2更换原来的22端口---node04----17行
vim /etc/ssh/sshd_config
将port打开改成10022
2.3 systemctl restart sshd
2.4 ssh -p10022 172.24.75.140-------任意节点尝试连接
2.5查看已经生效
[root@node-1 gitlab-runner1-ver130806-docker]# iptables -t nat -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 177 packets, 10440 bytes)
pkts bytes target prot opt in out source destination
123K 7328K cali-PREROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:6gwbT8clXdHdC1b1 */
124K 7355K KUBE-SERVICES all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
33349 1928K DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
0 0 DNAT tcp -- * * 0.0.0.0/0 10.0.1.204 tcp dpt:22 to:10.0.1.204:31755
1 40 DNAT tcp -- * * 0.0.0.0/0 172.24.75.117 tcp dpt:22 to:172.24.75.117:31755
----防火墙做node04
2.5 iptables -t nat -A PREROUTING -d 172.24.75.141 -p tcp --dport 22 -j DNAT --to-destination 172.24.75.141:30096 ###端口根据实际改 kubectl get svc -n gitlab-ver130806
2.6再次连接:ssh 172.24.75.141发现已经到gitlab连接----在node3或者其他节点
加到开启自启:
cat /etc/rc.local
#↑ 如果想删除上面创建的这一条规则,将-A换成-D即可
2.7iptables -t nat -nvL PREROUTING
s
2.8生成key
cd ~/.ssh/
cat id_rsa.pub
在gitlib搜索SSH keys将公钥配置到GitLab SSH key
CICD automation based on gitlab of the 15th k8s architect course
Original 2021-04-11 18:52 Boge loves operation and maintenance
Deploy dind (docker in docker)
Hello everyone, I am Bo Ge Ai Operation and Maintenance. We are now deploying the dind service in k8s to provide the entire CI (continuous integration) function.
Let's look at the results listed in the docker version. Docker adopts the C/S architecture. The Docker process does not listen to any ports by default. It will generate a socket (/var/run/docker.sock) file for local process communication. Docker C/S Using the Rest API as the communication protocol, we can let the Docker daemon process listen to a port, which provides the feasibility for us to use the docker client to call the docker daemon process remotely to execute image construction
docker in docker
# dind pip instll staus : kill -9 code 137(128+9) ,may be limits(cpu,memory) resources need change
# only have docker client ,use dind can be use normal
#dindSvc=$(kubectl -n kube-system get svc dind |awk 'NR==2{print $3}')
#export DOCKER_HOST="tcp://${dindSvc}:2375/"
#export DOCKER_DRIVER=overlay2
#export DOCKER_TLS_CERTDIR=""
# 先创建证书,哪个节点跑就哪个上面创建kubectl -n kube-system create secret generic harbor-ca --from-file=harbor-ca=/data/harbor/ssl/tls.cert
#下面记得改ip
vi dind.yaml
---
# SVC
kind: Service
apiVersion: v1
metadata:
name: dind
namespace: kube-system
spec:
selector:
app: dind
ports:
- name: tcp-port
port: 2375
protocol: TCP
targetPort: 2375
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: dind
namespace: kube-system
labels:
app: dind
spec:
replicas: 1
selector:
matchLabels:
app: dind
template:
metadata:
labels:
app: dind
spec:
hostNetwork: true
containers:
- name: dind
image: docker:19-dind
#image: harbor.boge.com/library/docker:19-dind
lifecycle:
postStart:
exec:
#私有仓库登陆
command: ["/bin/sh", "-c", "docker login harbor.boge.com -u 'admin' -p 'boge666'"]
# 3. when delete this pod , use this keep kube-proxy to flush role done
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]
ports:
- containerPort: 2375
# resources:
# requests:
# cpu: 200m
# memory: 256Mi
# limits:
# cpu: 0.5
# memory: 1Gi
readinessProbe:
tcpSocket:
port: 2375
initialDelaySeconds: 10
periodSeconds: 30
livenessProbe:
tcpSocket:
port: 2375
initialDelaySeconds: 10
periodSeconds: 30
securityContext:
privileged: true
env:
- name: DOCKER_HOST
value: tcp://localhost:2375
- name: DOCKER_DRIVER
value: overlay2
- name: DOCKER_TLS_CERTDIR
value: ''
volumeMounts:
- name: docker-graph-storage
mountPath: /var/lib/docker
- name: tz-config
mountPath: /etc/localtime
#在node04生成secret
#kubectl -n kube-system create secret generic harbor-ca --from-file=harbor-ca=/data/harbor/ssl/tls.cert
- name: harbor-ca
mountPath: /etc/docker/certs.d/harbor.boge.com/ca.crt
subPath: harbor-ca
# kubectl create secret docker-registry boge-secret --docker-server=harbor.boge.com --docker-username=admin --docker-password=boge666 [email protected]
hostAliases:
- hostnames:
- harbor.boge.com
ip: 172.24.75.136
imagePullSecrets:
- name: bogeharbor
volumes:
# - emptyDir:
# medium: ""
# sizeLimit: 10Gi
- hostPath:
path: /var/lib/container/docker
name: docker-graph-storage
- hostPath:
path: /usr/share/zoneinfo/Asia/Shanghai
name: tz-config
- name: harbor-ca
secret:
secretName: harbor-ca
defaultMode: 0600
#
# kubectl taint node 10.0.1.201 Ingress=:NoExecute
# kubectl describe node 10.0.1.201 |grep -i taint
# kubectl taint node 10.0.1.201 Ingress:NoExecute-
nodeSelector:
kubernetes.io/hostname: "172.24.75.136"
tolerations:
- operator: Exists
The finale of CICD automation devops of the 15th k8s architect course
Original 2021-04-12 20:12 Boge loves operation and maintenance
CI/CD production actual combat project
Hello everyone, I am Bo Ge Ai Operation and Maintenance. Everyone must watch carefully, operate a lot, and have a thorough grasp of the entire process. Here I will use the flask module of python, which is a common programming language for enterprises, to implement the complete project automation process steps. Other languages can refer to this project to implement the automation process.
1.为了实现CD、、先把k8s的二进制命令行工具kubectl容器化备用
[root@node-1 ~]# mkdir kubectl-docker
[root@node-1 ~]# cd kubectl-docker/
cp /opt/kube/bin/kubectl .
vi Dockerfile
docker build -t 仓库地址/镜像名:版本号 .
docker build -t harbor.boge.com/library/kubectl:v1.19.9 .
docker push harbor.boge.com/library/kubectl:v1.19.9
##Dockerfile
FROM alpine:3.13
MAINTAINER boge
ENV TZ "Asia/Shanghai"
RUN sed -ri 's+dl-cdn.alpinelinux.org+mirrors.aliyun.com+g' /etc/apk/repositories \
&& apk add --no-cache curl tzdata ca-certificates \
&& cp -f /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
&& apk upgrade \
&& rm -rf /var/cache/apk/*
COPY kubectl /usr/local/bin/
RUN chmod +x /usr/local/bin/kubectl
ENTRYPOINT ["kubectl"]
CMD ["help"]
2. Python's flask module ----submit the following files to your own code warehouse
Prepare the flask-related code files and upload them to the gitlab code warehouse
app.py
from flask import Flask
app = Flask(__name__)
@app.route
def hello_world():
return 'Hello, boge! 21.04.11.01'
@app.route('/gg/<username>')
def hello(username):
return 'welcome' + ': ' + username + '!'
Dockerfile
FROM harbor.boge.com/library/python:3.5-slim-stretch
MAINTAINER boge
WORKDIR /kae/app
COPY requirements.txt .
RUN sed -i 's/deb.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
&& sed -i 's/security.debian.org/ftp.cn.debian.org/g' /etc/apt/sources.list \
&& apt-get update -y \
&& apt-get install -y wget gcc libsm6 libxext6 libglib2.0-0 libxrender1 make \
&& apt-get clean && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple -r requirements.txt \
&& rm requirements.txt
COPY . .
EXPOSE 5000
HEALTHCHECK CMD curl --fail http://localhost:5000 || exit 1
ENTRYPOINT ["gunicorn", "app:app", "-c", "gunicorn_config.py"]
gunicorn_config.py
bind = '0.0.0.0:5000'
graceful_timeout = 3600
timeout = 1200
max_requests = 1200
workers = 1
worker_class = 'gevent'
requirements.txt
flask
gevent
gunicorn
Configure the following variable values in the code warehouse variable configuration
2.2 在代码仓库变量配置里面配置如下变量值
Type Key Value State Masked
Variable DOCKER_USER admin 下面都关闭 下面都关闭
Variable DOCKER_PASS boge666
Variable REGISTRY_URL harbor.boge.com
Variable REGISTRY_NS product
File KUBE_CONFIG_TEST RBAC----k8s相关config配置文件内容
通过RBAC脚本创建sh 1.sh test https://172.24.75.136:6443
cat kube_config/test.kube.conf
复制到
3.准备准备项目自动化配置文件.gitlab-ci.yml
stages:
- build
- deploy
- rollback
# tag name need: 20.11.21.01
variables:
namecb: "flask-test"
svcport: "5000"
replicanum: "2"
ingress: "flask-test.boge.com"
certname: "mytls"
CanarylIngressNum: "20"
.deploy_k8s: &deploy_k8s |
if [ $CANARY_CB -eq 1 ];then cp -arf .project-name-canary.yaml ${namecb}-${CI_COMMIT_TAG}.yaml; sed -ri "s+CanarylIngressNum+${CanarylIngressNum}+g" ${namecb}-${CI_COMMIT_TAG}.yaml; sed -ri "s+NomalIngressNum+$(expr 100 - ${CanarylIngressNum})+g" ${namecb}-${CI_COMMIT_TAG}.yaml ;else cp -arf .project-name.yaml ${namecb}-${CI_COMMIT_TAG}.yaml;fi
sed -ri "s+projectnamecb.boge.com+${ingress}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
sed -ri "s+projectnamecb+${namecb}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
sed -ri "s+5000+${svcport}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
sed -ri "s+replicanum+${replicanum}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
sed -ri "s+mytls+${certname}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
sed -ri "s+mytagcb+${CI_COMMIT_TAG}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
sed -ri "s+harbor.boge.com/library+${IMG_URL}+g" ${namecb}-${CI_COMMIT_TAG}.yaml
cat ${namecb}-${CI_COMMIT_TAG}.yaml
[ -d ~/.kube ] || mkdir ~/.kube
echo "$KUBE_CONFIG" > ~/.kube/config
if [ $NORMAL_CB -eq 1 ];then if kubectl get deployments.|grep -w ${namecb}-canary &>/dev/null;then kubectl delete deployments.,svc ${namecb}-canary ;fi;fi
kubectl apply -f ${namecb}-${CI_COMMIT_TAG}.yaml --record
echo
echo
echo "============================================================="
echo " Rollback Indx List"
echo "============================================================="
kubectl rollout history deployment ${namecb}|tail -5|awk -F"[ =]+" '{print $1"\t"$5}'|sed '$d'|sed '$d'|sort -r|awk '{print $NF}'|awk '$0=""NR". "$0'
.rollback_k8s: &rollback_k8s |
[ -d ~/.kube ] || mkdir ~/.kube
echo "$KUBE_CONFIG" > ~/.kube/config
last_version_command=$( kubectl rollout history deployment ${namecb}|tail -5|awk -F"[ =]+" '{print $1"\t"$5}'|sed '$d'|sed '$d'|tail -${ROLL_NUM}|head -1 )
last_version_num=$( echo ${last_version_command}|awk '{print $1}' )
last_version_name=$( echo ${last_version_command}|awk '{print $2}' )
kubectl rollout undo deployment ${namecb} --to-revision=$last_version_num
echo $last_version_num
echo $last_version_name
kubectl rollout history deployment ${namecb}
build:
stage: build
retry: 2
variables:
# use dind.yaml to depoy dind'service on k8s
##dind svc地址 kubectl get svc -A|grep dind
DOCKER_HOST: tcp://10.68.248.116:2375/
DOCKER_DRIVER: overlay2
DOCKER_TLS_CERTDIR: ""
##services:
##- docker:dind
before_script:
- docker login ${REGISTRY_URL} -u "$DOCKER_USER" -p "$DOCKER_PASS"
script:
- docker pull ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest || true
- docker build --network host --cache-from ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest --tag ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:$CI_COMMIT_TAG --tag ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest .
- docker push ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:$CI_COMMIT_TAG
- docker push ${REGISTRY_URL}/${REGISTRY_NS}/${namecb}:latest
after_script:
- docker logout ${REGISTRY_URL}
tags:
- "docker"
only:
- tags
#--------------------------K8S DEPLOY--------------------------------------------------
BOGE-deploy:
stage: deploy
image: harbor.boge.com/library/kubectl:v1.19.9
variables:
KUBE_CONFIG: "$KUBE_CONFIG_TEST"
IMG_URL: "${REGISTRY_URL}/${REGISTRY_NS}"
NORMAL_CB: 1
script:
- *deploy_k8s
when: manual ##手动开关
only:
- tags
# canary start###金丝雀发布
BOGE-canary-deploy:
stage: deploy
image: harbor.boge.com/library/kubectl:v1.19.9
variables:
KUBE_CONFIG: "$KUBE_CONFIG_TEST"
IMG_URL: "${REGISTRY_URL}/${REGISTRY_NS}"
CANARY_CB: 1
script:
- *deploy_k8s
when: manual
only:
- tags
# canary end
BOGE-rollback-1:
stage: rollback
image: harbor.boge.com/library/kubectl:v1.19.9
variables:
KUBE_CONFIG: "$KUBE_CONFIG_TEST"
ROLL_NUM: 1
script:
- *rollback_k8s
when: manual
only:
- tags
BOGE-rollback-2:
stage: rollback
image: harbor.boge.com/library/kubectl:v1.19.9
variables:
KUBE_CONFIG: "$KUBE_CONFIG_TEST"
ROLL_NUM: 2
script:
- *rollback_k8s
when: manual
only:
- tags
BOGE-rollback-3:
stage: rollback
image: harbor.boge.com/library/kubectl:v1.19.9
variables:
KUBE_CONFIG: "$KUBE_CONFIG_TEST"
ROLL_NUM: 3
script:
- *rollback_k8s
when: manual
only:
- tags
4.准备k8s的deployment模板文件 .project-name.yaml
!!!!这里要注意提前在K8S把harbor拉取的凭证secret给创建好,命令如下:
# 命名空间 secret名字
kubectl -n test create secret docker-registry boge-secret --docker-server=harbor.boge.com --docker-username=admin --docker-password=boge666 --docker-[email protected]
---
# SVC
kind: Service
apiVersion: v1
metadata:
labels:
kae: "true"
kae-app-name: projectnamecb
kae-type: app
name: projectnamecb
spec:
selector:
kae: "true"
kae-app-name: projectnamecb
kae-type: app
ports:
- name: http-port
port: 80
protocol: TCP
targetPort: 5000
# nodePort: 12345
# type: NodePort
---
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
labels:
kae: "true"
kae-app-name: projectnamecb
kae-type: app
name: projectnamecb
spec:
tls:
- hosts:
- projectnamecb.boge.com
secretName: mytls
rules:
- host: projectnamecb.boge.com
http:
paths:
- path: /
backend:
serviceName: projectnamecb
servicePort: 80
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: projectnamecb
labels:
kae: "true"
kae-app-name: projectnamecb
kae-type: app
spec:
replicas: replicanum
selector:
matchLabels:
kae-app-name: projectnamecb
template:
metadata:
labels:
kae: "true"
kae-app-name: projectnamecb
kae-type: app
spec:
containers:
- name: projectnamecb
image: harbor.boge.com/library/projectnamecb:mytagcb
env:
- name: TZ
value: Asia/Shanghai
ports:
- containerPort: 5000
###健康监测可以去掉一下
###如如果没有200的返回的话
readinessProbe:
httpGet:
scheme: HTTP
path: /
port: 5000
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
livenessProbe:
httpGet:
scheme: HTTP
path: /
port: 5000
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
resources:
requests:
cpu: 0.3
memory: 0.5Gi
limits:
cpu: 0.3
memory: 0.5Gi
imagePullSecrets:
- name: boge-secret
5.准备准备好K8S上金丝雀部署的模板文件 .project-name-canary.yaml
---
# SVC
kind: Service
apiVersion: v1
metadata:
labels:
kae: "true"
kae-app-name: projectnamecb-canary
kae-type: app
name: projectnamecb-canary
spec:
selector:
kae: "true"
kae-app-name: projectnamecb-canary
kae-type: app
ports:
- name: http-port
port: 80
protocol: TCP
targetPort: 5000
# nodePort: 12345
# type: NodePort
---
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
labels:
kae: "true"
kae-app-name: projectnamecb-canary
kae-type: app
name: projectnamecb
annotations:
nginx.ingress.kubernetes.io/service-weight: |
###高阶参数、正式服务service:百分比,测试服务sevice:百分比
projectnamecb: NomalIngressNum, projectnamecb-canary: CanarylIngressNum
spec:
tls:
- hosts:
- projectnamecb.boge.com
secretName: mytls
rules:
- host: projectnamecb.boge.com
http:
paths:
- path: /
backend:
serviceName: projectnamecb
servicePort: 80
- path: /
backend:
serviceName: projectnamecb-canary
servicePort: 80
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: projectnamecb-canary
labels:
kae: "true"
kae-app-name: projectnamecb-canary
kae-type: app
spec:
replicas: replicanum
selector:
matchLabels:
kae-app-name: projectnamecb-canary
template:
metadata:
labels:
kae: "true"
kae-app-name: projectnamecb-canary
kae-type: app
spec:
containers:
- name: projectnamecb-canary
image: harbor.boge.com/library/projectnamecb:mytagcb
env:
- name: TZ
value: Asia/Shanghai
ports:
- containerPort: 5000
readinessProbe:
httpGet:
scheme: HTTP
path: /
port: 5000
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
livenessProbe:
httpGet:
scheme: HTTP
path: /
port: 5000
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
resources:
requests:
cpu: 0.3
memory: 0.5Gi
limits:
cpu: 0.3
memory: 0.5Gi
imagePullSecrets:
- name: boge-secret
Four methods to quickly generate k8s yaml configuration
1. Quickly generate a deployment and service yaml standard configuration through the kubectl command line
# 这条命令是不是很眼熟,对了,这就是博哥前面的课程里面创建deployment的命令,我们在后面加上`--dry-run -o yaml`,--dry-run代表这条命令不会实际在K8s执行,-o yaml是会将试运行结果以yaml的格式打印出来,这样我们就能轻松获得yaml配置了
# kubectl create deployment nginx --image=nginx --dry-run -o yaml
apiVersion: apps/v1 # <--- apiVersion 是当前配置格式的版本
kind: Deployment #<--- kind 是要创建的资源类型,这里是 Deployment
metadata: #<--- metadata 是该资源的元数据,name 是必需的元数据项
creationTimestamp: null
labels:
app: nginx
name: nginx
spec: #<--- spec 部分是该 Deployment 的规格说明
replicas: 1 #<--- replicas 指明副本数量,默认为 1
selector:
matchLabels:
app: nginx
strategy: {}
template: #<--- template 定义 Pod 的模板,这是配置文件的重要部分
metadata: #<--- metadata 定义 Pod 的元数据,至少要定义一个 label。label 的 key 和 value 可以任意指定
creationTimestamp: null
labels:
app: nginx
spec: #<--- spec 描述 Pod 的规格,此部分定义 Pod 中每一个容器的属性,name 和 image 是必需的
containers:
- image: nginx
name: nginx
resources: {}
status: {}
# 基于上面的deployment服务生成service的yaml配置
# kubectl expose deployment nginx --port=80 --target-port=80 --dry-run -o yaml
apiVersion: v1
kind: Service
metadata:
creationTimestamp: null
labels:
app: nginx
name: nginx
spec:
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: nginx
status:
loadBalancer: {}
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647
2. Use helm to view complex yaml configurations of various official standards for reference
# 以查看rabbitmq集群安装的配置举例
# 首先添加chart仓库
helm repo add aliyun-apphub https://apphub.aliyuncs.com
helm repo update
# 这里我们在后面加上 --dry-run --debug 就是模拟安装并且打印输出所有的yaml配置
helm install -n rq rabbitmq-ha aliyun-apphub/rabbitmq-ha --dry-run --debug
1234567
3. Convert docker-compose into k8s yaml format configuration
# 下载二进制包
# https://github.com/kubernetes/kompose/releases
# 开始转发yaml配置
./kompose-linux-amd64 -f docker-compose.yml convert
12345
4. Example of converting the docker command output into the corresponding yaml file
这里以 Prometheus Node Exporter 为例演示如何运行自己的 DaemonSet。
Prometheus 是流行的系统监控方案,Node Exporter 是 Prometheus 的 agent,以 Daemon 的形式运行在每个被监控节点上。
如果是直接在 Docker 中运行 Node Exporter 容器,命令为:
docker run -d \
-v "/proc:/host/proc" \
-v "/sys:/host/sys" \
-v "/:/rootfs" \
--net=host \
prom/node-exporter \
--path.procfs /host/proc \
--path.sysfs /host/sys \
--collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
将其转换为 DaemonSet 的 YAML 配置文件 node_exporter.yml:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: node-exporter-daemonset
spec:
template:
metadata:
labels:
app: prometheus
spec:
hostNetwork: true # <<<<<<<< 1 直接使用 Host 的网络
containers:
- name: node-exporter
image: prom/node-exporter
imagePullPolicy: IfNotPresent
command: # <<<<<<<< 2 设置容器启动命令
- /bin/node_exporter
- --path.procfs
- /host/proc
- --path.sysfs
- /host/sys
- --collector.filesystem.ignored-mount-points
- ^/(sys|proc|dev|host|etc)($|/)
volumeMounts: # <<<<<<<< 3 通过Volume将Host路径/proc、/sys 和 / 映射到容器中
- name: proc
mountPath: /host/proc
- name: sys
mountPath: /host/sys
- name: root
mountPath: /rootfs
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
- name: root
hostPath:
path: /
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758
Supplementary instructions on K8S service health detection methods
Original 2021-03-09 21:37 Boge loves operation and maintenance
Three Ways to Use Liveness and Readiness
readinessProbe: # 定义只有http检测容器6222端口请求返回是 200-400,则接收下面的Service web-svc 的请求
httpGet:
scheme: HTTP
path: /check
port: 6222
initialDelaySeconds: 10 # 容器启动 10 秒之后开始探测,注意观察下g1的启动成功时间
periodSeconds: 5 # 每隔 5 秒再探测一次
timeoutSeconds: 5 # http检测请求的超时时间
successThreshold: 1 # 检测到有1次成功则认为服务是`就绪`
failureThreshold: 3 # 检测到有3次失败则认为服务是`未就绪`
livenessProbe: # 定义只有http检测容器6222端口请求返回是 200-400,否则就重启pod
httpGet:
scheme: HTTP
path: /check
port: 6222
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
#-----------------------------------------
readinessProbe:
exec:
command:
- sh
- -c
- "redis-cli ping"
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 3
livenessProbe:
exec:
command:
- sh
- -c
- "redis-cli ping"
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
#-----------------------------------------
readinessProbe:
tcpSocket:
port: 9092
initialDelaySeconds: 15
periodSeconds: 10
livenessProbe:
tcpSocket:
port: 9092
initialDelaySeconds: 15
periodSeconds: 10