A detailed record of deploying Kubernetes on an Ubuntu 18.04 server (including graphics card configuration) | Ubuntu+Kubernetes

Kubernetes platform | Building up from the operating system

1. Install and configure the operating system

1.1 Tool preparation

1.2 Install the system

Installing the operating system is straightforward. If you are installing the server edition, you only need to set the mirror during installation to http://mirrors.aliyun.com/ubuntu. If you are installing the desktop edition, change the mirror after installation instead.

1.3 Change the software source (if not already changed during installation)

Edit the source list

sudo vim /etc/apt/sources.list

Delete the original entries and replace them with the following.

Ubuntu 18.04 (bionic) Alibaba Cloud mirror

deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
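
As an alternative to typing the entries by hand, the list above can be generated with a short loop. This is only a sketch: the function name gen_sources is a hypothetical helper, and the mirror URL and suite names are simply the ones listed above.

```shell
#!/bin/sh
# gen_sources [CODENAME] [MIRROR]
# Print the deb and deb-src lines for every suite of the release.
# gen_sources is a hypothetical helper name used only for illustration.
gen_sources() {
  codename=${1:-bionic}
  mirror=${2:-http://mirrors.aliyun.com/ubuntu/}
  for suite in "" -security -updates -proposed -backports; do
    for kind in deb deb-src; do
      printf '%s %s %s%s main restricted universe multiverse\n' \
        "$kind" "$mirror" "$codename" "$suite"
    done
  done
}

# Example (not run automatically): keep a backup, then install the new list.
#   sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
#   gen_sources bionic | sudo tee /etc/apt/sources.list
```

Keeping a backup of the original file makes it easy to roll back if the mirror is unreachable.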

Update the package index and upgrade

sudo apt update
sudo apt upgrade

1.4 Configure openssh-server

Install openssh-server

sudo apt install openssh-server

Open the port and allow root login

sudo vim /etc/ssh/sshd_config

To use a non-default port, add the following:

Port <your-port>
PermitRootLogin yes

e.g.
Port 8848
PermitRootLogin yes

# If you specify a port, you must also pass it with the -p option when logging in over ssh

If you use the default port (22), just add the following:

PermitRootLogin yes
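
The same edit can be scripted instead of made in vim. The sketch below assumes a helper named set_sshd_option (hypothetical, not part of OpenSSH) that replaces the first line setting a key, commented out or not, and appends the key if it is missing. It is meant to be run against a working copy of the file.

```shell
#!/bin/sh
# set_sshd_option FILE KEY VALUE
# Replace the first line that sets KEY (commented out or not) with
# "KEY VALUE"; append the line if KEY does not appear at all.
# Hypothetical helper for illustration only.
set_sshd_option() {
  file=$1; key=$2; value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    !seen && $0 ~ ("^[# \t]*" k "[ \t]") { print k " " v; seen = 1; next }
    { print }
    END { if (!seen) print k " " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example on a working copy (not run automatically):
#   cp /etc/ssh/sshd_config /tmp/sshd_config.new
#   set_sshd_option /tmp/sshd_config.new Port 8848
#   set_sshd_option /tmp/sshd_config.new PermitRootLogin yes
#   sudo cp /tmp/sshd_config.new /etc/ssh/sshd_config
#   sudo /usr/sbin/sshd -t && sudo systemctl restart ssh
```

Running sshd -t checks the configuration for syntax errors before the service is restarted, which helps avoid locking yourself out of a remote machine.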

2. Install docker-ce

Do not install Docker directly from Ubuntu's default apt repositories. This article installs docker-ce 19.03.14 from Docker's official repository (newer versions generally work as well).

2.1 Uninstall the old version of docker

sudo apt-get remove docker docker-engine docker.io containerd runc

2.2 Install Docker

# First install the tools needed to install Docker
sudo apt-get update

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
# Set up the stable repository
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
# List the available docker versions
apt-cache madison docker-ce

# Output similar to the following appears
docker-ce | 5:18.09.9~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.8~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
docker-ce | 5:18.09.7~3-0~ubuntu-bionic | https://download.docker.com/linux/ubuntu bionic/stable amd64 Packages
...
# Install a specific docker version
sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io
# Install command used in this article
sudo apt-get install docker-ce=5:19.03.14~3-0~ubuntu-bionic docker-ce-cli=5:19.03.14~3-0~ubuntu-bionic containerd.io
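
Rather than copying the version string by hand, it can be pulled out of the madison listing. A minimal sketch, assuming a hypothetical helper pick_version that reads the listing on standard input and prints the first version whose number (after the "5:" epoch) starts with the requested prefix:

```shell
#!/bin/sh
# pick_version PREFIX
# Read `apt-cache madison <package>` output on stdin and print the first
# full version string whose number, without the leading epoch, starts
# with PREFIX. Hypothetical helper for illustration only.
pick_version() {
  awk -F'|' -v want="$1" '
    {
      v = $2
      gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the padding around the field
      bare = v
      sub(/^[0-9]+:/, "", bare)        # drop the epoch, e.g. "5:"
      if (index(bare, want) == 1) { print v; exit }
    }
  '
}

# Example (not run automatically):
#   ver=$(apt-cache madison docker-ce | pick_version 19.03.14)
#   sudo apt-get install "docker-ce=$ver" "docker-ce-cli=$ver" containerd.io
```

The helper returns the first matching line, so a shorter prefix such as 18.09 selects whichever 18.09 release the listing shows first.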

2.3 Verify that docker is installed successfully

sudo docker run hello-world

# On success, the output contains the message: Hello from Docker!

2.4 Change the Docker registry mirror and configure Docker

Alibaba Cloud provides a free mirror accelerator address: https://cr.console.aliyun.com/cn-hangzhou/instances/mirrors

# Change the registry mirror
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://<your ID, obtained from the address above>.mirror.aliyuncs.com"]
}
EOF

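# Add the user to the docker group; otherwise every docker command needs sudo
# (log out and back in for the group change to take effect)
sudo usermod -aG docker $USER

# Enable docker to start at boot
sudo systemctl enable docker

# Restart for the changes to take effect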
sudo systemctl daemon-reload
sudo systemctl restart docker

2.5 Uninstall Docker

sudo apt-get purge docker-ce docker-ce-cli containerd.io
sudo rm -rf /var/lib/docker

3. Install nvidia-docker (only if you need to use a graphics card)

3.1 Uninstall the old version of nvidia-docker

sudo apt-get purge -y nvidia-docker

3.2 Install nvidia-docker

Add the package repository

sudo vim /etc/hosts

# Add the following entries (to resolve nvidia.github.io)
185.199.108.153 nvidia.github.io
185.199.109.153 nvidia.github.io
185.199.110.153 nvidia.github.io
185.199.111.153 nvidia.github.io

# Then, back in the shell, add the GPG key and the repository
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Update the apt package index
sudo apt-get update

Check which nvidia-docker versions are compatible with your docker version

apt-cache madison nvidia-docker2

# Partial output
nvidia-docker2 | 2.0.3+docker18.09.7-3 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker18.09.6-3 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker18.09.5-3 | https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  Packages
...

View dependencies

apt-cache madison nvidia-container-runtime

# Partial output
nvidia-container-runtime | 2.0.0+docker18.09.7-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.6-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-3 | https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  Packages
...

Install nvidia-docker and its dependencies

# Install command
sudo apt-get install -y nvidia-docker2=<VERSION_STRING> nvidia-container-runtime=<VERSION_STRING>
# Command used in this article
sudo apt-get install -y nvidia-docker2=2.3.0-1 nvidia-container-runtime=3.3.0-1

View the nvidia-docker version

nvidia-docker version

3.3 Configure the graphics card

sudo vim /etc/docker/daemon.json

# Modify the content as follows; "registry-mirrors" is the Alibaba Cloud registry mirror accelerator address
{
  "registry-mirrors": ["https://<same as when configuring docker, unchanged>.mirror.aliyuncs.com"],
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

# Reload the configuration so the change takes effect
sudo pkill -SIGHUP dockerd
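
To confirm that the daemon picked up the new default runtime, the "Default Runtime" line of docker info can be checked. A small sketch, assuming a hypothetical helper default_runtime that reads the docker info text on standard input:

```shell
#!/bin/sh
# default_runtime
# Read `docker info` output on stdin and print the value of the
# "Default Runtime" line. Hypothetical helper for illustration only.
default_runtime() {
  awk -F': *' '/Default Runtime/ { print $2; exit }'
}

# Example (not run automatically):
#   if [ "$(docker info 2>/dev/null | default_runtime)" = "nvidia" ]; then
#     echo "nvidia is the default runtime"
#   else
#     echo "default runtime not switched; try: sudo systemctl restart docker"
#   fi
```

If the value is still runc, a full restart of the docker service is a reasonable next step.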

3.4 Install the graphics driver

Method 1

View graphics card model

sudo apt install ubuntu-drivers-common
ubuntu-drivers devices

Automatically install graphics card driver

sudo ubuntu-drivers autoinstall

Install a specific version of the graphics card driver

sudo apt install <driver-package>

e.g.
sudo apt install nvidia-340

View graphics card

nvidia-smi

Method 2

Download the driver that matches your graphics card from the official NVIDIA website.

Install dependencies

sudo dpkg --add-architecture i386
sudo apt update
sudo apt install build-essential libc6:i386

Disable Ubuntu's default driver (nouveau)

sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"

# Verify that the change was written
cat /etc/modprobe.d/blacklist-nvidia-nouveau.conf
# Expected output
blacklist nouveau
options nouveau modeset=0

Restart the operating system

sudo reboot
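
After the reboot it is worth confirming that nouveau is really no longer loaded before running the NVIDIA installer. A minimal sketch, assuming a hypothetical helper module_loaded that looks for the module name in /proc/modules (the second argument only exists so the check can be pointed at another file):

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]
# Succeed if a kernel module called NAME is listed in MODULES_FILE
# (default: /proc/modules). Hypothetical helper for illustration only.
module_loaded() {
  grep -q "^$1 " "${2:-/proc/modules}"
}

# Example (not run automatically):
#   if module_loaded nouveau; then
#     echo "nouveau is still loaded; recheck the blacklist file"
#   else
#     echo "nouveau is not loaded"
#   fi
```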

Fix for the message: WARNING: Unable to find suitable destination to install 32-bit compatibility libraries

sudo dpkg --add-architecture i386
sudo apt update
sudo apt install libc6:i386

View graphics card

nvidia-smi

4. Install CUDA and cuDNN (only needed if you have a graphics card)

4.1 CUDA

Pay attention to the version numbers throughout the following steps.

Download the CUDA runfile that matches your operating system from the official website.

# Install CUDA
sudo chmod +x cuda_10.2.2_linux.run
sudo ./cuda_10.2.2_linux.run

# Configure environment variables
vim ~/.bashrc
# Add the following lines
export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH

# Apply the configuration
source ~/.bashrc

# Check the CUDA version
cat /usr/local/cuda/version.txt
# or
nvcc -V
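
If this setup is scripted, the two export lines should be added only once, so that repeated runs do not keep growing ~/.bashrc. A minimal sketch, assuming a hypothetical helper append_once:

```shell
#!/bin/sh
# append_once LINE FILE
# Append LINE to FILE unless an identical line is already present.
# Hypothetical helper for illustration only.
append_once() {
  line=$1; file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (not run automatically); single quotes keep $PATH literal:
#   append_once 'export PATH=/usr/local/cuda-10.2/bin:$PATH' "$HOME/.bashrc"
#   append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH' "$HOME/.bashrc"
```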

4.2 Install cuDNN

Pay attention to the version numbers throughout the following steps.

Go to the official website to download cudnn (registration required)

tar -zxvf cudnn-*    # Here (and only here) press Tab to complete the * to the name of the downloaded package

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

# Check the cuDNN version
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
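
The three macros printed by the grep above can be combined into a single dotted version string. A minimal sketch, assuming a hypothetical helper cudnn_version and a header that defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (in cuDNN 8 and later these macros live in cudnn_version.h rather than cudnn.h, so pass that file instead):

```shell
#!/bin/sh
# cudnn_version HEADER
# Print MAJOR.MINOR.PATCHLEVEL from the #define lines in HEADER.
# Prints nothing if the macros are not found.
# Hypothetical helper for illustration only.
cudnn_version() {
  awk '
    /^#define CUDNN_MAJOR[ \t]/      { major = $3 }
    /^#define CUDNN_MINOR[ \t]/      { minor = $3 }
    /^#define CUDNN_PATCHLEVEL[ \t]/ { patch = $3 }
    END { if (major != "") print major "." minor "." patch }
  ' "$1"
}

# Example (not run automatically):
#   cudnn_version /usr/local/cuda/include/cudnn.h
```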

5. Install and deploy kubernetes (K8S)

5.1 Install kubeadm, kubelet, and kubectl

sudo apt-get update && sudo apt-get install -y ca-certificates curl software-properties-common apt-transport-https

curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -

# tee reads standard input and writes it to a file; cat also works
sudo tee /etc/apt/sources.list.d/kubernetes.list <<EOF 
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

sudo apt-get update

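# List installable versions; checking kubectl is enough, since all three tools must use the same version (same command as for nvidia-docker)
apt-cache madison kubectl

# Install specific versions of kubeadm, kubelet, and kubectl (omit the version numbers to install the latest, which is not recommended)
sudo apt install kubelet=1.15.12-00 kubeadm=1.15.12-00 kubectl=1.15.12-00
sudo apt-mark hold kubelet kubeadm kubectl

# Check that the installation succeeded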
kubectl version

5.2 Disable swap partition

This step is required; otherwise the master node cannot be initialized.

# Disable temporarily; this is lost after a reboot
sudo swapoff -a

# Disable permanently
sudo vim /etc/fstab
# Comment out the line containing /swapfile
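
The fstab edit can also be done non-interactively. The sketch below assumes a hypothetical helper comment_out_swap that comments out every active line whose filesystem type field is swap and keeps the original as a .bak copy. Try it on a copy of the file first.

```shell
#!/bin/sh
# comment_out_swap FILE
# Prefix every uncommented line that has a whitespace-delimited "swap"
# field with "#". The original is kept as FILE.bak.
# Hypothetical helper for illustration only.
comment_out_swap() {
  sed -E -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/s/^/#/' "$1"
}

# Example (not run automatically; editing /etc/fstab needs a root shell):
#   cp /etc/fstab /tmp/fstab.test && comment_out_swap /tmp/fstab.test
#   diff /etc/fstab /tmp/fstab.test
```

After the real file is changed and the machine rebooted, swapon --show should print nothing.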

5.3 Initialize the master node

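# Note:
# A non-default CIDR (172.16.0.0/16) is used here to avoid conflicts with the LAN subnet; something like 10.244.0.0/16 also works
# --image-repository: the Alibaba Cloud registry is faster from within China; optional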
sudo kubeadm init --pod-network-cidr 172.16.0.0/16 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers

# After it completes, run the following commands (the init output also shows them)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

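# Record your own worker-node join command and token; my output was as follows:
# Record the output from your own machine!!! This is only an example (same below)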
kubeadm join 192.168.1.101:6443 --token 8evfdy.hl9yvreluqrle6vr \
    --discovery-token-ca-cert-hash sha256:7f619150f8b5c7c97a56f8b48f6b1344d16a2247fe57d02c74eb6583c1e11908

5.4 Install the network plug-in

This article uses Calico. Search for "Kubernetes network plug-in comparative analysis" to choose the network plug-in that suits you.

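# If the download fails, open the URL in a browser to get the content
# ubuntu-server has no graphical interface; use scp or MobaXterm (very convenient) to transfer the file
# You can also check the official website for available versions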
wget https://docs.projectcalico.org/v3.13/manifests/calico.yaml

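# Edit CALICO_IPV4POOL_CIDR to match the CIDR used above (172.16.0.0/16)
# To search in vim command mode: type /CALICO_IPV4POOL_CIDR and press Enter; n jumps to the next match, N to the previous one
# This avoids conflicts with the LAN subnet the host is on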

# Install calico
kubectl apply -f calico.yaml

# List pods
kubectl get pods -A
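
The CIDR change in calico.yaml can also be made with sed before applying the manifest. This sketch assumes the manifest ships with Calico's usual default pool of 192.168.0.0/16 and uses a hypothetical helper set_pool_cidr. If your copy of the manifest has the CALICO_IPV4POOL_CIDR entry commented out, it still has to be uncommented by hand.

```shell
#!/bin/sh
# set_pool_cidr FILE CIDR
# Replace the assumed default pool 192.168.0.0/16 with CIDR everywhere
# in FILE, keeping the original as FILE.bak.
# Hypothetical helper for illustration only.
set_pool_cidr() {
  sed -i.bak "s#192\.168\.0\.0/16#$2#g" "$1"
}

# Example (not run automatically):
#   set_pool_cidr calico.yaml 172.16.0.0/16
#   grep -n -A1 CALICO_IPV4POOL_CIDR calico.yaml   # confirm the result
```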

5.5 Join a worker node

Join your other hosts to the k8s cluster as worker nodes by running the join command recorded in Section 5.3 on each of them:

kubeadm join 192.168.1.101:6443 --token 8zpuf4.3m2gm7o8ahlqf58u \
    --discovery-token-ca-cert-hash sha256:673a4120d7ddfa7c4e5d04f90f7c128629c20ffb4c0024160f2420f4eecb

If the token has expired, run the following command on the master node to obtain a new one

# If <token> is omitted, a randomly generated value is used
kubeadm token create <token>
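
Copy-paste mistakes in the join command are easy to make, so a quick format check before running it can save a failed attempt. A minimal sketch with two hypothetical helpers: a kubeadm bootstrap token has the form of 6 characters, a dot, then 16 characters (lowercase letters and digits), and the discovery hash is sha256: followed by 64 hexadecimal digits.

```shell
#!/bin/sh
# valid_token TOKEN
# Succeed if TOKEN looks like a kubeadm bootstrap token.
# Hypothetical helper for illustration only.
valid_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# valid_ca_hash HASH
# Succeed if HASH looks like "sha256:" plus a 64-digit hex SHA-256 value.
# Hypothetical helper for illustration only.
valid_ca_hash() {
  printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Example (not run automatically). On the master, this prints a complete,
# fresh join command instead of only a token:
#   kubeadm token create --print-join-command
```

These checks only look at the shape of the values; whether a token is still valid is decided by the cluster.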

5.6 Allow scheduling on the master (taint)

If you only have one host, workloads can only be deployed on the master. For why k8s prohibits scheduling on the master by default, see the official documentation.

# Allow scheduling on the master node; usually this alone is enough
kubectl taint nodes --all node-role.kubernetes.io/master-

# To disallow scheduling again; the first "master" is your hostname
kubectl taint nodes master node-role.kubernetes.io/master=:NoSchedule

6. Install Kuboard

Why not use kubernetes dashboard? You can search for "kuboard and dashboard"

If there is only one node, master deployment must be allowed (Section 5.6)

kubectl taint nodes --all node-role.kubernetes.io/master-

Install kuboard

kubectl apply -f https://kuboard.cn/install-script/kuboard.yaml

Uninstall kuboard (careful: this removes it!)

kubectl delete -f https://kuboard.cn/install-script/kuboard.yaml

Get token

# Get the user token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep kuboard-user | awk '{print $1}')

# Example: my token
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJvYXJkLXVzZXItdG9rZW4tZjluNTQiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia3Vib2FyZC11c2VyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMmIzYWNhMTItNDU4Yy00MGMwLWEwMTYtZGFlMDU3MWUyNDNhIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmt1Ym9hcmQtdXNlciJ9.ae6XJDXT7S-xSF6l6yca9OE3Bue9wP4eEuBTteHkI-sSOIxtI1KRyl_eQZH7Y-zHO0wtSDkgCNQhCJntJe0ws6P6lgkWvtmEHSehnVlIGM0t3aOaKLnCfenkqG6X-slGEWwRlv091-UiJs9LC_UqA_Vp1B2KiriwY0oj7DuoKGj8fHxMzQFvTOzTsZqiw9pQtrMiMP3apBBTHkq60FmZ1JnUiMBozof4uTxiafCJJ3q8v78RW2EBDshVI8Ptb9GtVENjlhcLKqWZDINjOz0bnhStQyUG0_DgCziSXcRzbilqtTnZcZS11PsSan7bQZMF3M2w5tRg5ZBINN8D6AItBQ

# Get the viewer token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep kuboard-viewer | awk '{print $1}')

Get the port number exposed by kuboard (default 32567)

kubectl get svc -n kube-system

Visit kuboard

http://<IP>:<kuboardPort>
# Mine
http://192.168.1.101:32567
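
The port can be read out of the service listing instead of being looked up by eye. A minimal sketch, assuming the service is named kuboard (an assumption; check the NAME column of your own output) and a hypothetical helper node_port that parses the PORT(S) column, which has the form port:nodePort/protocol:

```shell
#!/bin/sh
# node_port SERVICE
# Read `kubectl get svc` output on stdin and print the node port of
# SERVICE, taken from a PORT(S) value such as 80:32567/TCP.
# Hypothetical helper for illustration only.
node_port() {
  awk -v svc="$1" '
    $1 == svc {
      n = split($5, part, "[:/]")   # e.g. "80", "32567", "TCP"
      if (n >= 3) print part[2]
    }
  '
}

# Example (not run automatically):
#   port=$(kubectl get svc -n kube-system | node_port kuboard)
#   echo "http://<IP>:$port"
```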

At this point the k8s platform is deployed and applications can be deployed on it. You can also install Helm, an NFS service, kubens, and so on to make deploying applications and managing the platform easier.

Origin blog.csdn.net/qq_40759015/article/details/113591950