(Author: Chen Yujue)
Project address:
https://github.com/tencentmusic/cube-studio
or view
https://gitee.com/data-infra/cube-studio/blob/master/install/README.md
I wanted to deploy a machine learning platform. The platform has certain hardware requirements, so I bought a server from Tencent Cloud with the following configuration, to avoid deployment failures caused by insufficient machine performance.
After the purchase is complete, log in to the server directly. It looks like this.
Open the homepage of the machine learning platform: if your network connection is good, go to https://github.com/tencentmusic/cube-studio; if not, go to https://gitee.com/data-infra/cube-studio. Scroll to the platform deployment section to see how to deploy it. The deployment process and required environment are described in install/README.md.
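Basic environment dependencies
docker >= 19.03
kubernetes = 1.18
kubectl >= 1.18
cfs/ceph mounted at /data/k8s/ on every machine
Per machine: disk >= 500G. The disk is only used to store images and containers.
Control node: cpu >= 16, mem >= 32G
Task nodes: configure as needed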
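The hardware thresholds above can be checked before installing anything. The following is a minimal sketch, assuming a Linux host where nproc, /proc/meminfo and GNU df are available. The helper name check_min is hypothetical and not part of cube-studio, and the script only prints warnings.

```shell
#!/usr/bin/env bash
# Sketch: compare this machine against the control-node requirements listed above.
# check_min is a hypothetical helper, not part of cube-studio.

# check_min LABEL ACTUAL REQUIRED: prints OK or WARN, returns 0 when ACTUAL >= REQUIRED.
check_min() {
  local label="$1" actual="$2" required="$3"
  if [ "$actual" -ge "$required" ] 2>/dev/null; then
    echo "OK   $label: $actual (need >= $required)"
  else
    echo "WARN $label: $actual (need >= $required)"
    return 1
  fi
}

cpus=$(nproc 2>/dev/null)
# MemTotal is reported in kB; convert to GiB, rounded to the nearest integer.
mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 + 0.5)}' /proc/meminfo 2>/dev/null)
# Total size of the filesystem that holds /, in GB.
disk_gb=$(df -BG --output=size / 2>/dev/null | tail -n 1 | tr -dc '0-9')

check_min "cpu cores" "$cpus" 16 || true
check_min "memory (GB)" "$mem_gb" 32 || true
check_min "disk (GB)" "$disk_gb" 500 || true
```

A machine sold as 32G usually reports slightly less usable memory, which is why the memory figure is rounded rather than truncated.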
On a brand new server, we need to install docker and k8s first. Since rancher can manage k8s clusters, we install rancher directly.
1. Install docker
# Set up the Docker repository
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Set up the stable repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker
sudo apt-get update
# List the versions available in the repository, since we need Docker 19.03 or later
apt-cache madison docker-ce
Choose and install the required version of Docker:
sudo apt-get install docker-ce=5:19.03.15~3-0~ubuntu-focal docker-ce-cli=5:19.03.15~3-0~ubuntu-focal containerd.io docker-compose-plugin
Sometimes this error occurs
Err:5 https://download.docker.com/linux/ubuntu focal/stable amd64 docker-ce-cli amd64 5:19.03.15~3-0~ubuntu-focal
Could not wait for server fd - select (11: Resource temporarily unavailable) [IP: 13.249.171.37 443]
It is a network problem, just run it again.
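Since the fix for this error is simply to run the command again, the re-run can be automated. Below is a minimal sketch of a retry wrapper; the function name retry, the attempt count and the delay are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# retry is a hypothetical helper: run a command up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example with the same package versions as the install command above:
# retry 3 10 sudo apt-get install -y \
#   docker-ce=5:19.03.15~3-0~ubuntu-focal \
#   docker-ce-cli=5:19.03.15~3-0~ubuntu-focal \
#   containerd.io docker-compose-plugin
```

The example call is left commented out so that pasting the block does not install anything by accident.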
Then test it to confirm that the installation was successful.
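One way to check the result against the docker >= 19.03 requirement is to parse the output of docker --version. This is a sketch, not the author's original test: the helper names version_ge and extract_docker_version are hypothetical, and the comparison relies on sort -V being available.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_docker_version: reads "Docker version 19.03.15, build ..." on stdin
# and prints only the version number.
extract_docker_version() {
  sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

if command -v docker >/dev/null 2>&1; then
  installed=$(docker --version | extract_docker_version)
  if version_ge "$installed" "19.03"; then
    echo "docker $installed satisfies >= 19.03"
  else
    echo "docker $installed is older than 19.03"
  fi
  # A common smoke test at this point is: sudo docker run hello-world
else
  echo "docker is not on PATH"
fi
```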
2. Install rancher
sudo docker run -d --privileged --restart=unless-stopped -p 443:443 rancher/rancher:v2.5.2
After the installation is complete, open the Rancher page in a browser using the server's public IP and port 443.
Set a password, uncheck "Allow collection of anonymous statistics", and check "I agree to the Terms and Conditions for using Rancher".
3. Configure the k8s cluster
After entering Rancher, click Add Cluster. You can also switch the language to Chinese in the lower right corner.
Select Custom, then edit the YAML file: replace the kube_api section with the following and add the kubelet section below it. Pay attention to the indentation.
kube_api:
  always_pull_images: false
  pod_security_policy: false
  service_node_port_range: 10-32767
  extra_args:
    service-account-issuer: kubernetes.default.svc
    service-account-signing-key-file: /etc/kubernetes/ssl/kube-service-account-token-key.pem
kubelet:
  extra_binds:
    - '/data:/data'
In the node options, check all three roles (etcd, Control Plane and Worker), copy the generated command, run it on the server, and click Done.
This means the cluster and machines are ready to go!
4. Install cube-studio
Next, download cube-studio. I used the master branch.
git clone https://gitee.com/data-infra/cube-studio.git
In Rancher, click into the cluster and open the kubeconfig file, then click Copy to Clipboard at the bottom left. On the server, switch to the /cube-studio/install/kubernetes directory, create a new file named config, paste the copied content into it, and run the following command:
sudo sh start.sh 172.16.0.13
Remember to change the IP here to your own intranet IP. It must be the IP you see when running ifconfig on the host, otherwise there will be bugs!
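To reduce the chance of passing the public IP by mistake, the intranet address can be picked out of the host's address list. The sketch below assumes the intranet address falls in one of the RFC 1918 private ranges and that hostname -I is available (it is on Ubuntu). The function names are hypothetical, and the result should still be compared with the ifconfig output.

```shell
#!/usr/bin/env bash
# is_private_ipv4 ADDR: succeeds for RFC 1918 addresses
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
is_private_ipv4() {
  local a b rest
  IFS=. read -r a b rest <<< "$1"
  case "$a" in ''|*[!0-9]*) return 1 ;; esac
  case "$b" in ''|*[!0-9]*) return 1 ;; esac
  [ "$a" -eq 10 ] && return 0
  [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0
  [ "$a" -eq 192 ] && [ "$b" -eq 168 ] && return 0
  return 1
}

# first_private_ipv4: reads whitespace-separated addresses on stdin and
# prints the first private one.
first_private_ipv4() {
  local addr
  for addr in $(cat); do
    if is_private_ipv4 "$addr"; then
      echo "$addr"
      return 0
    fi
  done
  return 1
}

# hostname -I prints the addresses assigned to this host.
ip=$(hostname -I 2>/dev/null | first_private_ipv4) || ip=""
echo "candidate intranet IP: ${ip:-none found}"
# After confirming it against ifconfig: sudo sh start.sh "$ip"
```

Docker's own bridge interface also uses a private address, so if the candidate is not the one shown for the main network interface in ifconfig, use the ifconfig value.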
After the script finishes, open the external IP in a browser. Mine is 159.75.206.154, so I open http://159.75.206.154 (you also need to move the namespace). If http://<external IP> opens, the deployment succeeded. Otherwise, check in Rancher which component failed to install. Sometimes an image pull fails because of network problems; you can pull it again (it doesn’t work when github is running), pull it manually, or submit a bug to the open source project.
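Besides looking in the Rancher UI, failed components can be listed from the command line. This is a minimal sketch, assuming kubectl on the server is already configured for the new cluster. The function name not_ready_pods is hypothetical, and it only filters the standard column layout of kubectl get pods -A.

```shell
#!/usr/bin/env bash
# not_ready_pods: reads `kubectl get pods -A` output on stdin and prints the
# pods whose STATUS column is neither Running nor Completed.
# Columns: NAMESPACE NAME READY STATUS RESTARTS AGE
not_ready_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" {print $1 "/" $2 ": " $4}'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -A 2>/dev/null | not_ready_pods
fi
```

A status such as ImagePullBackOff or ErrImagePull in the output points to the image pull problem described above.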
Interface after successful deployment:
Reference link:
https://docs.docker.com/engine/install/ubuntu/
https://gitee.com/data-infra/cube-studio/tree/master
http://docs.rancher.cn/docs/rancher2.5/quick-start-guide/deployment/quickstart-manual-setup/_index