Deploying the cube-studio machine learning platform

(Author: Chen Yujue)
Project address:
https://github.com/tencentmusic/cube-studio
or view
https://gitee.com/data-infra/cube-studio/blob/master/install/README.md

I wanted to deploy this machine learning platform, but it has certain hardware requirements. To avoid deployment failures caused by insufficient machine performance, I bought a server from Tencent Cloud with the following configuration:
[Screenshot: server configuration]
After the purchase is complete, log in to the server directly. It looks like this:
[Screenshot: server login]
Open the homepage of the machine learning platform. If your network connection is good, go to https://github.com/tencentmusic/cube-studio; if not, go to https://gitee.com/data-infra/cube-studio. Scroll down to the platform deployment section to see how to deploy it. The deployment process and the required environment are described in install/README.md.

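Basic environment dependencies

docker >= 19.03
kubernetes = 1.18
kubectl >= 1.18
cfs/ceph mounted at /data/k8s/ on every machine
Single machine: disk >= 500G. The per-machine disk requirement is modest; the disk is only used to store images and containers.
Control machine: cpu >= 16, mem >= 32G
Task machines: configure as needed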
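
As a rough aid, the control-machine thresholds above can be checked before installing anything. The following is a minimal sketch and not part of cube-studio: the helper names `meets_min` and `report` are made up for illustration, it assumes a Linux host with `nproc`, `/proc/meminfo` and `df`, and it assumes the disk requirement applies to the root filesystem.

```shell
#!/bin/sh
# Hypothetical pre-flight check; thresholds are the control-machine values listed above.

# meets_min ACTUAL REQUIRED: succeeds when integer ACTUAL >= REQUIRED
meets_min() {
  [ "${1:-0}" -ge "$2" ]
}

# report LABEL ACTUAL REQUIRED: print one OK/LOW line
report() {
  if meets_min "$2" "$3"; then
    echo "OK   $1: $2 (need >= $3)"
  else
    echo "LOW  $1: $2 (need >= $3)"
  fi
}

# Number of CPU cores visible to this shell
cpus="$(nproc 2>/dev/null || echo 0)"

# MemTotal is reported in kB; round up to whole GiB because a "32G" machine
# usually reports slightly less than 32 GiB
mem_gib="$(awk '/^MemTotal:/ {print int(($2 + 1048575) / 1048576)}' /proc/meminfo 2>/dev/null || true)"

# Total size of the root filesystem in GiB (assumption: images and containers live under /)
disk_gib="$(df -Pk / 2>/dev/null | awk 'NR==2 {print int($2 / 1048576)}' || true)"

report "cpu cores" "${cpus:-0}" 16
report "memory GiB" "${mem_gib:-0}" 32
report "disk GiB" "${disk_gib:-0}" 500
```

A LOW line does not necessarily mean the deployment will fail; it only flags a value below the documented minimum.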

On a brand-new server, we first need to install docker and k8s. Since rancher can manage k8s clusters, we install rancher directly and use it to set up the cluster.

1. Install docker

# Set up the Docker repository
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Add the stable repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt-get update
# List the versions available in the repository, since we need Docker 19.03 or later
apt-cache madison docker-ce

[Screenshot: versions listed by apt-cache madison docker-ce]
Choose a version that meets the requirement and install it:

sudo apt-get install docker-ce=5:19.03.15~3-0~ubuntu-focal docker-ce-cli=5:19.03.15~3-0~ubuntu-focal containerd.io docker-compose-plugin

Sometimes this error occurs:

 Err:5 https://download.docker.com/linux/ubuntu focal/stable amd64 docker-ce-cli amd64 5:19.03.15~3-0~ubuntu-focal
  Could not wait for server fd - select (11: Resource temporarily unavailable) [IP: 13.249.171.37 443]

This is a network problem; just run the command again.
Then test the installation:
[Screenshot: docker test output]
This output shows that the installation was successful.
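
One way to confirm that the installed version meets the docker >= 19.03 requirement is sketched below. It is only an illustration: `version_ge` is a made-up helper name, and the comparison assumes GNU `sort -V`, which is available on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: check the installed docker client against the 19.03 minimum.

# version_ge A B: succeeds when version A >= version B (relies on GNU `sort -V`)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required="19.03"
if command -v docker >/dev/null 2>&1; then
  # Prints only the client version, e.g. 19.03.15
  installed="$(docker version --format '{{.Client.Version}}' 2>/dev/null || true)"
  if version_ge "$installed" "$required"; then
    echo "docker $installed satisfies >= $required"
  else
    echo "docker '$installed' is older than $required or could not be read"
  fi
else
  echo "docker is not installed or not on PATH"
fi
```

The helper sorts the two version strings and checks that the required one comes first, so `19.03.15` passes and `18.09.9` does not.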

2. Install rancher

sudo docker run -d --privileged --restart=unless-stopped  -p 443:443   rancher/rancher:v2.5.2

After the installation is complete, open the rancher page in a browser using the server's public IP and port 443 (https://<public IP>:443).
[Screenshot: rancher welcome page]
Set a password, uncheck "Allow collection of anonymous statistics", and check "I agree to the Terms and Conditions for using Rancher".

3. Configure the k8s cluster

After entering rancher, click Add Cluster. You can switch the language to Chinese in the lower right corner if you prefer.
[Screenshot: Add Cluster page]
Select Custom.
[Screenshots: custom cluster options]
Edit the cluster YAML: replace the kube_api section with the following and add the kubelet section shown below it. Pay attention to the indentation.

    kube_api:
      always_pull_images: false
      pod_security_policy: false
      service_node_port_range: 10-32767
      extra_args:     
        service-account-issuer: kubernetes.default.svc
        service-account-signing-key-file: /etc/kubernetes/ssl/kube-service-account-token-key.pem
    kubelet:
      extra_binds:
        - '/data:/data'

[Screenshot: edited cluster YAML]
In the host options, check all three roles, copy the generated command, run it on the server, and click Finish.
[Screenshot: cluster status]
This means the cluster and machines are ready to go!

4. Install cube-studio
Next, download cube-studio. I used the master branch:

git clone https://gitee.com/data-infra/cube-studio.git

In rancher, click the cluster to enter it.
[Screenshot: cluster page]
Open the kubeconfig file.
[Screenshot: kubeconfig file]
Click Copy to Clipboard at the bottom left. On the server, switch to the install/kubernetes directory of the cloned cube-studio repository, create a new file named config, paste the copied content into it, and run the following command:

sudo sh start.sh 172.16.0.13

Remember to change the IP here to your own intranet IP. It must be the IP you see when running ifconfig on the host, otherwise there will be bugs!
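
To narrow down which address to pass to start.sh, the private addresses of the host can be listed. This is a minimal sketch under assumptions: `is_private_ipv4` is a made-up helper name, and `hostname -I` is assumed to be available, as it is on Ubuntu. Interfaces created by docker, such as docker0, also carry private addresses, so the result still needs to be compared with the ifconfig output for the main network interface.

```shell
#!/bin/sh
# Hypothetical helper: list private (RFC 1918) IPv4 addresses of this host,
# which are the candidates for the argument to start.sh.

# is_private_ipv4 IP: succeeds for 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16
is_private_ipv4() {
  case "$1" in
    10.*.*.*|192.168.*.*) return 0 ;;
    172.1[6-9].*.*|172.2[0-9].*.*|172.3[01].*.*) return 0 ;;
    *) return 1 ;;
  esac
}

# `hostname -I` (Linux) prints all addresses of the host separated by spaces
for ip in $(hostname -I 2>/dev/null || true); do
  if is_private_ipv4 "$ip"; then
    echo "candidate intranet ip: $ip"
  fi
done
```

With the addresses from this post, 172.16.0.13 is accepted as a private address and 159.75.206.154 is not.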

After the script finishes, open your public IP in a browser. Mine is 159.75.206.154, so I open http://159.75.206.154. You also need to move the namespaces in rancher.
[Screenshot: moving the namespaces in rancher]
If http://<public IP> opens, the deployment succeeded. If it does not, check in rancher which component failed to install. Sometimes an image pull fails because of network problems. You can pull it again (this does not help while GitHub is having problems), pull it manually, or submit a bug to the open source project.
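
Failed components can also be listed from the command line. The sketch below is an illustration rather than part of the cube-studio installer: `failing_pods` is a made-up helper name, and it assumes kubectl is installed on the server and can reach the cluster with the kubeconfig copied from rancher.

```shell
#!/bin/sh
# Hypothetical helper: list pods that are neither Running nor Completed.

# failing_pods: reads `kubectl get pods -A --no-headers` output on stdin
# (columns: NAMESPACE NAME READY STATUS ...) and prints "namespace/name status"
failing_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -A --no-headers 2>/dev/null | failing_pods
else
  echo "kubectl is not installed or not on PATH"
fi
```

For a pod reported here, `kubectl describe pod <name> -n <namespace>` shows its events, which include image pull errors.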

Interface after successful deployment:
[Screenshot: cube-studio interface]

Reference link:
https://docs.docker.com/engine/install/ubuntu/
https://gitee.com/data-infra/cube-studio/tree/master
http://docs.rancher.cn/docs/rancher2.5/quick-start-guide/deployment/quickstart-manual-setup/_index

Origin blog.csdn.net/weixin_39750084/article/details/124986488