A brief introduction to cube-studio, a cloud-native machine learning platform: the open source project and its code

1. cube-studio introduction

Series introducing the cloud-native machine learning platform cube-studio: https://juejin.cn/column/7084516480871563272

cube-studio is an open source, cloud-native machine learning platform. Its current capabilities include:
  • Feature platform: online and offline features.
  • Data source management: structured data and media annotation data.
  • Online development: VS Code/Jupyter code development in the browser.
  • Online image debugging: Dockerfile-free, incremental image builds.
  • Task flow orchestration: online drag and drop.
  • Open template framework: distributed training tasks such as tf/pytorch/spark/ray/horovod/kaldi.
  • Task execution: single-node debugging of tasks, batch priority scheduling of distributed tasks, and aggregated logs.
  • Monitoring: resource monitoring and alerting for running tasks.
  • Scheduled runs: backfill, ignore, retry, dependencies, concurrency limits, and intelligent correction of the computing power requested by scheduled tasks.
  • Hyperparameter search: nni, katib, ray.
  • Multi-cluster, multi-resource-group computing power coordination and federated scheduling.
  • Inference services: tf/pytorch/onnx models, serverless traffic control, TensorRT GPU inference acceleration, HPA based on GPU utilization, QPS and other metrics, virtualized GPUs and virtual GPU memory.

cube-studio has been open-sourced by Tencent Music on GitHub: https://github.com/tencentmusic/cube-studio

Open source trial environment: http://114.96.98.168:20080/frontend/ (account: admin, password: admin)

2. Open source code framework and structure

The platform's control plane is built on Flask-AppBuilder (fab), a Python framework based on Flask. Getting familiar with Flask first will make the code easier to follow.

2.1 Database related code

2.1.1 Database structure

The database models are defined in myapp/models, as shown in the figure.

2.1.2 Database initialization


As shown in the figure, the initial data is written to the database in cli.py, and all data operations go through db.session.

2.1.3 Database update iteration

Migration version files are stored in myapp/migrations/versions. To change the schema:
1. First run the myapp db upgrade command to bring the database up to the latest revision in the versions directory.

2. Then run myapp db migrate to generate a new revision file in the versions directory from the latest model definitions in the code.

3. Finally, run myapp db upgrade again to apply the new revision to the database.
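The three steps can be scripted so that they always run in the same order. The sketch below is a minimal illustration. It assumes the myapp command is installed and on PATH. With dry_run left at its default, it only returns the commands and does not execute them.

```python
import shlex
import subprocess

# The three steps listed above, in order. "myapp" is assumed to be the
# platform's CLI entry point, installed and available on PATH.
MIGRATION_STEPS = [
    "myapp db upgrade",  # 1. bring the database up to the newest file in versions/
    "myapp db migrate",  # 2. generate a new version file from the current models
    "myapp db upgrade",  # 3. apply the newly generated version file
]


def run_migration(dry_run: bool = True) -> list:
    """Return the commands and, unless dry_run is set, execute them in order.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so a failed step stops the sequence.
    """
    commands = [shlex.split(step) for step in MIGRATION_STEPS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_migration(dry_run=True):
        print(" ".join(cmd))
```

The migrations/versions layout suggests these commands wrap Flask-Migrate/Alembic. If they do, running upgrade before migrate matters: autogeneration compares the models with the current database schema, and Alembic typically refuses to autogenerate while the database is not at the latest revision.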

2.2 Introduction to the backend interfaces

2.2.1 Introduction to back-end code


As shown in the figures, a backend interface is defined in four steps. First create a class and set its base route with route_base. Then declare each API endpoint and its RESTful method with the @expose decorator. Finally, register the class with appbuilder.add_api. The common API conventions are listed below.

```
# Headers for all API operations
headers = {
    'Content-Type': 'application/json',
    'Authorization': '$rtx|$token'
}

# List records
- api: (GET) http://x.x.x.x/$view/api/
# Get the parameters to send to this view's create/delete/update/query/search interfaces, with a description of each
- api: (GET) http://x.x.x.x/$view/api/_info
# List interface, where $value is a JSON-serialized string
- api: (GET) http://x.x.x.x/$view/api/?form_data=$value
# Create a record
- api: (POST) http://x.x.x.x/$view/api/
# Update a record
- api: (PUT) http://x.x.x.x/$view/api/<id>
# Get a single record
- api: (GET) http://x.x.x.x/$view/api/<id>
# Delete a record
- api: (DELETE) http://x.x.x.x/$view/api/<id>

# Single-record action
- api: (GET) http://x.x.x.x/$view/api/action/$action_name/<id>
# Batch action; the JSON body is {"ids": [xx, xx, xx]}
- api: (POST) http://x.x.x.x/$view/api/multi_action/$action_name/
```

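The following minimal client sketch makes the header and URL conventions above concrete, using only the Python standard library. BASE_URL, the view name example_view, the credentials and the delete action name are hypothetical placeholders. Filling $rtx with the user name is also an assumption. make_request only builds requests; send performs the HTTP call.

```python
import json
import urllib.request

# Hypothetical values for illustration; substitute your own deployment.
BASE_URL = "http://127.0.0.1"
VIEW = "example_view"  # stands in for $view


def make_request(method, path, user, token, body=None):
    """Build (without sending) a request that follows the conventions above.

    path is the part after /$view/api/, e.g. "", "_info", "5",
    "action/<action_name>/5" or "multi_action/<action_name>/".
    """
    headers = {
        "Content-Type": "application/json",
        # Follows the '$rtx|$token' pattern; using the user name for $rtx is an assumption.
        "Authorization": f"{user}|{token}",
    }
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{BASE_URL}/{VIEW}/api/{path}", data=data, headers=headers, method=method
    )


def send(req, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds requests; pass them to send() against a real deployment.
    get_one = make_request("GET", "5", "admin", "my-token")
    update = make_request("PUT", "5", "admin", "my-token", {"name": "new-name"})
    # "delete" is a hypothetical $action_name
    batch = make_request("POST", "multi_action/delete/", "admin", "my-token", {"ids": [1, 2, 3]})
    for req in (get_one, update, batch):
        print(req.get_method(), req.full_url)
```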
2.2.2 Interface filter operators

Filter operators:
  • Starts with: sw
  • Not Starts with: nsw
  • Ends with: ew
  • Not Ends with: new
  • Contains: ct
  • Not Contains: nct
  • Equal to: eq
  • Not Equal to: neq
  • Greater than: gt
  • Less than: lt
  • Relation: rel_o_m
  • No Relation: nrel_o_m
  • Relation as Many: rel_m_m
  • Filter view with a function: eqf
  • Filter view where field is in a list returned by a function: inf
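The string and comparison operators can be read as simple predicates. The following sketch mimics their intended semantics on in-memory rows. It is an illustration only: the platform evaluates filters server-side in the database query. It also assumes that multiple filters are combined with AND, and it does not model the relation or function-based operators.

```python
# Local illustration of the string/comparison operators listed above.
# On the platform, filtering happens server-side in the database query;
# this only mimics the intended semantics on plain dicts.
OPERATORS = {
    "sw": lambda field, value: str(field).startswith(str(value)),
    "nsw": lambda field, value: not str(field).startswith(str(value)),
    "ew": lambda field, value: str(field).endswith(str(value)),
    "new": lambda field, value: not str(field).endswith(str(value)),
    "ct": lambda field, value: str(value) in str(field),
    "nct": lambda field, value: str(value) not in str(field),
    "eq": lambda field, value: field == value,
    "neq": lambda field, value: field != value,
    "gt": lambda field, value: field > value,
    "lt": lambda field, value: field < value,
}


def apply_filters(rows, filters):
    """Return the rows matching every filter (filters combined with AND, an assumption)."""
    return [
        row for row in rows
        if all(OPERATORS[f["opr"]](row[f["col"]], f["value"]) for f in filters)
    ]


if __name__ == "__main__":
    rows = [{"name": "aa-train", "gpu": 2}, {"name": "infer-bb", "gpu": 0}]
    # Same filter as the first example below: name contains "aa"
    print(apply_filters(rows, [{"col": "name", "opr": "ct", "value": "aa"}]))
```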

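```
# Filter example: records whose name column contains "aa"
"filters": [
    {
        "col": "name",
        "opr": "ct",
        "value": "aa"
    }
]

# Filter example: table a references table b through the foreign key b_id;
# query all records in a whose b_id is 1
"filters": [
    {
        "col": "b",
        "opr": "rel_o_m",
        "value": 1
    }
]

# Pagination
"page": 0,
"page_size": 10,

# Sorting
"order_column": "$column1",
"order_direction": "desc"
```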
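The filter, pagination and sorting fields are sent to the list interface as the JSON-serialized form_data parameter described earlier. The sketch below assumes all of these keys live in the same form_data object. The host, the view name example_view and the column id are hypothetical.

```python
import json
import urllib.parse

BASE_URL = "http://127.0.0.1"  # hypothetical host
VIEW = "example_view"          # hypothetical view name standing in for $view


def build_list_url(filters, page=0, page_size=10, order_column=None, order_direction="desc"):
    """Build the list URL: form_data is JSON-serialized, then URL-encoded."""
    form_data = {"filters": filters, "page": page, "page_size": page_size}
    if order_column is not None:
        form_data["order_column"] = order_column
        form_data["order_direction"] = order_direction
    query = urllib.parse.urlencode({"form_data": json.dumps(form_data)})
    return f"{BASE_URL}/{VIEW}/api/?{query}"


if __name__ == "__main__":
    url = build_list_url(
        filters=[{"col": "name", "opr": "ct", "value": "aa"}],
        page=0,
        page_size=10,
        order_column="id",  # hypothetical column name
    )
    print(url)
```

URL-encoding the JSON string keeps characters such as quotes, braces and spaces from breaking the query string.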

2.2.3 Developing Celery scheduled/asynchronous tasks

  • Scheduled task code location: myapp/tasks/schedules.py

    Usage scenarios: periodic jobs. Examples include regularly deleting old workflows, TFJobs, PyTorchJobs, test tasks, services and notebooks. Other jobs periodically submit the configured scheduled tasks, monitor GPU resources, and balance resources among multiple project groups.

  • Asynchronous task code location: myapp/tasks/async_task.py

    Usage scenarios: long-running functions that are executed asynchronously, such as gray (canary) upgrades of services and building images.

  • Task configuration: CeleryConfig in config.py
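The block below is a rough sketch of what such a configuration class can look like. It uses Celery's lowercase setting names (broker_url, result_backend, imports, beat_schedule). The Redis URLs and the task name are placeholders, not cube-studio's actual values. The module paths are the two task files mentioned above.

```python
from datetime import timedelta


class CeleryConfig:
    """Sketch of Celery settings; the URLs and task name are placeholders."""

    broker_url = "redis://localhost:6379/0"      # assumed Redis broker
    result_backend = "redis://localhost:6379/0"  # assumed result store
    # Modules that hold the task functions (paths taken from the text above)
    imports = ("myapp.tasks.schedules", "myapp.tasks.async_task")
    # Periodic jobs for celery beat; the task name is hypothetical
    beat_schedule = {
        "delete-old-workflows": {
            "task": "task.delete_old_workflows",
            "schedule": timedelta(hours=1),
        },
    }
```

With Celery installed, app.config_from_object(CeleryConfig) would load these settings. celery beat then enqueues the periodic entries for the workers, while long-running functions from async_task.py are queued on demand with .delay(...).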

2.2.4 Monitoring CRD changes


Code location: myapp/tools/watch_xx.py

Usage scenarios: watching status changes of training and inference workflows, pushing notification messages, and recording task queues in the Redis cache.
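The core of such a watcher is usually a loop over change events that reacts only when a resource's status actually changes. The sketch below assumes a simplified event shape modelled on Kubernetes watch events: a type plus an object carrying status.phase. A callback stands in for message pushing, and a deque stands in for the Redis queue. In a real watcher, the events would come from the Kubernetes API rather than from a list.

```python
from collections import deque

# Hypothetical event shape modelled on Kubernetes watch events:
# {"type": "ADDED" | "MODIFIED" | "DELETED",
#  "object": {"metadata": {"name": ...}, "status": {"phase": ...}}}


def handle_events(events, notify, queue, last_phase=None):
    """Notify and enqueue only when a resource's phase actually changes."""
    last_phase = {} if last_phase is None else last_phase
    for event in events:
        obj = event["object"]
        name = obj["metadata"]["name"]
        if event["type"] == "DELETED":
            last_phase.pop(name, None)  # forget resources that are gone
            continue
        phase = obj.get("status", {}).get("phase")
        if phase != last_phase.get(name):
            last_phase[name] = phase
            notify(f"{name}: {phase}")   # e.g. push a message to the task owner
            queue.append((name, phase))  # stand-in for a Redis list
    return last_phase


if __name__ == "__main__":
    events = [
        {"type": "ADDED", "object": {"metadata": {"name": "wf-1"}, "status": {"phase": "Running"}}},
        {"type": "MODIFIED", "object": {"metadata": {"name": "wf-1"}, "status": {"phase": "Running"}}},
        {"type": "MODIFIED", "object": {"metadata": {"name": "wf-1"}, "status": {"phase": "Succeeded"}}},
    ]
    q = deque()
    handle_events(events, print, q)
    print(list(q))
```

Tracking the last seen phase per resource avoids re-sending a message for every MODIFIED event, because many such events do not change the phase.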

2.2.5 Calling k8s-related components


3. Running the project

3.1 Running locally

Running locally mainly requires two configuration files: mysql-compose.yml and docker-compose.yml.
1. Start Docker locally. On Windows, you can use Docker Desktop.

2. Run the MySQL database. In the ./install/docker directory, run:

docker-compose -f .\mysql-compose.yml up

3. Build the front-end and back-end images:

  • Front-end image Dockerfile: install/docker/dockerFrontend/Dockerfile

  • Back-end image Dockerfile: install/docker/Dockerfile.dashboard

4. In /install/docker/docker-compose.yml, set the front-end and back-end images to the ones you built, then start everything with docker-compose. The page is then available in the browser at http://localhost:8888/frontend.

docker-compose -f /install/docker/docker-compose.yml up

3.2 Running containers

3.2.1 infra namespace

As shown in the figure above:
kubeflow-dashboard: back-end container
kubeflow-dashboard-frontend: front-end container
kubeflow-dashboard-schedule: scheduler container for asynchronous, scheduled and other tasks
kubeflow-dashboard-worker: worker container that executes asynchronous, scheduled and other tasks
kubeflow-watch: container that watches CRDs and other resources
mysql: database
redis: cache database that records the queues of asynchronous, scheduled and other tasks

3.2.2 kubeflow namespace



Source: blog.csdn.net/qq_45808700/article/details/135147375