Getting Started with ClearML: Simplifying the Development and Management of Machine Learning Solutions

0. Introduction to ClearML

ClearML is an open source platform (formerly TRAINS) that automates and simplifies the development and management of machine learning solutions for thousands of data science teams around the world. It is designed as an end-to-end MLOps suite, allowing you to focus on developing ML code and automation, while ClearML ensures your work is repeatable and scalable.

  • Track and upload metrics and models with only 2 lines of code
  • Create a bot that sends you a Slack message every time the accuracy of your model improves
  • Reproduce experiments with 3 mouse clicks

To put it simply: a deep learning training run involves managing input hyperparameters, saving console output for reproducibility, archiving model files, recording environment information, and so on. Sometimes a well-trained run is lost to a software or system problem before you have a chance to record and analyze it. If you want to keep everything yourself, you have to save all of the above as local files and then painstakingly sort through the results of many experiments during analysis, which quickly becomes maddening.

ClearML is management software for exactly this: after an experiment is recorded, you can log in to your account on the web UI and review it. Next, let's look directly at what the common features do.

Experiment management is hierarchical: at the top level is the project, and under a project there are multiple experiments (a project can also contain sub-projects); all of the names are up to you. For example, for the experiments you run over a given period, you can create a new project, and every variation you try is recorded as an experiment under that project.

For example, a project might contain two experiments run with two different learning rates.
Let's look at the details of the "lr adjust 0.1" experiment. Each experiment records its information in detail across 8 tabs; a few of them are described below (see the documentation for the rest).

EXECUTION: this tab records the source code (as far as I can tell, only the script that was run; whether other related code is captured, I don't know yet), the installed packages, the Docker image, the project's output destination, and the logging level.

CONFIGURATION: hyperparameters, user properties, and configuration objects. Every experiment has hyperparameters, such as epoch, lr, etc. specified on the command line; they are recorded on this tab.

ARTIFACTS: input model, output model, model snapshot locations, and other artifacts. For example, if you want to save a snapshot of the model weights, you can; once saved, a link to it appears here.

INFO: information about the experiment, such as when it started, environment details, creation and last-update dates, the user who created it, and its description.

CONSOLE: stdout, stderr, library output to the console, and ClearML explicit reporting. For example, the console output you see when running from a terminal or from an IDE such as PyCharm is recorded. This avoids losing output when the console or IDE is closed, and makes it easy to revisit the run at any time.

SCALARS: during training we can record values such as accuracy, loss, and other metrics through the ClearML API, and this tab visualizes them.
PLOTS: other plots and data, e.g. Matplotlib, Plotly, and ClearML explicit reporting.
DEBUG SAMPLES: images, audio, video, and HTML.

Of course, this is just an introduction to the common features I have found useful. As a platform for simplifying the development and management of machine learning solutions, ClearML is not just a "recorder"; for more features, see the official ClearML documentation.

1. One of the simplest usage examples

Add the following to your code:

from clearml import Task
task = Task.init(project_name="", task_name="")  # fill in your own project and task names

A project can contain many tasks, and tasks that share a project_name are grouped under the same project. If the project_name does not exist yet, a new project is created automatically; each experiment is one task.

These two lines of code are enough to record your program's command-line input and output. On the ClearML web platform you can then see the recorded run of the program, and of course the environment information, parameters, etc. are all recorded as well.
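Putting this together, a minimal self-contained sketch might look like the following (the project and task names are just illustrative placeholders):

from clearml import Task

# Task.init registers this run with the ClearML server and turns on automatic logging
task = Task.init(project_name="demo project", task_name="first experiment")

# the rest of the training script runs unchanged; console output, installed
# packages, and command-line arguments are captured automatically
print("training...")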

This raises a question: I insert two lines of code locally, yet the experiment information shows up on the ClearML web page, which means the local code is uploading the experiment details somewhere. So first we need to register a ClearML account and configure the local credentials.

2. Register an account, pip install and configure a local account

2.1 Registration

Registering an account needs little explanation: just go to the official website and sign up.

After you land on your personal homepage, click the icon in the upper-right corner, then click "Settings", and create new credentials (used for the local configuration). A block of API credentials will be shown; copy it locally, because it is only displayed once.

2.2 pip installation and configuration of local accounts

See the official website for the installation tutorial.

Installation: pip install clearml
Configuration (run after installation): clearml-init
A prompt to paste the API credentials will then appear; paste the block you copied earlier (we are using the official hosted server, so just press Enter for the next three prompts). If "successfully" appears at the end, the configuration worked.
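For reference, the credentials block you paste looks roughly like the following (the keys are shortened here, and the exact server URLs depend on which server your account uses):

api {
  web_server: https://app.clear.ml
  api_server: https://api.clear.ml
  files_server: https://files.clear.ml
  credentials {
    "access_key" = "ABC123..."
    "secret_key" = "XYZ789..."
  }
}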

At this point the local pip installation and account configuration are complete. Try the simple usage example from Section 1 again; if it runs normally, you will see the project and task you specified on the official web platform.

3. Record hyperparameters

3.1 Record hyperparameters

The setup in Section 1 records parameters passed on the command line, but if we use argparse with many parameters that have default values, those defaults are never typed on the command line and so are not captured that way. Instead, we can use Task.connect() to record hyperparameters explicitly.

The Task.connect method directly connects Python objects such as dictionaries and custom classes to a task. Once an object is connected, ClearML automatically records all of its elements (e.g. class members, dictionary key-value pairs). Additionally, ClearML keeps track of changes to these values in the code.

argparse parses the arguments into a Namespace of key-value pairs, and you can record all the hyperparameters at once by connecting it (in this example, name was specified on the command line while epochs used its default value). Checking the web UI, the hyperparameters are indeed recorded.
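A sketch of this pattern, with illustrative argument and project names, might look like:

import argparse
from clearml import Task

task = Task.init(project_name="demo project", task_name="connect args")

parser = argparse.ArgumentParser()
parser.add_argument("--name", type=str, default="baseline")
parser.add_argument("--epochs", type=int, default=10)  # default value, never typed on the command line
args = parser.parse_args()

# connect the parsed Namespace so every argument, including defaults, is logged
task.connect(args)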

3.2 Add hyperparameters

For example, continuing from Section 3.1, after recording args I also want to record the value of lr:

task.connect(args)

hp = {"lr": 0.01}  # put the extra hyperparameter in a dictionary
task.connect(hp)

3.3 Hyperparameter optimization module

Official Documentation: Hyperparameter Optimization Module

I haven't used this feature yet, but it is clearly very useful for parameter tuning; if you are interested, you can explore it on your own.

4. Record SCALARS

Official Documentation: SCALARS

To record the accuracy and loss values during training and get a line chart similar to TensorBoard, we use SCALARS.

ClearML's Logger class is used for explicit reporting (i.e. manual recording, just as hyperparameters can be connected manually, which is more flexible), so scalars are also recorded through Logger.

The core code is really just one line:

from clearml import Logger  # import
logger = Logger.current_logger()  # get the current logger instance
logger.report_scalar(title='', series='', value=value, iteration=iteration)

For example, to record the training and validation loss (value must be a scalar, not something like a tensor, which needs to be converted to a Python float first):

logger.report_scalar(title='Loss', series='Train', value=train_loss, iteration=epoch)
logger.report_scalar(title='Loss', series='Valid',  value=val_loss,  iteration=epoch)
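As a small self-contained sketch of how this fits into a training loop (dummy loss values are used so the example runs on its own; a real tensor would first need to be converted with .item() or float()):

import math
from clearml import Logger, Task

task = Task.init(project_name="demo project", task_name="scalar logging")
logger = Logger.current_logger()

for epoch in range(10):
    # stand-ins for real training/validation losses
    train_loss = math.exp(-0.30 * epoch)
    val_loss = math.exp(-0.25 * epoch)
    logger.report_scalar(title='Loss', series='Train', value=train_loss, iteration=epoch)
    logger.report_scalar(title='Loss', series='Valid', value=val_loss, iteration=epoch)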

You will then get a line chart of both series in the SCALARS tab.

5. Record files

Official Documentation: Artifacts Reporting | ClearML

Artifacts make it easy to record the outputs of an experiment, including model snapshot/weight files and more. An artifact is essentially a file, uploaded from your code through ClearML.

For example, during training you might import different model definitions from different .py scripts before starting the run. To make later analysis easier, in addition to recording the experiment itself you may want to record the code that defines the model; you can upload and save the .py file containing the model definition as an artifact:

net_path = "utils/models/cbam.py"
task.upload_artifact(name="net path", artifact_object=net_path)

In the web UI, clicking the download arrow opens a new page showing the file's contents, which is convenient for review.

Artifacts can record (upload) many things, including but not limited to: Pandas DataFrames, NumPy objects, dictionaries, local files, folders, models/weights, etc. For example:

import numpy as np
import pandas as pd

# create a Pandas DataFrame and upload it
# (the only type that supports dynamic tracking; see the official docs)
df = pd.DataFrame(
    {
        'num_legs': [2, 4, 8, 0],
        'num_wings': [2, 0, 0, 0],
        'num_specimen_seen': [10, 2, 1, 8]
    },
    index=['falcon', 'dog', 'spider', 'fish']
)
task.upload_artifact(name='pd DataFrame', artifact_object=df)

# create a NumPy object and upload it
task.upload_artifact(name='Numpy Eye', artifact_object=np.eye(100, 100))

# create a dict and upload it
task.upload_artifact(name='dict', artifact_object={"1": 20, "2": 55})

# upload a local file
task.upload_artifact(name='data', artifact_object='/path/to/preprocess_data.csv')

# upload a folder; it is automatically packed and recorded as an archive
task.upload_artifact(name='folder', artifact_object=r"utils\models\hub")

# wildcards, PIL images, and more are also supported ...
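Uploaded artifacts can also be retrieved programmatically later, for example from another script; a minimal sketch, assuming hypothetical project/task names and the 'data' artifact uploaded above:

from clearml import Task

# look up the task that uploaded the artifact, then download a local copy of it
prev_task = Task.get_task(project_name="demo project", task_name="first experiment")
local_csv = prev_task.artifacts['data'].get_local_copy()
print(local_csv)  # path to the downloaded copy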

For saving model snapshots/weights, see the official documentation: PyTorch Model Update | ClearML.
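As I understand it from those docs, with automatic framework logging enabled (the default when Task.init is called), an ordinary torch.save() is picked up and registered as an output model of the task; a rough sketch, assuming PyTorch is installed:

import torch
import torch.nn as nn
from clearml import Task

task = Task.init(project_name="demo project", task_name="model snapshot demo")

model = nn.Linear(10, 2)  # stand-in for your real network

# save a checkpoint as usual; ClearML's automatic logging records the file
# as an output model, visible under the ARTIFACTS tab of the task
torch.save(model.state_dict(), "model_epoch_10.pt")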
