[AI framework basics] Building a deep learning model based on the Python API

Text: @Xiao P, a classmate who prefers to remain anonymous

0 Preface

Hello everyone. As part of the AI framework basics series, this post explains in detail how to quickly build a deep learning model with the Python API.

As AI frameworks have developed, building deep learning models has become easier and easier. Building a house used to mean laying it brick by brick; later, people could assemble houses like building blocks from ready-made modules such as prefabricated panels; nowadays, a shaped house can even be 3D-printed directly from the drawings.

The same is true for building a deep learning model. Depending on the complexity of the construction, the process can be divided into three stages:

  1. In the first stage, before AI frameworks emerged, we had to implement every operator of the model ourselves.
  2. In the second stage, we can build the model by calling the operator units provided by the framework, such as convolutions, activation functions, and sequence-processing units. Most mainstream frameworks are currently at this stage.
  3. In the third stage, we further develop general templates on top of the operator units provided by the AI framework, and building models from these templates is even more convenient.

The OpenMMLab family of algorithm frameworks is a typical representative of the third stage. These model templates are decoupled, and their modules can be taken apart and recombined, which makes them easy and flexible to apply to different tasks, and also easy to debug and upgrade later.

Beyond these three stages, it is even possible to directly produce a good model with AutoML techniques such as Neural Architecture Search (NAS).

Whichever stage you work in, a practical deep learning network, once built, faces the problem of long training times.

For example, a full training run of the classic ResNet on the ImageNet dataset takes more than a week on a single V100 GPU. In day-to-day training, researchers have therefore accumulated a lot of practical tricks that can greatly shorten model training time.

This post focuses on how to use the operator building blocks provided by the AI framework to build a deep learning model and train it efficiently.

The text is organized into the following two parts, and the code examples use PyTorch.

  1. Building a deep learning model based on the Python API
  2. Acceleration and advanced skills when building deep learning models

1 Building a deep learning model based on the Python API

In short, building a deep learning model that is ready to train involves four main steps:

  1. Build the main body of the model
  2. Configure the optimization strategy
  3. Define the data
  4. Define the training pipeline

The following sections cover these four aspects in turn.

1.1 Build the main structure of the model

Take building a model my_model as an example. A single operator is like a building block: as long as the interfaces of two adjacent blocks are compatible, they can simply be stacked. The example below uses a convolutional layer layer1 and a linear fully connected layer classifier. The order in which the blocks are connected is defined in forward. The characteristics of each building block, that is, the parameters of the operator, can be initialized randomly; for complex tasks, pre-trained weights prepared in advance can also be loaded.

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, inplanes, features, num_classes=1000):
        super(MyModel, self).__init__()
        # A 3x3 convolution that keeps the number of channels and the spatial size
        # (kernel_size is a required argument of nn.Conv2d, so it is given explicitly)
        self.layer1 = nn.Conv2d(inplanes, inplanes, kernel_size=3, padding=1)
        self.classifier = nn.Sequential(
            nn.Linear(features, num_classes)
        )
        self._initialize_weights()

    def forward(self, x):
        x = self.layer1(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, features) before the fully connected layer
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        # Custom parameter initialization (or loading of pre-trained weights) goes here
        pass

my_model = MyModel(...)
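As a quick sanity check, the model can be instantiated and run on a dummy batch. The channel count, image size, and number of classes below (3, 32x32, 10) are hypothetical values chosen only so that the shapes line up:

import torch

# 3 input channels; after the 3x3 conv the feature map is still 3 x 32 x 32,
# so the flattened feature size is 3 * 32 * 32.
model = MyModel(inplanes=3, features=3 * 32 * 32, num_classes=10)
dummy_input = torch.randn(2, 3, 32, 32)   # a fake batch of 2 images
print(model(dummy_input).shape)           # expected: torch.Size([2, 10])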

1.2 Configure the optimization strategy

For an ordinary office worker, the Monday-to-Friday routine repeats itself endlessly, yet most people probably hope to gain something new from each day.

An ordinary data sample likewise enters an endless cycle of being fed in and processed. Ideally, every time a piece of data comes and goes, something is left behind. What is left behind is determined by the loss computation rule defined by the loss function: backpropagating the loss it produces yields the gradients of the model parameters. How those gradients are absorbed into the model is exactly the optimizer's job.

In short, in the example below, the loss function (criterion) determines what criterion the training optimizes against, and the optimizer (optimizer) determines how the model parameters are updated.

optimizer = torch.optim.SGD(my_model.parameters(), ...)
criterion = nn.CrossEntropyLoss()

1.3 Define the data

Model training requires data as input, and PyTorch represents the data with Dataset and DataLoader.

Take the following code as an example: a custom dataset usually needs to define the size of the dataset (__len__) and the way a sample is read (__getitem__). Given the amount of data consumed per iteration, the former determines how many iterations are needed to traverse all the data, and the latter determines how each input is obtained when the data is read.

my_dataloader determines how the data is distributed and combined in each round of training; AI training frameworks can generally handle this fully automatically.

from torch.utils.data import Dataset
from torch.utils.data import DataLoader

class MyDataset(Dataset):
    def __init__(self, meta_file):
        # parse_meta is a user-defined function that parses the metadata file
        self.meta_list = parse_meta(meta_file)

    def __getitem__(self, index):
        # get_data is a user-defined function that reads the actual sample
        # described by one metadata entry
        return get_data(self.meta_list[index])

    def __len__(self):
        return len(self.meta_list)

my_dataset = MyDataset(...)
my_dataloader = DataLoader(my_dataset, batch_size=4)

1.4 Define the training pipeline

In a single training iteration, the data flows into the network in sequence; once the forward output and the loss have been obtained from the network structure and the loss function, backpropagation is performed, the gradients of the parameters are computed and stored, and the network parameters are updated according to the strategy defined by the optimizer. The updated parameters then greet the next batch of data.

Described briefly with the Python API, the loop looks like the following code:

def train(my_model, my_dataloader, criterion, optimizer):
    for index, (input, target) in enumerate(my_dataloader):
        optimizer.zero_grad()             # clear gradients left over from the previous step
        output = my_model(input)          # forward pass
        loss = criterion(output, target)  # compute the loss
        loss.backward()                   # backpropagation: compute parameter gradients
        optimizer.step()                  # update the parameters according to the optimizer
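Putting the four steps together, a minimal end-to-end sketch might look like the following; the metadata file name, batch size, learning rate, momentum, and number of epochs are all hypothetical values used only for illustration:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

my_model = MyModel(inplanes=3, features=3 * 32 * 32, num_classes=10)
my_dataset = MyDataset("meta.txt")                       # hypothetical metadata file
my_dataloader = DataLoader(my_dataset, batch_size=4, shuffle=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.1, momentum=0.9)

for epoch in range(10):                                  # hypothetical number of epochs
    train(my_model, my_dataloader, criterion, optimizer)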

In this way, a complete deep learning model is built.

Just as people are the key factor in a home, data is the key to the whole model: it determines what kind of model to build and what optimization strategy to use, and ultimately determines the performance of the model.

2 Acceleration and advanced skills when building deep learning models

The current main acceleration methods include:

  1. Distributed parallelism
  2. Mixed precision
  3. Online compilation

These methods are briefly introduced below; the finer details will be covered in subsequent posts.

2.1 Distributed parallelism

Distributed parallelism includes data parallelism, model parallelism, and combinations of the two.

Ideally, the time required for multi-card training should be inversely proportional to the number of cards, but in practice, because of factors such as communication overhead, the ideal speed-up is often not achieved.

2.1.1 Data Parallelism

In multi-card (multi-process) training, the data is distributed to different cards, each of which performs the forward computation and backpropagation independently; the gradients obtained are then synchronized and the parameters updated.

The corresponding code changes are as follows:

# The sampler in the dataloader decides how the data is arranged on each card
# during the subsequent data distribution
train_sampler = torch.utils.data.DistributedSampler(my_dataset, shuffle=True)
train_loader = DataLoader(my_dataset, sampler=train_sampler)

# Turn the model into a distributed model
my_model = torch.nn.parallel.DistributedDataParallel(my_model)

DistributedDataParallel mainly does two things:

  1. At initialization, the parameters on each card are made consistent;
  2. After each iteration, the gradients of the parameters on different cards are synchronized through communication, which ensures that the parameters on every card stay consistent.

This has two advantages. First, it makes full and flexible use of computing resources, greatly shortening the time needed to train the model. Second, it is equivalent to enlarging the batch size, so the gradient descent direction better reflects the overall dataset.
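For completeness, a minimal sketch of the per-process setup that DistributedDataParallel relies on is shown below. It assumes the script is launched with torchrun (one process per GPU) and uses the nccl backend; the device_ids/LOCAL_RANK convention shown here is one common choice, not the only one.

import os
import torch
import torch.distributed as dist

# Initialize the default process group; torchrun sets the environment variables
# (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, LOCAL_RANK) that this reads.
dist.init_process_group(backend="nccl")

# Bind this process to its own GPU.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

my_model = MyModel(...).cuda()   # same constructor arguments as in section 1.1
my_model = torch.nn.parallel.DistributedDataParallel(my_model, device_ids=[local_rank])

Such a script would then typically be launched with something like torchrun --nproc_per_node=<number of GPUs> train.py, where the script name is hypothetical.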

2.1.2 Model Parallelism

When the model has a very large number of parameters, even a 32GB graphics card cannot hold such a large training task, so we can distribute the parameters across different cards. Model parallelism can be applied at any layer of the model; if the previous layer used data parallelism, the calculation rules of the current layer decide whether the data needs to be merged back together.

Parameters can be distributed at a layer that does not depend on parameter order, such as a linear fully connected layer; in a multi-task model, the code blocks for different tasks can also be placed on different cards.
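As a minimal illustration of the idea (not a full model-parallel framework), the sketch below splits a made-up two-layer model across two GPUs, cuda:0 and cuda:1, and moves the activations between the cards by hand; the layer sizes are hypothetical and at least two GPUs are assumed.

import torch
import torch.nn as nn

class TwoGpuModel(nn.Module):
    # A made-up model whose two halves live on different cards.
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 4096).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        x = self.part2(x.to("cuda:1"))   # ship the activations to the second card
        return x

model = TwoGpuModel()
out = model(torch.randn(8, 1024))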

2.2 Mixed Precision

Mixed precision training mainly uses half precision (float16) for training while keeping the results consistent with single precision (float32). To preserve accuracy, some computation units mix in float32 calculations. Mixed precision reduces memory consumption and speeds up computation.

Because some operators are prone to overflow, during training these operators need to be cast to float32 for the calculation and then cast back to float16.

From the perspective of the user's Python code, mixed precision touches three aspects: converting the model parameters, converting the input, and updating the optimization strategy during parameter updates, such as loss scaling. For the underlying principles, see Automatic Mixed Precision. The following is only schematic code:

my_model.half()                  # convert the model parameters to float16
for index, (input, target) in enumerate(my_dataloader):
    input = input.half()         # convert the input to float16
    # forward
    loss = ...
    # backward; scale is the operation that enlarges the loss (loss scaling)
    scale(loss).backward()
    # fused_sgd is a user-defined or framework-provided optimizer that handles
    # loss scaling and the master copy of weights
    fused_sgd.step()
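In practice, PyTorch also ships these pieces as a built-in feature, torch.cuda.amp, so they do not have to be written by hand. The sketch below shows the same loop using it, assuming my_model, criterion, optimizer, and my_dataloader are the float32 objects defined earlier (the schematic scale and fused_sgd above are not needed here):

import torch

scaler = torch.cuda.amp.GradScaler()   # handles loss scaling and unscaling

for index, (input, target) in enumerate(my_dataloader):
    optimizer.zero_grad()
    # Operators inside autocast run in float16 or float32 as appropriate;
    # the model parameters themselves stay in float32 (the master copy).
    with torch.cuda.amp.autocast():
        output = my_model(input.cuda())
        loss = criterion(output, target.cuda())
    scaler.scale(loss).backward()   # scale the loss to avoid float16 underflow
    scaler.step(optimizer)          # unscale the gradients and update the parameters
    scaler.update()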

2.3 Online compilation

The combination of Python and dynamic graphs lets users define the model's computation flexibly, but in some more complex operator computations, stacking a large number of small ops increases scheduling overhead and leaves considerable room for optimization. Online compilation can bring acceleration in scenarios dense with small operators.
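In PyTorch, for example, this kind of online compilation is exposed through TorchScript. The function below is a made-up example of a small-op-heavy computation that torch.jit.script compiles ahead of the Python interpreter:

import torch

@torch.jit.script
def gelu_like(x: torch.Tensor) -> torch.Tensor:
    # A chain of small element-wise ops; the JIT can fuse such chains
    # into fewer kernels when run on the GPU.
    return 0.5 * x * (1.0 + torch.tanh(0.79788456 * (x + 0.044715 * x * x * x)))

y = gelu_like(torch.randn(1024, 1024))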

Model training also allows many more flexible tricks. For example, we can specify that only certain parameters of the model are trained during particular rounds.
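One simple way to express this in PyTorch is to toggle requires_grad on the parameters. The sketch below assumes, purely for illustration, that we want to train only the classifier part of the model from section 1.1 for a while; the learning rate is arbitrary.

# Freeze everything except the classifier (a hypothetical fine-tuning scenario).
for name, param in my_model.named_parameters():
    param.requires_grad = name.startswith("classifier")

# Hand only the still-trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in my_model.parameters() if p.requires_grad], lr=0.01
)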

In addition, we can customize our own modules and operators; as long as forward and backward are implemented, they can be used as building blocks in the model.
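In PyTorch, a custom operator with its own forward and backward is typically written as a torch.autograd.Function subclass. The toy ReLU below only illustrates the contract:

import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # stash the input for the backward pass
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x > 0).to(grad_output.dtype)   # gradient of ReLU

x = torch.randn(4, requires_grad=True)
MyReLU.apply(x).sum().backward()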

C++ code can be compiled into Python modules through mechanisms such as pybind11, combining speed with ease of use.

Epilogue

Thank you for reading. Please leave a comment to discuss building deep learning models with the Python API~

We look forward to students with the same interests joining us to explore and solve the problems and challenges facing AI frameworks!

If there is any technical topic or difficulty you are interested in, feel free to point it out and leave us a message in the comment section~


Follow the public account "SenseParrots" to get the latest industry trends and technical thinking on artificial intelligence frameworks.

