[OpenMMLab AI Combat Camp Phase II] Deep Learning Pre-training and MMPretrain

MMPreTrain algorithm library introduction

MMPretrain is a newly upgraded open-source pre-training framework that aims to provide a variety of powerful pre-trained backbone networks and to support different pre-training strategies. MMPretrain grew out of the well-known open-source projects MMClassification and MMSelfSup, and adds many exciting new features. The pre-training stage is currently crucial for visual recognition: with rich and powerful pre-trained models, we can improve a wide range of downstream vision tasks.

The codebase is designed to be an easy-to-use and user-friendly repository that simplifies both academic research and engineering work. The features and design of MMPretrain are detailed in the sections below.

Code repository: https://github.com/open-mmlab/mmpretrain
Documentation: https://mmpretrain.readthedocs.io/en/latest/

Out-of-the-box inference APIs and pre-trained models are supported, covering a rich set of tasks (a usage example follows the list):

  • Image classification
  • Image captioning
  • Visual question answering
  • Visual grounding
  • Retrieval
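
For example (a minimal usage sketch; the model name 'resnet50_8xb32_in1k' and the demo image path are illustrative choices, and the exact fields of the returned result may differ between tasks and versions):

from mmpretrain import list_models, inference_model

# List available models whose names match a pattern (here: ResNet variants).
print(list_models('resnet*'))

# Run inference with a pretrained classifier; weights are downloaded on first use.
result = inference_model('resnet50_8xb32_in1k', 'demo/demo.JPEG')
print(result)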

Install

git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
pip install -U openmim && mim install -e .

Multimodal installation

# Install from source
mim install -e ".[multimodal]"

# Install as a Python package
mim install "mmpretrain[multimodal]>=1.0.0rc8"

Verify the installation

from mmpretrain import get_model, inference_model

model = get_model('resnet18_8xb32_in1k', device='cpu')  # or device='cuda:0'
inference_model(model, 'demo/demo.JPEG')

Code framework


Classic backbone networks

Basic idea of residual learning


Two Residual Networks

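ResNet uses two residual module designs: the basic block (two 3×3 convolutions) and the bottleneck block (1×1, 3×3, 1×1 convolutions). Below is a minimal PyTorch-style sketch of the basic block, written only for illustration and not taken from MMPretrain's actual ResNet implementation:

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the input (skip connection)."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection on the shortcut when the resolution or channel count changes.
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # residual learning: F(x) + x

# A stride-2 block halves the resolution and doubles the channels (56x56 -> 28x28).
y = BasicResidualBlock(64, 128, stride=2)(torch.rand(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 128, 28, 28])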

ResNet(2015)

Based on VGG

Keeps the multi-stage organization while increasing the number of layers

Adds cross-layer (skip) connections

ResNet-34 (34 layers) reaches 94.4% Top-5 accuracy on ImageNet

5 stages; each stage contains several residual modules, and stacking different numbers and types of residual modules gives the different ResNet variants

The output resolution of each stage is halved while the number of channels is doubled

Global average pooling compresses spatial dimensions

A single fully connected layer produces class probabilities
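
To see this staging concretely, here is a small check with MMPretrain's ResNet backbone (a sketch that assumes mmpretrain is installed and that ResNet accepts depth and out_indices arguments as below; the expected shapes are for a 224×224 input):

import torch
from mmpretrain.models import ResNet

# ResNet-34 backbone returning the feature map of every residual stage.
backbone = ResNet(depth=34, out_indices=(0, 1, 2, 3))
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.rand(1, 3, 224, 224))

for i, feat in enumerate(feats):
    # Expected channels 64/128/256/512 and resolutions 56/28/14/7:
    # each stage halves the resolution and doubles the channel count.
    print(f'stage {i + 1}: {tuple(feat.shape)}')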

ResNet's achievements and influence

One of the most influential and widely used model architectures in deep learning; the ResNet paper won the CVPR 2016 Best Paper Award.

The residual structure is still widely used today: whether in the various vision Transformers and convolutional networks such as ConvNeXt in computer vision, or in GPT and the other recently popular large language models, residual connections are everywhere.

Vision Transformer

The image is divided into small 16×16 patches, and all patches are arranged into a sequence of token vectors. After a linear projection, an image of shape [H, W, C] becomes a sequence of shape [L, C], which is then processed by a multi-layer Transformer encoder to produce the corresponding feature vectors.

An additional token is added alongside the patch tokens; through attention it queries the features of all patches and is used to produce the final classification.

The attention module has a global receptive field over the whole image, and its computational complexity is quadratic in the number of tokens, i.e., roughly the 4th power of the image side length.
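
A minimal sketch of the patchify-and-project step and the extra classification token, assuming a 224×224 input, 16×16 patches and a 768-dimensional embedding (an illustrative PyTorch-style sketch, not MMPretrain's actual VisionTransformer code):

import torch
import torch.nn as nn

img = torch.rand(1, 3, 224, 224)  # [B, C, H, W]

# Patchify + linear projection in one step: a 16x16 convolution with stride 16.
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
tokens = patch_embed(img).flatten(2).transpose(1, 2)  # [B, L, C] with L = (224/16)^2 = 196

# Extra learnable token; it attends to all patch tokens and is used for classification.
cls_token = nn.Parameter(torch.zeros(1, 1, 768))
tokens = torch.cat([cls_token.expand(img.size(0), -1, -1), tokens], dim=1)  # [B, 197, 768]
print(tokens.shape)

# Self-attention is global over all L tokens and costs O(L^2); doubling the image
# side length quadruples L, so the cost grows roughly with the 4th power of the side.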

Common Types of Self-Supervised Learning

  • Based on various pretext (proxy) tasks
  • Contrastive learning
  • Masked image modeling

SimCLR

Basic assumption: if the model can extract the essence of an image's content well, then no matter what data augmentation the image undergoes, the extracted features should be very similar.
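
A minimal sketch of the contrastive objective behind this assumption (a simplified InfoNCE-style loss, not SimCLR's exact NT-Xent implementation): features of two augmented views of the same image are treated as positives and pulled together, while features of other images in the batch are pushed apart.

import torch
import torch.nn.functional as F

def simple_contrastive_loss(z1, z2, temperature=0.1):
    # z1, z2: [N, D] features of two augmented views of the same N images.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # cosine similarity of every view-1 to every view-2
    targets = torch.arange(z1.size(0))   # for image i, the positive is view-2 of image i
    return F.cross_entropy(logits, targets)

loss = simple_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())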

Masked Autoencoders (MAE)

Basic assumption: only by understanding the image content and mastering its contextual information can the model recover the randomly masked regions of the image.
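
A minimal sketch of the random patch masking at the core of this idea, assuming 196 patch tokens per image and the 75% mask ratio used in the MAE paper (encoder, decoder and reconstruction loss are omitted):

import torch

tokens = torch.rand(1, 196, 768)  # patch tokens of one image
mask_ratio = 0.75
num_keep = int(tokens.size(1) * (1 - mask_ratio))

# Randomly shuffle the patch indices and keep only the small visible subset;
# the model must reconstruct the masked patches from context alone.
perm = torch.randperm(tokens.size(1))
visible = tokens[:, perm[:num_keep], :]  # [1, 49, 768] is all the encoder sees
print(visible.shape)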

Original post: https://blog.csdn.net/yichao_ding/article/details/131050199