Training strategy for very large images: Patch Gradient Descent

Preface: This paper addresses the problem of training existing CNN architectures on large-scale images under tight computational and memory constraints. It proposes PatchGD, built on the premise that instead of performing gradient-based updates on an entire image at once, it is better to update the model on only a small part of the image at a time, while ensuring that most of the image is covered over the course of the iterations.
PatchGD offers markedly better memory and computational efficiency when training models on large-scale images. In particular, under a limited memory budget, it is more stable and efficient than standard gradient descent when processing large images.

Paper: https://arxiv.org/pdf/2301.13817.pdf

Motivation

Existing CNN-based deep learning models are mostly trained and tested at relatively low resolutions (below 300 × 300 pixels), partly because of the widely used image benchmark datasets. Applying these models to high-resolution images causes a quadratic increase in the size of the associated activations, which in turn greatly increases training computation and memory footprint. Moreover, when available GPU memory is limited, CNNs simply cannot handle such large images.

Very little prior work addresses using CNNs on very large images. The most common approach is to reduce the image resolution by downscaling; however, this discards a great deal of information tied to small-scale features and can harm the semantic context of the image. Another strategy divides the image into overlapping or non-overlapping tiles and processes the tiles sequentially, but this does not guarantee that semantic links between tiles are preserved, which hinders learning. Several similar strategies attempt to learn the information contained in large images, yet their inability to capture global context limits their use.

This paper proposes a scalable training strategy for building neural networks under very large images, very low memory and compute budgets, or a combination of both.

Key idea

This paper argues that "large image" should not be interpreted simply in terms of pixel count: an image should be considered too large to train a CNN on whenever the corresponding computational memory budget is small.

Hence PatchGD, which performs each model update using only part of the image at a time, while ensuring that the model sees almost the full context over the course of multiple steps.

Method

General description

At its core, PatchGD builds and fills a Z block: an encoding of the full image assembled from the information obtained for different parts of the image during the previous few update steps, regardless of which parts of the input are used to perform a given model update.

The use of the Z block is shown in Figure (a). The input image is first divided into m × n patches, and each patch is processed by θ1 as an independent image. The patches are passed through the model in batches, and the resulting outputs, together with the patches' grid positions, are used to fill the corresponding parts of Z.

To build an end-to-end CNN model, a small sub-network consisting of convolutional and fully connected layers is added; it processes the information contained in Z and converts it into the probability vector required for the classification task. The training and inference pipeline is shown in Figure (b) below. During training, both model components θ1 and θ2 are updated: a fraction of the patches is sampled from the input image, their encodings are computed with the latest state of θ1, and the outputs are used to update the corresponding entries of the already populated Z. The partially updated Z is then used to compute the loss, and the model parameters are updated through backpropagation.
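
To make the pipeline concrete, below is a minimal PyTorch-style sketch of one inner PatchGD step for a single image, simplified to one optimizer update per step. The names theta1 (a patch encoder returning d-dimensional vectors), theta2 (the small head), and the (d, m, n) layout of Z are illustrative assumptions, not the paper's code.

```python
import torch

def patchgd_inner_step(theta1, theta2, Z, image, grid, k, p,
                       criterion, label, optimizer):
    """One inner PatchGD iteration (sketch): refresh k entries of Z with
    the latest theta1, then update theta1/theta2 through the head's loss."""
    optimizer.zero_grad()
    # Sample k of the m*n grid positions; grid is a list of (row, col) pairs.
    idx = torch.randperm(len(grid))[:k].tolist()
    coords = [grid[i] for i in idx]
    # Crop the k patches (image is C x H x W) and encode them as one batch.
    patches = torch.stack([image[:, r * p:(r + 1) * p, c * p:(c + 1) * p]
                           for r, c in coords])          # (k, C, p, p)
    feats = theta1(patches)                              # (k, d) assumed
    # Scatter the fresh encodings into a copy of Z: the partial update.
    Z = Z.clone()                                        # (d, m, n)
    rows = torch.tensor([r for r, _ in coords])
    cols = torch.tensor([c for _, c in coords])
    Z[:, rows, cols] = feats.t()
    # The small head theta2 reads the whole Z block for the prediction.
    logits = theta2(Z.unsqueeze(0))                      # (1, num_classes)
    loss = criterion(logits, label)
    loss.backward()  # gradients reach theta1 only via the k refreshed entries
    optimizer.step()
    return Z.detach()  # carry Z to the next step without the old graph
```

The final detach reflects the role of Z as a running memory of patch encodings: only the entries refreshed in the current step contribute gradients, which is what keeps the memory footprint bounded.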

Mathematical formulation

PatchGD avoids updating the model on the entire image sample at once; instead, it uses only part of the image to compute gradients and update the model parameters. Its model update step can therefore be expressed as below.
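
A reconstruction consistent with the surrounding definitions (the paper's exact notation may differ):

$$\theta^{(i,\,j)} \;=\; \theta^{(i,\,j-1)} \;-\; \alpha \,\nabla_{\theta}\,\mathcal{L}\big(\theta^{(i,\,j-1)}\big),$$

where the loss $\mathcal{L}$ is evaluated only on the k patches sampled at this step, together with the current Z block, rather than on the whole image.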

Here, i indexes the mini-batch iteration within an epoch, j indexes the inner iteration, and α is the learning rate. In each inner iteration, k patches are sampled from the input image X and a single gradient update is performed.

Algorithm 1 describes model training on a batch of B images. The first step of training initializes Z for each input image (see the sketch below).
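
As a rough sketch of that first step, with sizes that are illustrative assumptions rather than the paper's settings, Z can simply be a zero tensor holding one d-dimensional slot per grid cell:

```python
import torch

# One Z block per image in the batch: a d-dimensional encoding slot
# for each of the m x n grid cells, zero-initialized before filling.
B, d, m, n = 8, 128, 16, 16   # illustrative sizes, not from the paper
Z = torch.zeros(B, d, m, n)
```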

Algorithm 2 describes the process of filling Z (a sketch follows).
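
A minimal sketch of that filling pass, assuming the same illustrative shapes as above: every patch is encoded once, without gradients, and written to its grid cell.

```python
import torch

@torch.no_grad()
def fill_Z(theta1, Z, images, p):
    """Populate Z for a batch of images: encode each p x p patch with
    theta1 and store the result at the patch's (row, col) grid cell."""
    B, d, m, n = Z.shape
    for r in range(m):
        for c in range(n):
            patch = images[:, :, r * p:(r + 1) * p, c * p:(c + 1) * p]
            Z[:, :, r, c] = theta1(patch)   # (B, d) encodings per cell
    return Z
```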

Results

Experiments are conducted on two datasets: UltraMNIST and PANDA prostate cancer grading. UltraMNIST is a classification dataset in which each sample contains 3-5 MNIST digits of varying scales placed at random positions in the image, with the digits summing to a value between 0 and 9. The PANDA dataset contains high-resolution histopathology images.

On the UltraMNIST classification task with the ResNet50 architecture and 512 × 512 images, PatchGD outperforms both GD and GD-extended by a large margin.

The comparison using the MobileNetV2 architecture shows a similar advantage.

Validation on the PANDA dataset with ResNet50 shows the same trend.


Origin: blog.csdn.net/KANG157/article/details/128984438