Classic neural network (7) DenseNet and its application on the Fashion-MNIST data set

1 Brief description of DenseNet

  1. DenseNet improves the learning ability of the network not through deeper or wider structures, but through feature reuse.

  2. The idea of ResNet is to create direct connections from "layers near the input" to "layers near the output". DenseNet pushes this idea further: it connects all layers to each other in a feed-forward fashion, which is why the network is called DenseNet.

  3. DenseNet has the following advantages:

    • Alleviates the vanishing-gradient problem. Because each layer can obtain the gradient directly from the loss function and information directly from the original input, the network is easy to train.
    • Dense connections also have a regularizing effect, alleviating overfitting on tasks with small training sets.
    • Encourages feature reuse: the network combines feature maps learned by different layers.
    • Significantly reduces the number of parameters, because the convolution kernels of each layer are small and the number of output channels is small (determined by the growth rate).
  4. DenseNet has fewer parameters than traditional convolutional networks because it does not need to relearn redundant feature maps.

    • Traditional feedforward neural networks can be viewed as algorithms that pass a state between layers: each layer receives the state of the previous layer and passes a new state on to the next layer.

      Each layer changes the state, but it must also pass along information that needs to be preserved.

    • ResNet passes the information that needs to be preserved directly through identity mappings, so only the change of state has to be transferred between layers.

    • DenseNet keeps the states of all layers in a pool of collective knowledge, and each layer adds a small number of feature maps to this collective knowledge.

  5. The layers of DenseNet are very narrow (that is, the number of feature-map channels is small); for example, the output of each layer may have only 12 channels.

  6. In terms of cross-layer connections, unlike ResNet, where input and output are added, DenseNet concatenates input and output along the channel dimension (see the short contrast sketch after this list). The main building blocks of DenseNet are dense blocks and transition layers; when building a DenseNet, transition layers are added to bring the number of channels back down and control the model's complexity.

  7. Although DenseNet is computationally efficient and has relatively few parameters, it is not memory-friendly. Sharing memory (see section 1.5) can mitigate this problem.

  8. Paper download address: https://arxiv.org/pdf/1608.06993.pdf
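To make the concatenation-versus-addition difference in item 6 concrete, here is a minimal sketch on dummy tensors (the shapes are illustrative only and are not from the original post):

import torch

X = torch.randn(1, 3, 8, 8)           # a dummy feature map: (batch, channels, height, width)
Y = torch.randn(1, 3, 8, 8)           # output of some layer with the same shape

res_out = X + Y                        # ResNet: element-wise addition, channels stay at 3
dense_out = torch.cat((X, Y), dim=1)   # DenseNet: concatenation along the channel dimension

print(res_out.shape)    # torch.Size([1, 3, 8, 8])
print(dense_out.shape)  # torch.Size([1, 6, 8, 8])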

1.1 Dense block


The key difference between ResNet and DenseNet is that DenseNet's output is a concatenation (denoted [ , ] in the paper), rather than a simple addition as in ResNet.


The name DenseNet comes from the "dense connections" between variables, with the last layer being closely connected to all previous layers.


Note: when the feature-map size changes, concatenation along the channel dimension is no longer possible. The network is therefore divided into multiple dense blocks: the feature-map size is the same inside each block, but differs between blocks.

1.1.1 Growth rate

  1. Within a dense block, every layer H (that is, BN-ReLU-Conv) outputs a feature map with the same number of channels, k. k is an important hyperparameter called the growth rate of the network.

    The number of channels of the input feature map of the l-th layer is k_0 + k(l - 1), where k_0 is the number of channels of the block's input.

  2. An important difference between DenseNet and existing networks is that DenseNet is very narrow, i.e. the number of output feature-map channels is small, e.g. k = 12.

    • A small growth rate is enough to achieve good results. One explanation is that each layer of a dense block has access to the output feature maps of all previous layers in the block, and these feature maps can be regarded as the global state of the block. The output feature map of each layer is added to this global state, which can be understood as the block's collective knowledge, shared by all layers in the block. The growth rate determines how much new information each layer contributes to the global state.

    • Therefore, feature maps do not need to be copied layer by layer (they are globally shared), which also distinguishes DenseNet from traditional network structures. This facilitates feature reuse across the network and produces more compact models. A quick numerical check of the channel-count formula follows below.
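As a quick numerical check of the formula above (a sketch with illustrative values, not from the original post):

k0, k = 64, 32                    # block input channels and growth rate (illustrative values)

for l in range(1, 7):             # layers 1..6 of a dense block
    print(f"layer {l}: input channels = {k0 + k * (l - 1)}")
# layer 1: 64, layer 2: 96, layer 3: 128, ..., layer 6: 224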

1.1.2 Nonlinear transformation

  • H can be a composite function including Batch Normalization (BN), ReLU unit, pooling or convolution and other operations.

  • The structure in the paper is: first perform BN, then perform ReLU, and finally follow a 3 x 3 convolution, that is: BN-ReLU-Conv(3x3)

  • The PyTorch implementation is as follows:

import torch.nn as nn
import torch


'''
DenseNet uses the improved "batch normalization, activation, convolution" architecture from ResNet.

    Convolution block: BN-ReLU-Conv
'''
def conv_block(input_channels, num_channels):

    return nn.Sequential(
                  nn.BatchNorm2d(input_channels),
                  nn.ReLU(),
                  nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1)
         )

1.1.3 bottleneck

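In the paper, the bottleneck variant (DenseNet-B) inserts a 1x1 convolution before each 3x3 convolution, i.e. BN-ReLU-Conv(1x1)-BN-ReLU-Conv(3x3); the 1x1 convolution produces 4k feature maps, which shrinks the input of the 3x3 convolution and improves computational efficiency. A minimal sketch in the style of conv_block above (this block is not part of the original post's code):

import torch.nn as nn


def bottleneck_block(input_channels, num_channels):
    # DenseNet-B: BN-ReLU-Conv(1x1) first maps the input to 4*k channels,
    # then BN-ReLU-Conv(3x3) produces the k output channels (k = num_channels).
    return nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, 4 * num_channels, kernel_size=1),
            nn.BatchNorm2d(4 * num_channels),
            nn.ReLU(),
            nn.Conv2d(4 * num_channels, num_channels, kernel_size=3, padding=1)
    )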

1.1.4 PyTorch implementation of the dense block

import torch.nn as nn
import torch


'''
DenseNet uses the improved "batch normalization, activation, convolution" architecture from ResNet.

    Convolution block: BN-ReLU-Conv
'''
def conv_block(input_channels, num_channels):

    return nn.Sequential(
                  nn.BatchNorm2d(input_channels),
                  nn.ReLU(),
                  nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1)
         )



'''
A dense block consists of multiple convolution blocks, each using the same number of output channels.

In the forward pass, however, the input and output of each convolution block are concatenated along the channel dimension.
'''
class DenseBlock(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(DenseBlock, self).__init__()

        layer = []
        for i in range(num_convs):
            layer.append(
                conv_block(num_channels * i + input_channels, num_channels)     # the i-th conv block sees the block input plus all previous outputs
            )
        self.net = nn.Sequential(*layer)


    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # concatenate the input and output of each block along the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X

if __name__ == '__main__':
    '''
    1. Dense block
    We define a DenseBlock with 2 convolution blocks of 10 output channels each.
    With an input of 3 channels, we get an output with 3 + 2 x 10 = 23 channels.
    The number of output channels of each convolution block controls how much the channel count grows
    relative to the input, and is therefore also called the growth rate.
    '''
    blk = DenseBlock(2, 3, 10)
    # After the first conv block X becomes (4, 10, 8, 8); concatenated with the original X (4, 3, 8, 8)
    # along dim 1, X becomes (4, 13, 8, 8).
    # The second conv block then maps (10 + 3) channels to 10, so its output Y is (4, 10, 8, 8).
    # Concatenating X and Y along dim 1 gives the final output (4, 23, 8, 8).
    X = torch.randn(4, 3, 8, 8)
    Y = blk(X)
    print(Y.shape)  # (4, 23, 8, 8)

1.2 Transition layer

1.2.1 Introduction to transition layer

  • A DenseNet consists of multiple dense blocks connected by transition layers. The layers between dense blocks are called transition layers; their main role is to connect adjacent dense blocks.

  • A transition layer can contain convolution or pooling operations, thereby changing the size (spatial dimensions and number of channels) of the previous dense block's output feature map.

    • The transition layer in the paper consists of a BN layer, a 1x1 convolutional layer, and a 2x2 average pooling layer. The 1x1 convolution reduces the number of output channels of the dense block and improves the compactness of the model.
    • If the number of output channels of each dense block were not reduced, the feature-map channel count would become very large after several dense blocks (within a single block it already grows to k_0 + Lk after L layers).


  • If the dense block outputs a feature map with m channels, the transition layer can output a feature map with theta × m channels, where 0 < theta <= 1 is the compression factor. (A sketch of a theta-parameterized transition layer follows this list.)
    • When theta = 1, the number of feature-map channels is unchanged after the transition layer.
    • When theta < 1, the number of feature-map channels decreases after the transition layer; this variant is called DenseNet-C.
    • The network that combines DenseNet-B and DenseNet-C is called DenseNet-BC.
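To make the compression factor concrete, here is a hedged sketch of a theta-parameterized transition layer (the 1x1 convolution keeps int(theta * m) of the m input channels; this variant is not part of the original post's code, which instead passes the target channel count explicitly in section 1.2.2 below):

import torch.nn as nn


def compressed_transition_block(input_channels, theta=0.5):
    # DenseNet-C style transition: keep only int(theta * m) of the m input channels,
    # then halve height and width with stride-2 average pooling.
    num_channels = int(theta * input_channels)
    return nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=1),
            nn.AvgPool2d(kernel_size=2, stride=2)
    )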

1.2.2 Implementation of transition layer

'''
Since every dense block increases the number of channels, using too many of them makes the model overly complex.
A transition layer is used to control model complexity: it reduces the number of channels with a 1x1 convolution
and halves the height and width with a stride-2 average pooling layer, further reducing model complexity.
'''
def transition_block(input_channels, num_channels):
    return nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=1), # 1x1 convolution to reduce the number of channels
            nn.AvgPool2d(kernel_size=2, stride=2)                   # stride-2 average pooling halves height and width
    )




if __name__ == '__main__':
    '''
    1. Dense block
    We define a DenseBlock with 2 convolution blocks of 10 output channels each.
    With an input of 3 channels, we get an output with 3 + 2 x 10 = 23 channels.
    The number of output channels of each convolution block controls how much the channel count grows
    relative to the input, and is therefore also called the growth rate.
    '''
    blk = DenseBlock(2, 3, 10)
    # After the first conv block X becomes (4, 10, 8, 8); concatenated with the original X (4, 3, 8, 8)
    # along dim 1, X becomes (4, 13, 8, 8).
    # The second conv block then maps (10 + 3) channels to 10, so its output Y is (4, 10, 8, 8).
    # Concatenating X and Y along dim 1 gives the final output (4, 23, 8, 8).
    X = torch.randn(4, 3, 8, 8)
    Y = blk(X)
    print(Y.shape)  # (4, 23, 8, 8)


    '''
    2. Transition layer
    '''
    blk = transition_block(23, 10)
    print(blk(Y).shape)  # torch.Size([4, 10, 4, 4])

1.3 DenseNet network performance

1.3.1 Network structure

Network structure: the DenseNet architectures trained on ImageNet use a growth rate of k = 32.

  • In the table, conv denotes the BN-ReLU-Conv combination. For example, 1x1 conv means: first BN, then ReLU, and finally a 1x1 convolution.
  • DenseNet-xx indicates that the network has xx layers. For example, DenseNet-169 has L = 169 layers.
  • All of these DenseNets use the DenseNet-BC structure; the input image size is 224x224, the initial convolution is 7x7 with 2k output channels and stride 2, and the compression factor is theta = 0.5.
  • After the last dense block there is a global average pooling layer, whose output is fed to the softmax output layer.


1.3.2 Error rate on the ImageNet validation set

The paper compares the single-crop validation error rates of DenseNet and ResNet on ImageNet, both as a function of the number of parameters and as a function of the amount of computation.


The experiments show that, compared with ResNet, DenseNet significantly reduces both the number of parameters and the amount of computation.

  • DenseNet-201 with about 20M parameters has a validation error close to that of ResNet-101 with about 40M parameters.
  • With a computational cost close to that of ResNet-50 (roughly half of ResNet-101's), DenseNet-201 achieves a validation error close to that of ResNet-101.

1.3.3 Implementation of a simple version of DenseNet

We implement a simplified DenseNet (plain DenseNet rather than DenseNet-BC) and apply it to the Fashion-MNIST dataset.

Dense block and transition layer

import torch.nn as nn
import torch


'''
DenseNet uses the improved "batch normalization, activation, convolution" architecture from ResNet.

    Convolution block: BN-ReLU-Conv
'''
def conv_block(input_channels, num_channels):

    return nn.Sequential(
                  nn.BatchNorm2d(input_channels),
                  nn.ReLU(),
                  nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1)
         )



'''
A dense block consists of multiple convolution blocks, each using the same number of output channels.

In the forward pass, however, the input and output of each convolution block are concatenated along the channel dimension.
'''
class DenseBlock(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(DenseBlock, self).__init__()

        layer = []
        for i in range(num_convs):
            layer.append(
                conv_block(num_channels * i + input_channels, num_channels)     # the i-th conv block sees the block input plus all previous outputs
            )
        self.net = nn.Sequential(*layer)


    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # concatenate the input and output of each block along the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X

'''
Since every dense block increases the number of channels, using too many of them makes the model overly complex.
A transition layer is used to control model complexity: it reduces the number of channels with a 1x1 convolution
and halves the height and width with a stride-2 average pooling layer, further reducing model complexity.
'''
def transition_block(input_channels, num_channels):
    return nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, num_channels, kernel_size=1), # 1x1 convolution to reduce the number of channels
            nn.AvgPool2d(kernel_size=2, stride=2)                   # stride-2 average pooling halves height and width
    )


if __name__ == '__main__':
    '''
    1. Dense block
    We define a DenseBlock with 2 convolution blocks of 10 output channels each.
    With an input of 3 channels, we get an output with 3 + 2 x 10 = 23 channels.
    The number of output channels of each convolution block controls how much the channel count grows
    relative to the input, and is therefore also called the growth rate.
    '''
    blk = DenseBlock(2, 3, 10)
    # After the first conv block X becomes (4, 10, 8, 8); concatenated with the original X (4, 3, 8, 8)
    # along dim 1, X becomes (4, 13, 8, 8).
    # The second conv block then maps (10 + 3) channels to 10, so its output Y is (4, 10, 8, 8).
    # Concatenating X and Y along dim 1 gives the final output (4, 23, 8, 8).
    X = torch.randn(4, 3, 8, 8)
    Y = blk(X)
    print(Y.shape)  # (4, 23, 8, 8)


    '''
    2. Transition layer
    '''
    blk = transition_block(23, 10)
    print(blk(Y).shape)  # torch.Size([4, 10, 4, 4])

DenseNet

import torch.nn as nn
import torch
from _08_dense_block import DenseBlock,transition_block

class DenseNet(nn.Module):


    def __init__(self):
        super(DenseNet, self).__init__()
        '''
            1. DenseNet first uses the same single convolution layer and max pooling layer as ResNet.
        '''
        b1 = nn.Sequential(
                nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        '''
            2. Next, analogous to the 4 residual blocks used by ResNet, DenseNet uses 4 dense blocks.
        As with ResNet, we can choose how many convolution layers each dense block uses. Here we set it to 4,
        to stay consistent with the earlier ResNet-18. The number of channels of the convolution layers in a
        dense block (i.e. the growth rate) is set to 32, so each dense block adds 128 channels.

            3. Between modules, ResNet reduces height and width with a stride-2 residual block, while DenseNet
        uses a transition layer to halve the height and width and halve the number of channels.
        '''
        # num_channels is the current number of channels
        num_channels, growth_rate = 64, 32
        num_convs_in_dense_blocks = [4, 4, 4, 4]
        blks = []
        for i, num_convs in enumerate(num_convs_in_dense_blocks):
            # add a dense block
            blks.append(DenseBlock(num_convs, num_channels, growth_rate))
            # number of output channels of this dense block
            num_channels += num_convs * growth_rate

            # add a transition layer between dense blocks to halve the number of channels
            if i != len(num_convs_in_dense_blocks) - 1:
                blks.append(transition_block(num_channels, num_channels // 2))
                num_channels = num_channels // 2
        '''
        4. As in ResNet, a global pooling layer and a fully connected layer are attached at the end to produce the output.
        '''
        self.model = nn.Sequential(
            b1,
            *blks,
            nn.BatchNorm2d(num_channels),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(num_channels, 10)
        )




    def forward(self, X):
        return self.model(X)


if __name__ == '__main__':
    net = DenseNet()
    X = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    for layer in net.model:
        X = layer(X)
        print(layer.__class__.__name__, 'output shape:', X.shape)
Sequential output shape: torch.Size([1, 64, 56, 56])
DenseBlock output shape: torch.Size([1, 192, 56, 56])
Sequential output shape: torch.Size([1, 96, 28, 28])
DenseBlock output shape: torch.Size([1, 224, 28, 28])
Sequential output shape: torch.Size([1, 112, 14, 14])
DenseBlock output shape: torch.Size([1, 240, 14, 14])
Sequential output shape: torch.Size([1, 120, 7, 7])
DenseBlock output shape: torch.Size([1, 248, 7, 7])
BatchNorm2d output shape: torch.Size([1, 248, 7, 7])
ReLU output shape: torch.Size([1, 248, 7, 7])
AdaptiveAvgPool2d output shape: torch.Size([1, 248, 1, 1])
Flatten output shape: torch.Size([1, 248])
Linear output shape: torch.Size([1, 10])

1.4 DenseNet's excessive memory (GPU memory) consumption

Although DenseNet is computationally efficient and has relatively few parameters, it is not memory-friendly. Given the limited amount of GPU memory, this makes it difficult to train very deep DenseNets.

1.4.1 Memory calculation

Assume that a dense block contains L layers. Then:
For the l-th layer: x_l = H_l([x_0, x_1, ..., x_{l-1}]).
Assume the output feature map of each layer has size W×H with k channels, and that H consists of BN-ReLU-Conv(3x3). Then:

  • Concat operation: a temporary feature map must be generated as the input of the l-th layer; its memory consumption is W×H×k×l.
  • BN operation: a temporary feature map must be generated as the input of the ReLU; its memory consumption is W×H×k×l.
  • ReLU operation: it can be applied in place, so no extra feature map is needed to store the ReLU output.
  • Conv operation: the output feature map of layer l must be generated; this is a necessary overhead.

Therefore, besides the memory needed for the output feature maps of layers 1, 2, ..., L, layer l also needs 2×W×H×k×l of memory to store the temporary feature maps generated in the middle.
The entire dense block therefore needs W×H×k×(L+1)×L of memory to store the temporary feature maps generated in the middle. That is, the memory consumption of a dense block is O(L^2), quadratic in the block depth.
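A quick back-of-the-envelope check of this formula (illustrative numbers, assuming 32-bit floats; not from the original post):

def naive_temp_feature_map_memory(W, H, k, L, bytes_per_elem=4):
    # Temporary feature maps (concat + BN outputs) of a naive implementation:
    # W*H*k*(L+1)*L elements in total, i.e. O(L^2) in the block depth L.
    return W * H * k * (L + 1) * L * bytes_per_elem

# e.g. a 56x56 block with growth rate k = 32 and L = 12 layers
print(naive_temp_feature_map_memory(56, 56, 32, 12) / 2**20, "MiB per sample")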

1.4.2 The necessity of splicing and the reason of memory consumption

  • The Concat operation is necessary because convolution is computationally more efficient when its input is stored in a contiguous memory region. In a dense block, the input of layer l is obtained by concatenating the output feature maps of all previous layers along the channel dimension, and those output feature maps do not lie in contiguous memory.

  • This memory consumption is not caused by the structure of the dense block itself, but by the deep learning library. When libraries such as TensorFlow/PyTorch implement a neural network, they store the intermediate temporary nodes (such as the BN output) so that their values can be accessed directly during backpropagation.

  • This is a trade-off between time and space: by allocating more space to store temporary values, computation is saved during backpropagation.

1.4.3 Network parameters also consume memory

In addition to the memory consumed by temporary feature maps, the network parameters also consume memory. Assuming H consists of BN-ReLU-Conv(3x3), the number of parameters in layer l is 9×l×k^2 (BN not counted).
The number of parameters of the entire dense block is 9k^2(L+1)L/2, i.e. O(L^2). (A numerical check follows this list.)

  • Since the number of parameters grows quadratically with the depth of the block, DenseNet has a large network capacity, which is one important factor in its advantage over other networks.
  • Usually W×H > 9k/2, where W and H are the width and height of the feature map and k is the growth rate. Therefore, the memory consumed by the parameters is much smaller than the memory consumed by the temporary feature maps.
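The parameter count can be checked numerically in the same way (a sketch; BN parameters ignored, as in the text):

def dense_block_conv_params(k, L):
    # Layer l has 9 * l * k^2 weights (a 3x3 conv from roughly l*k channels to k channels),
    # so the whole block has 9 * k^2 * (L + 1) * L / 2 parameters, i.e. O(L^2).
    return 9 * k * k * (L + 1) * L // 2

print(dense_block_conv_params(k=32, L=12))   # conv weights in one dense block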

1.5 DenseNet memory optimization: shared memory

The idea again exploits the trade-off between time and space, but this time sacrifices time in exchange for space.

The supporting observation is that the Concat and BN operations are computationally very cheap but expensive in memory, so this approach works very well for DenseNet.

1.5.1 Traditional practices

In a traditional dense block, layer l first copies the input feature maps into a contiguous memory block, completing the concatenation during the copy, and then performs the BN, ReLU, and Conv operations in sequence.

The temporary feature maps of this layer consume 2×W×H×k×l of memory, and the layer's output feature map consumes W×H×k.

  • In addition, some implementations (such as LuaTorch) also allocate memory for the gradients of the backward pass. For example, computing the gradient of the BN layer's output requires the gradient of the l-th layer's output and the BN layer's output; storing these gradients needs an additional O(lk) of memory.
  • Other implementations (e.g. PyTorch, MXNet) store these gradients in a shared gradient memory region, thus requiring only O(k) of memory.


1.5.2 Shared memory practices

In the memory-optimized dense block, layer l uses two pre-allocated shared memory regions (shared memory storage locations) to temporarily store the output feature maps of the concat and BN operations.

For the first pre-allocated shared memory region:

The first region is the shared area for the concat operation. The outputs of the concat operations of layers 1, 2, ..., L are all written into this shared area, and the write of layer l+1 overwrites the result of layer l.

  • For the whole dense block, this shared area only needs W×H×k×L of memory (the size of the largest feature map), i.e. the memory consumption is O(kL), compared with O(kL^2) for the traditional DenseNet.

  • The subsequent BN operation reads its data directly from this shared area.

  • Since the write of layer l+1 overwrites the result of layer l, the data stored here is temporary. Therefore, the result of layer l's concat operation has to be recomputed during backpropagation.

    Because the concat operation is computationally very cheap, this extra computation costs little.

For the second pre-allocated shared memory region:

The second region is the shared area for the BN operation. The outputs of the BN operations of layers 1, 2, ..., L are all written into this shared area, and the write of layer l+1 overwrites the result of layer l.

  • For the whole dense block, this shared area likewise only needs W×H×k×L of memory (the size of the largest feature map), i.e. O(kL), compared with O(kL^2) for the traditional DenseNet.

  • The subsequent convolution operation reads its data directly from this shared area.

  • For the same reason as with the concat shared area, the result of layer l's BN operation also has to be recomputed during backpropagation.

    BN is also computationally cheap; the recomputation adds only about 5% extra computation.

Since BN and concat operations are widely used in neural networks, this pre-allocated shared memory technique is broadly applicable: it saves a large amount of memory at the cost of a small amount of extra computation time.
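In PyTorch, the same time-for-space trade can be approximated with gradient checkpointing. The sketch below is not the original post's code, and it is coarser than the paper's memory-efficient implementation (it recomputes the whole conv block rather than only the concat and BN outputs); it reuses conv_block from section 1.1.2:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedDenseBlock(nn.Module):
    """DenseBlock variant that trades compute for memory: each conv block's
    intermediate activations are recomputed during backpropagation."""

    def __init__(self, num_convs, input_channels, num_channels):
        super().__init__()
        # conv_block is the BN-ReLU-Conv block defined in section 1.1.2
        self.net = nn.ModuleList(
            [conv_block(num_channels * i + input_channels, num_channels)
             for i in range(num_convs)]
        )

    def forward(self, X):
        for blk in self.net:
            # Do not keep blk's intermediate activations; recompute them in backward.
            # Caveat: recomputation runs BN twice per step, so its running statistics
            # are updated twice (a known side effect of checkpointing BN layers).
            Y = checkpoint(blk, X, use_reentrant=False)
            X = torch.cat((X, Y), dim=1)
        return X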

2 Application examples of DenseNet on the Fashion-MNIST data set

2.1 Create DenseNet network model

As shown in 1.3.3.

2.2 Read the Fashion-MNIST data set

batch_size = 256

# To keep training on Fashion-MNIST short, reduce the input height and width from 224 to 96 to simplify computation
train_iter,test_iter = get_mnist_data(batch_size,resize=96)
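get_mnist_data is a helper that the post does not show; below is a hedged sketch of what such a loader might look like with torchvision (the function name and signature simply mirror the call above):

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


def get_mnist_data(batch_size, resize=None):
    # optional resize, then convert images to float tensors in [0, 1]
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)

    train_set = torchvision.datasets.FashionMNIST(
        root='./data', train=True, transform=trans, download=True)
    test_set = torchvision.datasets.FashionMNIST(
        root='./data', train=False, transform=trans, download=True)

    return (DataLoader(train_set, batch_size, shuffle=True),
            DataLoader(test_set, batch_size, shuffle=False))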

2.3 Model training on GPU

from _08_DenseNet import DenseNet

# initialize the model
net = DenseNet()

lr, num_epochs = 0.1, 10
train_ch(net, train_iter, test_iter, num_epochs, lr, try_gpu())
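train_ch and try_gpu are also d2l-style helpers that the post does not show; here is a minimal training-loop sketch under those assumptions (cross-entropy loss, plain SGD, test accuracy reported each epoch):

import torch
import torch.nn as nn


def try_gpu():
    return torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def train_ch(net, train_iter, test_iter, num_epochs, lr, device):
    net.to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)

    for epoch in range(num_epochs):
        net.train()
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(net(X), y)
            loss.backward()
            optimizer.step()

        # evaluate test accuracy at the end of each epoch
        net.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for X, y in test_iter:
                X, y = X.to(device), y.to(device)
                correct += (net(X).argmax(dim=1) == y).sum().item()
                total += y.numel()
        print(f'epoch {epoch + 1}, test acc {correct / total:.3f}')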

