Deep learning (PyTorch) - how to use the flatten function, and how it differs from reshape

The Flatten layer is used to "flatten" the input, that is, to turn a multi-dimensional input into a one-dimensional one; it is often used in the transition from a convolutional layer to a fully connected layer. Note that the nn.Flatten layer flattens from dim 1 by default, so it does not affect the batch size, whereas the torch.flatten function defaults to start_dim=0 and flattens the batch dimension as well.
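A quick check of this difference (a minimal sketch, not from the original post):

import torch
from torch import nn

x = torch.randn(64, 3, 32, 32)  # a batch of 64 RGB images of size 32x32

flatten_layer = nn.Flatten()    # defaults: start_dim=1, end_dim=-1
print(flatten_layer(x).shape)   # torch.Size([64, 3072]) -- batch size preserved

print(torch.flatten(x).shape)   # torch.Size([196608]) -- start_dim=0 flattens the batch too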

In other words, it stretches a high-dimensional array out, element by element, into a one-dimensional array.

To better understand the role of the Flatten layer, it helps to visualize the network; the accompanying diagram (sourced from the internet) is omitted here.

For flatten(), the start_dim parameter defaults to 0, which means flatten() has the same effect as flatten(0).

In PyTorch, flatten(dim) means to flatten starting from dimension dim: the dimensions before dim are kept as they are, and everything from dim onward is merged into a single dimension.

For example, if a tensor has shape (S_{0}, S_{1}, S_{2}, ..., S_{n}), then after flatten(m) its shape becomes (S_{0}, S_{1}, ..., S_{m-1}, S_{m} × S_{m+1} × ... × S_{n}).
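A quick sanity check of this rule on an arbitrary shape (a minimal sketch):

import torch

t = torch.zeros(2, 3, 4, 5)          # shape (S0, S1, S2, S3)

print(torch.flatten(t, 2).shape)     # torch.Size([2, 3, 20]): dims before 2 kept, 4*5 merged
print(torch.flatten(t, 1).shape)     # torch.Size([2, 60]):    3*4*5 merged from dim 1 on
print(torch.flatten(t, 0).shape)     # torch.Size([120]):      same as torch.flatten(t)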

A sample program is as follows:

import torch
import torchvision
from torch import nn
from torch.nn import Linear
from torch.utils.data import DataLoader

# CIFAR10 test set: 10000 RGB images of size 32x32
dataset = torchvision.datasets.CIFAR10("./data_CIFAR10", train=False,
                                       transform=torchvision.transforms.ToTensor(), download=True)

# drop_last=True discards the final incomplete batch (10000 % 64 = 16 images),
# which would otherwise not match the fixed input size of the linear layer
dataloader = DataLoader(dataset, batch_size=64, drop_last=True)

class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()
        self.linear1 = Linear(196608, 10)  # 196608 = 64 * 3 * 32 * 32

    def forward(self, input):
        output = self.linear1(input)
        return output

tudui = Tudui()

for data in dataloader:
    imgs, targets = data
    print(imgs.shape)                       # torch.Size([64, 3, 32, 32])
    # output = torch.reshape(imgs, (1, 1, 1, -1))
    output = torch.flatten(imgs)            # flattens everything, batch dim included
    print(output.shape)                     # torch.Size([196608])
    output = tudui(output)
    print(output.shape)                     # torch.Size([10])

The result of the operation is as follows:
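In text form (the original post shows a screenshot), each iteration prints:

torch.Size([64, 3, 32, 32])
torch.Size([196608])
torch.Size([10])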

As can be seen from the above output, torch.Size([64, 3, 32, 32]) is what print(imgs.shape) prints, indicating batch_size=64, channels=3, height H=32, width W=32.

Flattening this tensor gives torch.Size([196608]), where 196608 = 64 × 3 × 32 × 32.

Passing this through the network (Tudui) then gives torch.Size([10]), i.e. scores for 10 categories.
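Note that torch.flatten(imgs) also collapsed the batch dimension, so all 64 images were mixed into a single vector and the network produced one prediction for the whole batch. To classify each image separately, you would flatten from dim 1 instead; a hedged sketch (the Linear(3072, 10) layer here is hypothetical, not the one from the program above):

import torch
from torch import nn

imgs = torch.randn(64, 3, 32, 32)   # stand-in for one CIFAR10 batch

per_image = torch.flatten(imgs, 1)  # torch.Size([64, 3072]): batch dim kept
linear = nn.Linear(3072, 10)        # hypothetical layer sized for per-image input
print(linear(per_image).shape)      # torch.Size([64, 10]): 10 scores per image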

What happens if flatten is replaced by reshape? The program is as follows:

import torch
import torchvision
from torch import nn
from torch.nn import Linear
from torch.utils.data import DataLoader

dataset = torchvision.datasets.CIFAR10("./data_CIFAR10", train=False,
                                       transform=torchvision.transforms.ToTensor(), download=True)

# drop_last=True again discards the final incomplete batch of 16 images
dataloader = DataLoader(dataset, batch_size=64, drop_last=True)

class Tudui(nn.Module):
    def __init__(self):
        super(Tudui, self).__init__()
        self.linear1 = Linear(196608, 10)

    def forward(self, input):
        output = self.linear1(input)
        return output

tudui = Tudui()

for data in dataloader:
    imgs, targets = data
    print(imgs.shape)                            # torch.Size([64, 3, 32, 32])
    output = torch.reshape(imgs, (1, 1, 1, -1))  # keep four dims, infer the last
    # output = torch.flatten(imgs)
    print(output.shape)                          # torch.Size([1, 1, 1, 196608])
    output = tudui(output)
    print(output.shape)                          # torch.Size([1, 1, 1, 10])

The result of the operation is as follows:
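In text form (the original post shows a screenshot), each iteration prints:

torch.Size([64, 3, 32, 32])
torch.Size([1, 1, 1, 196608])
torch.Size([1, 1, 1, 10])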

We find that after reshape the shape is torch.Size([1, 1, 1, 196608]); interpreted the same way as before, this means batch_size=1, channel=1, height H=1, width W=196608.

Passing this through the network (Tudui) gives torch.Size([1, 1, 1, 10]): still scores for 10 categories, but with the three leading singleton dimensions preserved.
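To summarize the difference: flatten is a special case of reshape. torch.flatten(t) is equivalent to t.reshape(-1), and torch.flatten(t, 1) to t.reshape(t.size(0), -1); reshape, on the other hand, can produce any shape with the same number of elements, including ones with extra dimensions, as the (1, 1, 1, -1) example above shows. A minimal check (a sketch, not from the original post):

import torch

t = torch.randn(64, 3, 32, 32)

# flatten is the "collapse into one dim, from a given dim on" special case of reshape
assert torch.equal(torch.flatten(t), t.reshape(-1))
assert torch.equal(torch.flatten(t, 1), t.reshape(64, -1))

# reshape can also introduce new dimensions, which flatten cannot
print(t.reshape(1, 1, 1, -1).shape)  # torch.Size([1, 1, 1, 196608])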


Origin: blog.csdn.net/qq_42233059/article/details/126663501