[Semantic Segmentation Series] FCN16s explained with one figure, side by side with the PyTorch code

     These are my own study notes, written down so that I don't forget them. Haha, I find that reading the theory and the code together works very well for beginners.

    I read through the code and the theory of FCN. FCN16s is the example here: once 16s is clear, FCN32s is the same thing with the skip connection removed, and FCN8s is the same thing with one more skip connection added.

The code I looked at uses a VGG16 backbone. The key steps are marked in the figure below, and code built on other backbones can be read against it in the same way.

To put it plainly, FCN16s takes the prediction made on the 1/32-size feature map and upsamples it (a deconvolution, although bilinear interpolation is enough) so that it becomes a 1/16-size map. It then makes a prediction from the 1/16-size pooling layer and adds it to the upsampled 1/32 result. The sum is an enhanced 1/16-size prediction, which is upsampled once more to the size of the original image to give the final result.
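The three steps above can be sketched with stand-in tensors, just to follow the shapes. This is only an illustration: the class count of 21, the 224x224 input size, and the random score maps are assumptions and not part of the original code.

```python
import torch
import torch.nn.functional as F

num_classes = 21   # hypothetical class count, for illustration only
h, w = 224, 224    # hypothetical input size

# Stand-ins for the two per-class score maps (random values; only shapes matter).
score_32 = torch.randn(1, num_classes, h // 32, w // 32)  # prediction on the 1/32 map
score_16 = torch.randn(1, num_classes, h // 16, w // 16)  # prediction on the 1/16 map

# Step 1: bilinearly upsample the 1/32 scores onto the 1/16 grid.
up_32 = F.interpolate(score_32, size=score_16.shape[2:],
                      mode="bilinear", align_corners=True)

# Step 2: element-wise sum with the 1/16 scores.
fused = up_32 + score_16

# Step 3: upsample the fused scores back to the input resolution.
out = F.interpolate(fused, size=(h, w), mode="bilinear", align_corners=True)

print(up_32.shape, fused.shape, out.shape)
```

The output keeps one channel per class at full resolution, so taking the argmax over the channel dimension would give a per-pixel label map.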

 

Once the flow in this figure makes sense, the code becomes clear. So to really understand an algorithm, the most important thing is to read the code, read the code, read the code.

Here is the FCN16s code, so we can go through it together.

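import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class FCN16(nn.Module):

    def __init__(self, num_classes):
        super().__init__()

        # Newer torchvision versions replace pretrained=True with the weights= argument.
        feats = list(models.vgg16(pretrained=True).features.children())
        # Each slice ends one past its MaxPool2d (indices 16, 23, 30) so the pooling is included;
        # slices ending at 16, 23 and 30 would skip all three poolings and stay at 1/4 size.
        self.feats = nn.Sequential(*feats[0:17])   # conv + ReLU + pool, three stages -> 1/8 pooled map
        self.feat4 = nn.Sequential(*feats[17:24])  # several convs and one pool -> 1/16 pooled map
        self.feat5 = nn.Sequential(*feats[24:31])  # several convs and one pool -> 1/32 pooled map
        self.fconn = nn.Sequential(
            nn.Conv2d(512, 4096, 7),  # unpadded 7x7 kernel: the 1/32 map must be at least 7x7
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Conv2d(4096, 4096, 1),
            nn.ReLU(inplace=True),
            nn.Dropout(),
        )
        self.score_fconn = nn.Conv2d(4096, num_classes, 1)
        self.score_feat4 = nn.Conv2d(512, num_classes, 1)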

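    def forward(self, x):  # forward pass
        # 1/8 pooled map: 3rd green block from the left in the figure above.
        # It is not used as a skip connection here, because FCN16s fuses at 1/16.
        feats = self.feats(x)
        feat4 = self.feat4(feats)  # 1/16 pooled map: 4th green block from the left
        feat5 = self.feat5(feat4)  # 1/32 pooled map: 5th green block from the left
        fconn = self.fconn(feat5)  # the last two convolutions: the two blue blocks left of the last green block

        score_feat4 = self.score_feat4(feat4)  # conv on the 1/16 pooled map; number of kernels = number of classes
        score_fconn = self.score_fconn(fconn)  # one more conv on the final features; number of kernels = number of classes

        # "Deconvolution" of the 1/32 scores, done as bilinear upsampling to the 1/16 size.
        # F.upsample_bilinear is deprecated; F.interpolate with align_corners=True is its equivalent.
        score = F.interpolate(score_fconn, size=score_feat4.shape[2:], mode='bilinear', align_corners=True)
        score += score_feat4  # add the 1/16 scores: the yellow block in the figure

        # Final upsampling: rescale the fused scores to the original image size.
        return F.interpolate(score, size=x.shape[2:], mode='bilinear', align_corners=True)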
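One detail of the code above is easy to check by hand: the first convolution in `fconn` uses a 7x7 kernel with no padding, so its output is 6 pixels smaller than the 1/32 map on each side. That is presumably why `forward` resizes to the size of `score_feat4` instead of simply doubling. The sketch below traces one spatial side through the network with the standard output-size formula. The helper names `conv_out` and `fcn16_sizes` are made up for illustration, and it assumes five 2x2 poolings, each halving the size, as the comments in the code describe.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def fcn16_sizes(side):
    """Trace one spatial side through five pooling stages and the 7x7 conv.

    Assumption: the 3x3 convs use padding 1 and keep the size, so only the
    2x2 stride-2 poolings and the unpadded 7x7 conv change it.
    """
    sizes = {}
    s = side
    for i in range(1, 6):
        s = conv_out(s, kernel=2, stride=2)   # each max-pool halves the side
        sizes[f"pool{i}"] = s
    sizes["fconn"] = conv_out(s, kernel=7)    # unpadded 7x7 convolution
    return sizes


for side in (224, 512):
    print(side, fcn16_sizes(side))
```

Under these assumptions a 224-pixel side leaves a 7-pixel 1/32 map, which the 7x7 kernel reduces to a single pixel, so sides shorter than 224 pixels would leave the kernel with too little to work on.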

Here is the printed structure of the VGG16 network, for checking the layer indices used in the slices above:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)  # downsamples once (to 1/2)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
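To see which slice of `features` gives which downsampling factor, one can count the pooling layers in front of each index. The sketch below rebuilds the layer order from the listing above in plain Python instead of loading torchvision. The names `VGG16_CFG`, `expand` and `downsample_factor` are hypothetical helpers for this illustration.

```python
# Layer layout of VGG16 "features": integers are conv output channels,
# "M" marks a 2x2 max-pool. Each conv is followed by a ReLU.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def expand(cfg):
    """Expand the config into the flat layer order shown in the printout."""
    layers = []
    for v in cfg:
        if v == "M":
            layers.append("pool")
        else:
            layers += ["conv", "relu"]
    return layers


LAYERS = expand(VGG16_CFG)


def downsample_factor(stop):
    """Total downsampling after applying layers[0:stop] (stop is exclusive)."""
    return 2 ** LAYERS[:stop].count("pool")


pool_indices = [i for i, name in enumerate(LAYERS) if name == "pool"]
print("pool layers at:", pool_indices)
for stop in (16, 17, 23, 24, 30, 31):
    print(f"features[0:{stop}] -> 1/{downsample_factor(stop)}")
```

The pooling layers sit at indices 4, 9, 16, 23 and 30. Because a Python slice excludes its end index, a slice has to end at 17, 24 or 31 to include the pooling that produces the 1/8, 1/16 or 1/32 map; a slice ending at 16 stops at 1/4.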

 

 


Origin blog.csdn.net/gbz3300255/article/details/105559912