Camouflaged Object Detection
SINet comes in two versions:
- SINet-v1, published at CVPR 2020
  - Paper address: Camouflaged_Object_Detection_CVPR_2020
  - Code address: SINet-v1 code
- SINet-v2, published in IEEE TPAMI 2021
  - Paper address: Concealed Object Detection
  - Code address: SINet-V2
Compared with v1, the v2 version makes some changes to the network structure.
v1 network structure:
v2 network structure:
SINet v1
The main contributions of SINet are the construction of the COD10K dataset and the opening up of camouflaged object detection as a task.
SINet v1 is not especially innovative in its network structure, which is mainly based on the CPD framework.
It is recommended to read this article before reading the v1 structure:
- Cascaded Partial Decoder for Fast and Accurate Salient Object Detection
That paper, a salient object detection work from CVPR 2019, proposes a cascaded partial decoder (CPD) framework for fast and accurate salient object detection.
The RF module, SA module, and PDC module used in SINet are all taken from the CPD framework, and the dual-branch structure of SINet-v1 is likewise borrowed from CPD.
The basic structure is the same, but the low-level features are not discarded.
You can read another blog post about the CPD framework, where the PDC, RF, and SA modules are all explained:
https://zpf1900.blog.csdn.net/article/details/127429430
The structure of the entire network is likewise modeled on CPD, with two branches.
Although the author splits it into two parts named the Search Module (SM) and the Identification Module (IM), it is really just the dual-branch structure of CPD.
So naming is an art.
The backbone network is ResNet-50, and the features of all five convolutional blocks are kept.
The first branch passes the features of the five convolutional blocks through RF modules and fuses them with the PDC.
The second branch sends the feature map of the third block through the SA module, then passes it, together with the feature maps of the fourth and fifth convolutional blocks, through RF modules and into the PDC to obtain an enhanced map.
The two branches are jointly trained using the cross-entropy loss function
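The joint supervision can be sketched as follows (a minimal illustration, assuming both branch outputs are logits at ground-truth resolution; `coarse_map` and `enhanced_map` are hypothetical names, not the paper's):

```python
import torch
import torch.nn.functional as F

def joint_loss(coarse_map, enhanced_map, gt):
    """Supervise both branch outputs with the same ground-truth mask."""
    return (F.binary_cross_entropy_with_logits(coarse_map, gt)
            + F.binary_cross_entropy_with_logits(enhanced_map, gt))

# with all-zero logits, each BCE term equals log(2) ~= 0.6931
loss = joint_loss(torch.zeros(1, 1, 8, 8),
                  torch.zeros(1, 1, 8, 8),
                  torch.zeros(1, 1, 8, 8))
```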
The specific network details are covered in the CPD blog post, so I won't repeat them here: CPD explanation
In addition, the CPD paper did not include diagrams of the specific modules it used; the SINet paper draws two of them:
RF module:
PDC module:
SINet v2
The biggest difference between v2 and v1 is the attention part: v2 uses group-reversal attention.
Feature extraction
ResNet-50 is still used, but unlike v1, only the features of the last three stages are needed here; the low-level features are discarded (again borrowing the treatment from the CPD framework).
Texture Enhanced Module (TEM)
The features extracted at the three stages each pass through a TEM. This is the RF module from v1 under a new name; the code is unchanged.
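A rough sketch of such a multi-dilation receptive-field block (a simplification for illustration only; the real RF/TEM also uses asymmetric 1×k/k×1 convolutions and batch norm, which are omitted here):

```python
import torch
import torch.nn as nn

class TEM(nn.Module):
    """Simplified RF/TEM sketch: parallel branches with increasing
    dilation rates, concatenated and fused with a residual path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 1) if d == 0 else
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1),
                # padding=d with dilation=d keeps the spatial size
                nn.Conv2d(out_ch, out_ch, 3, padding=d, dilation=d),
            )
            for d in (0, 3, 5, 7)
        ])
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, 3, padding=1)
        self.res = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return torch.relu(self.fuse(y) + self.res(x))
```

The residual 1×1 path preserves the input signal while the dilated branches enlarge the receptive field, which is the point of the module.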
Neighbor Connection Decoder (NCD)
This is the PDC module from v1, just renamed, so no further explanation is needed. The NCD fuses the three TEM outputs to obtain the coarse map $C_6$.
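A hedged sketch of the neighbor-connection idea (the channel counts and exact wiring below are my assumptions, not the paper's exact layout): deeper features gate their shallower neighbors before everything is fused into the one-channel coarse map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCD(nn.Module):
    """Rough neighbor-connection decoder sketch: upsample deeper
    features, multiply them into shallower neighbors, then
    concatenate and fuse into a single-channel coarse map."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv4 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.fuse = nn.Conv2d(3 * ch, ch, 3, padding=1)
        self.head = nn.Conv2d(ch, 1, 1)

    def up(self, x, ref):
        # resize x to the spatial size of its shallower neighbor
        return F.interpolate(x, size=ref.shape[2:], mode='bilinear',
                             align_corners=False)

    def forward(self, f3, f4, f5):
        # f5 is the deepest (smallest) feature; it guides its neighbors
        g4 = self.conv4(f4 * self.up(f5, f4))
        g3 = self.conv3(f3 * self.up(g4, f3))
        out = torch.cat([g3, self.up(g4, f3), self.up(f5, f3)], dim=1)
        return self.head(self.fuse(out))  # coarse map
```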
Group-Reversal Attention (GRA)
The purpose of group-reversal attention is to erase the already-detected objects, so that the network subsequently focuses on information in the remaining regions.
The coarse map $C_6$ obtained above is first negated (reversed); denote the result as $y$.
The feature $p^5_1$ extracted by the backbone network is denoted as $x$.
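The reversal step itself is just one minus the sigmoid-normalized coarse map; per pixel it looks like this (a pure-Python illustration; `reverse_map` is a made-up helper name):

```python
import math

def reverse_map(logit):
    """Reverse guidance: 1 - sigmoid(logit). Confident foreground
    pixels (large logit) go to ~0, i.e. they are 'erased'."""
    return 1.0 - 1.0 / (1.0 + math.exp(-logit))

reverse_map(0.0)   # uncertain pixel -> 0.5
reverse_map(10.0)  # confidently detected pixel -> ~0 (suppressed)
```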
The whole process divides $x$ into several groups along the channel dimension, inserts $y$ after each group, and then fuses everything with a convolution.
For example, take $p^5_1$ as $x$, a 32-channel input; three GRA rounds are performed in total.

In the first round, $x$ is kept as a single group of 32 channels. Appending the reversed $C_6$, i.e. $y$, gives 33 channels; a 3x3 convolution brings this back to 32 channels, and a ReLU follows, yielding a new $x$. Another convolution then compresses the new $x$ to a single channel, which is our new $y$, also called the attention score.

In the second round, the new 32-channel $x$ is split into 4 groups of 8 channels each, and $y$ is inserted after every group, so each group becomes 9 channels. A convolution brings these back to 32 channels, recorded as the new $x$; compressing the channels again gives the new attention score, recorded as the new $y$.

In the third round, $x$ is split into 32 groups of one channel each, and a $y$ is appended after every channel, giving 64 channels. As before, a convolution restores 32 channels, and compressing the channels gives the attention score. The final $y$ is $r^5_4$ in the figure. Adding $C_6$ and upsampling to restore the resolution gives $C_5$.
$C_4$ and $C_3$ are obtained in the same way.
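The channel bookkeeping of the three rounds can be checked with a small helper (a pure-Python sketch; `interleaved_channels` is a made-up name):

```python
def interleaved_channels(x_channels, groups):
    """Channels after splitting x into `groups` groups along the
    channel axis and appending the 1-channel map y to each group."""
    assert x_channels % groups == 0
    return x_channels + groups  # every group gains one copy of y

# the three GRA rounds on a 32-channel feature: 1, 4, then 32 groups
counts = [interleaved_channels(32, g) for g in (1, 4, 32)]
print(counts)  # [33, 36, 64]
```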
The whole process, in effect, treats $C_6$ as the target already found: erase $C_6$ from the feature map, let the network search for the target again, and after three rounds of searching add $C_6$ back in. This amounts to refining the details outside $C_6$.
This process is then repeated on the feature maps of the three backbone stages, supplementing details at each stage.
Finally the output map is obtained; that is the entire network structure.
The author also drew a diagram of the GRA module, as follows:
Summary
The main contributions of this paper:
- Proposing camouflaged object detection as a systematic research task.
- Building the COD10K dataset.
- Proposing SINet for detecting camouflaged objects.
SINet-v1 is not very innovative; it is basically based on the network design of the following paper:
- Cascaded Partial Decoder for Fast and Accurate Salient Object Detection
SINet-v2 changes the v1 structure, replacing the attention module with a group-reversal attention module. The author says he was inspired by the following papers:
- PraNet: Parallel reverse attention network for polyp segmentation, 2020
- Object region mining with adversarial erasing: A simple classification to semantic segmentation approach, 2017
- Reverse attention for salient object detection, 2018