[Deep Learning] Semantic Segmentation - Research Ideas

Reference notes

1. Zhihu answer: How to make progress in semantic segmentation
2. Detailed explanation of the core implementation of Swin Transformer; classic models can also be quickly adapted
3. Summary of how to find innovation points in deep learning

As of May 2020

Finding ideas (1)

Link: https://www.zhihu.com/question/390783647/answer/1221984335

(1) Manually designed network structures -> NAS search;
(2) Fixed receptive fields -> introduce spatial attention to adjust the receptive field automatically;
(3) Accuracy cannot be improved further -> change perspective and do real-time segmentation instead, comparing speed as well as accuracy;
(4) Fully supervised learning is overcrowded -> introduce weak supervision (GAN, knowledge distillation, ...) + tricks ≈ comparable scores;
(5) Plain DNNs are boring -> integrate some traditional vision methods into end-to-end training;
(6) CNNs are monotonous -> pair them with GCNs to add some intrigue;
(7) 2D feels too basic -> switch to 3D point-cloud segmentation;
(8) Re-run the classic CNN routines inside Transformers;

Feeling lazy? Just pile up building blocks: A+B, A+B+C, A+B+C+D, …

Building block summary:
A - Attention mechanisms: SE ~ Non-local ~ CCNet ~ GCNet ~ Gate ~ CBAM ~ Dual Attention ~ Spatial Attention ~ Channel Attention [once you master the four composition rules - addition, multiplication, parallel, and serial - plus a little basic matrix algebra (e.g., HW * WH = HH) and sigmoid/softmax operations, you can generate many kinds of attention mechanisms at will]
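To make the "multiplicative channel attention" idea concrete, here is a minimal PyTorch sketch of the SE block, the simplest item in this list. The `reduction=16` default and class name are conventional choices, not something fixed by this article:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """SE-style channel attention: squeeze (global pool) -> excite (tiny MLP) -> rescale."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                             # per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # multiplicative channel attention
```

Swapping the multiplication for addition, running two such branches in parallel (spatial + channel), or chaining them in series is exactly the combinatorial game described above.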

B - Convolution structures: Residual block ~ Bottleneck block ~ Split-Attention block ~ Depthwise separable convolution ~ Recurrent convolution ~ Group convolution ~ Dilated convolution ~ Octave convolution ~ Ghost convolution ~ ... [just swap out the original convolution block and you're done]
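As an example of such a drop-in swap, here is a minimal sketch of a depthwise separable convolution (the MobileNet-style item in the list); it can replace a plain `nn.Conv2d` wherever channel counts match:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)  # mixes channels

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```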

C - Multi-scale modules: ASPP ~ PPM ~ DCM ~ DenseASPP ~ FPA ~ OCNet ~ MPM ...
[understand the ASPP and PPM modules thoroughly, add or remove a few branches, turn the parallel connection into serial or serial-parallel combinations, weight each branch, then mix in some attention or swap the convolutions, and you can assemble hundreds of new structures]
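A stripped-down sketch of ASPP, assuming the common DeepLab dilation rates (the real DeepLab ASPP also has a 1x1 branch and a global-pooling branch, omitted here for brevity):

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel 3x3 convs with different dilation rates, concatenated, fused by 1x1 conv."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

Adding or removing entries in `rates`, chaining the branches serially, or weighting each branch with the SE block above is precisely the "hundreds of new structures" recipe.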

D - Loss functions: Focal loss ~ Dice loss ~ BCE loss ~ Weighted loss ~ Boundary loss ~ Lovász-Softmax loss ~ TopK loss ~ Hausdorff distance (HD) loss ~ Sensitivity-Specificity (SS) loss ~ Distance-penalized CE loss ~ Contour-aware loss ...
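For reference, a minimal sketch of the soft Dice loss for binary segmentation (the `eps` smoothing constant is a conventional choice):

```python
import torch

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss; logits and target are (B, 1, H, W), target in {0, 1}."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()
```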

E - Pooling structures: Max pooling ~ Average pooling ~ Stochastic pooling ~ Strip pooling ~ Mixed pooling ~ ...

F - Normalization modules: Batch Normalization ~ Layer Normalization ~ Instance Normalization ~ Group Normalization ~ Switchable Normalization ~ Filter Response Normalization ...

G - Learning-rate decay strategies: StepLR ~ MultiStepLR ~ ExponentialLR ~ CosineAnnealingLR ~ ReduceLROnPlateau ~ ...

H - Optimizers: BGD ~ SGD ~ Adam ~ RMSProp ~ Lookahead ~ ...
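G and H plug together in a few lines of PyTorch. A sketch with illustrative hyperparameters (the SGD settings and 200-epoch horizon are typical values echoing the "tricks" discussion later, not prescribed ones):

```python
import torch

model = torch.nn.Conv2d(3, 1, 3)   # stand-in for a real segmentation model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... run one training epoch, calling optimizer.step() per batch ...
    scheduler.step()               # decay the learning rate once per epoch
```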

I - Data augmentation: horizontal flip, vertical flip, rotation, translation, scaling, cropping, erasing, reflection ~ brightness, contrast, saturation, color jitter ~ sharpening, histogram equalization, gamma correction, PCA whitening, Gaussian noise, GAN-based augmentation ~ Mixup
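A sketch of a few of these with torchvision; the specific magnitudes are illustrative only, and for segmentation the same geometric transforms must of course also be applied to the label mask:

```python
import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomRotation(10),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),
    T.RandomErasing(p=0.25),   # random occlusion, applied on the tensor
])
```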

J - Backbone networks: LeNet ~ ResNet ~ DenseNet ~ VGGNet ~ GoogLeNet ~ Res2Net ~ ResNeXt ~ InceptionNet ~ SqueezeNet ~ ShuffleNet ~ SENet ~ DPN ~ MobileNet ~ NASNet ~ DetNet ~ EfficientNet ~ ...

P.S. The best reference for future trends is probably the semantic segmentation process of the human brain. The brain seems to first recognize the pattern and then segment, so the prior from the recognition result is introduced during segmentation.

Skim the papers in the main sub-directions of semantic segmentation: weakly supervised, domain-adaptive, and few-shot semantic segmentation, and even semantic segmentation based on neural architecture search. When reading, pick some recent, high-quality papers, understand every detail thoroughly from all angles, and then reproduce them (ideally starting from the open-source code). Once you are proficient enough, you will definitely have ideas: these papers are all man-made and follow certain routines. Only change the structure once you are very familiar with the field; don't change things blindly with a superficial understanding.

Semantic segmentation has side benefits. First, the segmentation problem can be viewed as backbone optimization for other problems, so modules developed for segmentation can be dropped directly into other solutions (forming the so-called A+B routine), and vice versa. Second, for pixel-level applications similar to segmentation, replacing the final softmax layer with regression immediately opens up a large family of related problems (and again, the A+B routines carry over).

Finding ideas (2)

Previous deep-learning work already achieves high accuracy - how do you innovate?

  1. Add some noise to the original dataset, such as random occlusion, or changes to saturation and brightness. The key is to add noise or perturbations appropriate to the specific task, not to corrupt the data arbitrarily. If the model's accuracy then drops sharply, there is your idea: how to keep the model accurate under occlusion, noise, or other perturbations. (Creating something from nothing)

  2. Try the model on a dataset from a new scene, because the original model is very likely overfitted. If accuracy drops sharply in the new scene, again there is your idea: how to improve the model's generalization and keep accuracy high in new scenes. (Creating something from nothing)

  3. Think about the model's problems: too large, inference too slow, training too long, convergence too slow, and so on. Usually where one of these exists, the others come along with it. If so, think about how to speed up inference, sharply cut the parameter count or computation, or accelerate convergence, while losing as little accuracy as possible. (The back waves push the front waves)

  4. Consider whether the model is too complicated: too much manual design, too much post-processing, too many knobs to tune. In these situations, consider designing an end-to-end model. During that design there will inevitably be stages where training does not work well, and the new processing methods you invent to fix them are your innovation. (The back waves push the front waves)

  5. Swap in new structures and borrow techniques from elsewhere, such as Transformers or feature pyramids. This mainly requires following related and cutting-edge work; it pays to keep an eye on many directions. (Bringing forth the new from the old)

  6. Try a specific detection or recognition task. To preserve generality, general-purpose models detect and recognize many classes, so the accuracy on each individual class is not very high. Consider detecting or recognizing only one specific class instead. Take action recognition: a general model may recognize dozens of actions, but you can do fall detection only. You can then bake a lot of prior knowledge about falls into the model, and because the model is designed specifically for falls, the accuracy can often be higher. (Winning by surprise)

  7. If the bar for innovation is lowered, then embedding part of algorithm A (or one of its operations or functions) inside algorithm B counts as innovation. For example:
    1) adding cross-layer identity connections to a convolutional neural network yields the residual network;
    2) introducing non-local means into a residual network yields the non-local neural network;
    3) introducing soft thresholding into a residual network yields the deep residual shrinkage network, suited to heavily noisy data.
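To illustrate example 3, here is a minimal sketch of soft thresholding embedded in a residual block. It is a simplification of the deep residual shrinkage network: the published version derives the threshold adaptively from the features via an SE-like sub-network, whereas this sketch just learns one threshold per channel directly:

```python
import torch
import torch.nn as nn

def soft_threshold(x: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
    """Shrink values toward zero: sign(x) * max(|x| - tau, 0)."""
    return torch.sign(x) * torch.relu(torch.abs(x) - tau)

class ShrinkageBlock(nn.Module):
    """Residual block whose branch output is denoised by soft thresholding."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.tau = nn.Parameter(torch.zeros(1, channels, 1, 1))  # learned per-channel threshold

    def forward(self, x):
        return x + soft_threshold(self.branch(x), torch.abs(self.tau))
```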

Integrating several losses designed by predecessors so that the results improve also counts as innovation, if barely.
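Such a combination can be as small as a weighted sum. A self-contained sketch mixing BCE and soft Dice, where the weights are arbitrary placeholders that would have to be tuned:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits: torch.Tensor, target: torch.Tensor,
                  w_bce: float = 0.5, w_dice: float = 0.5) -> torch.Tensor:
    """Weighted sum of BCE (pixel-wise accuracy) and soft Dice (region overlap)."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + 1e-6) / (prob.sum() + target.sum() + 1e-6)
    return w_bce * bce + w_dice * dice
```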

In my opinion, this is usually some application of deep learning in your own field: typically a CNN framework such as a residual network, or LSTM/GRU, or a combination of them, perhaps with an attention mechanism or some structural improvements added. When writing it up, make the architecture figure attractive and provide plenty of figures and tables, such as the confusion matrix and loss curves. If you can show the effect of the deep-learning components through visualizations, that generally helps a bit more.

  1. Later I discovered that everyone was using tricks. Without cosine decay, without mixup, without training for 200 epochs, the SOTA results everyone talks about are basically unreachable. But nobody will tell you this; they will only say what accuracy they reached with ResNet50.

There really is a great deal of hyperparameter tuning behind today's reported experiments.

Usually a random seed is fixed. I took a 2019 paper whose seed was 2019, changed it to 1314, and F1 went up by 1.5. I was stunned. Instant SOTA [facepalm].

A very common phenomenon in deep learning: A proposes methodA, with resultA = methodA + trickA. Then B proposes methodB, with resultB = methodB + trickB. In the paper, B declares resultB > resultA and concludes methodB > methodA. But in fact it may simply be that trickB > trickA, or that trickB helps methodB more than trickA helps methodA. Under a truly fair comparison, any of the following could happen: methodA > methodB (with no tricks at all); methodA + trickA > methodB + trickA (which still does not prove methodA > methodB - perhaps trickA just suits methodA better); methodA + trickB > methodB + trickB (the outcome B least wants to see). And to make methodB look solid, B generally does not mention all the tricks used in the paper.

The easiest thing is cross-application: the performance of algorithm A on task B, and so on. If one conference paper is enough to graduate, note that quite a few CCF rank-C conferences have acceptance rates above 40% (ICANN, IJCNN, ICPR, etc.). Tweak a network structure a little to attack a problem nobody cares about, and as long as you don't overclaim, you will graduate with a master's degree.
After graduation, the easiest route to publication is probably application papers; a conference paper will get you through smoothly.

  1. For a newcomer to research, reading the literature starts with breadth.
    Step one is to learn which problems exist in which fields and which of them you want to solve. So read widely across papers and fields of interest; reviews are especially recommended.

Step two is to settle on the approach you want to use. Many different algorithms may solve the same problem; find one that is interesting and popular - a research hotspot of recent years. The advantage is that there are many papers to reference, and the research is considered valuable. At this stage most people read papers in a fog and cannot tell what they are saying. Don't panic: skip the experiments and just look at what basic methods are used and what they mean.

Step three: the method is now fixed, so study its principles specifically and work through its steps systematically.
Once you feel you have mastered it, download some papers from minor university journals on CNKI. Most of these do not improve the method; they merely apply it, which makes them very friendly to thesis beginners. You will find that you understand the first one from beginning to end. Read a few more and you will basically know the method, the steps of writing a paper, and how experiments and results are presented.

Step four: move on to high-quality papers. Most domestic papers are method improvements or application innovations, and method improvements can be published as high-quality papers; papers from abroad often improve methods very well. Our teacher said that completely inventing a new method is beyond us - that is what scientists like mathematicians do; if we can improve the results, that is already very good. So read some high-quality Chinese papers, read more foreign-language ones if you can, get familiar with their improvements, and analyze them inductively. Some papers combine multiple methods, and the forms and combinations of improvements are endless.

Finally, a routine for those who cannot find the improvement or innovation for their thesis: combine two or more of these improvement methods. Different combinations yield many candidate points. Of course, a candidate point is not necessarily effective; it has to be verified and optimized through later experiments.

The above are some targeted ideas. The most fundamental approach is to write a review after reading the important papers in your direction; problems are often discovered during the writing. The point is not necessarily to beat the SOTA model on accuracy, but to solve the problems that still exist in the direction.

For example, the goals mentioned above - making models lightweight, improving inference speed, achieving real-time detection, and designing end-to-end models - are all open problems in this direction. Other problems can only be identified by analyzing the specific task.

If you still have no ideas after writing the review, first try the approaches above, and second, find classic papers related to your direction and read them closely. "Classic papers" are the key words here.

The core of a paper

The core of a paper, as I summarize it, is:
What is the significance of your work? -> What challenges remain in this work? -> How did existing methods address them, and what flaws do they still have? -> How does your approach differ from others, why can it solve the problem better, and where is the innovation? (This is the story that must be told well.) -> Then the experiments and figures do the bragging...

Implementation

Link: https://www.zhihu.com/question/390783647/answer/2359428992

Ways to implement semantic segmentation:
The first is sliding windows: decompose the input image into many small local crops and classify each one. This is computationally prohibitive, so it is not used in practice.
The second is a fully convolutional network: a whole stack of convolutional layers with no fully connected layers, preserving the spatial size of the input throughout. Run at full resolution, this is also extremely expensive.
The third, and the standard one, is to downsample and then upsample inside the network. Rather than doing all the convolutions at the image's full spatial resolution, we run a small number of convolutional layers at the original resolution, downsample the feature map, do most of the work at low resolution, and then upsample back.
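A minimal sketch of this third pattern in PyTorch, with toy layer counts chosen for brevity (a real model would add skip connections, U-Net style):

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Downsample with strided convs, compute at low resolution, upsample back."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # H/2
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # H/4
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),  # H/2
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),                # H
        )

    def forward(self, x):          # x: (B, 3, H, W) -> per-pixel class logits (B, C, H, W)
        return self.up(self.down(x))
```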

Applying Swin to downstream tasks

Strongly recommended: a detailed explanation of the core implementation of Swin Transformer - classic models can also be quickly adapted.

Making the Swin Transformer core into a SwinT interface

The value of making a SwinT module

As shown in the figure below, the core module of Swin Transformer is the yellow part. We want to wrap this part into a general SwinT interface, so that developers familiar with CNNs can apply Swin Transformer to different tasks across the CV field.
[Figure: Swin Transformer architecture, with the core module highlighted in yellow]
The value of doing this is twofold:

1. Swin Transformer is powerful, and this interface will not become outdated.
①Hardware capable of global attention over entire very large images will not appear in the short term (individual developers can rarely afford that compute either), so window attention will remain in use for another year or two;
②It is now widely believed that simple-and-effective is best, and the implementation of Swin Transformer is very simple, so its working principle is easy to understand and remember;
③In practice, Swin Transformer has achieved SOTA results and won the Marr Prize; the combination of simplicity and strength is why it won.

2. It makes programming convenient and fast. For example, to turn a U-Net into Swin-Unet, we only need to replace the Conv2D modules with SwinT modules. In practice we usually need not only Swin Transformer blocks but also Conv2D in the same network (say, SwinT in the upper layers to extract global features and Conv2D in the lower layers to extract local features), so the original Swin Transformer model needs architectural changes anyway; a hedged sketch of such a hybrid follows below.
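A sketch of the kind of drop-in mixing the article proposes. `SwinT` here is the article's own proposed interface, not a real pip package or torch built-in; its import path and constructor signature below are hypothetical, assumed only to map (B, C_in, H, W) to (B, C_out, H, W) like a conv layer:

```python
import torch.nn as nn

from swint import SwinT  # hypothetical import; stands in for the article's SwinT module

class HybridBlock(nn.Module):
    """Conv2D for local features first, then a SwinT layer for global context."""
    def __init__(self, in_ch: int = 64, mid_ch: int = 128, out_ch: int = 128):
        super().__init__()
        self.local = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1)
        # Assumed Conv2D-like signature; the real interface in the
        # referenced article may take different arguments.
        self.global_ctx = SwinT(mid_ch, out_ch, window_size=7)

    def forward(self, x):
        return self.global_ctx(self.local(x))
```

Because input and output shapes match Conv2D's, the same swap works anywhere in a U-Net, an FPN, or a detection head.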

Application Scenarios of SwinT

1. Use the SwinT module to build a complete Swin Transformer model to reproduce the paper.

2. You can replace Conv2D in existing models with SwinT to build better-performing networks, such as Swin-Unet; wherever a scenario stacks many CNN layers to extract deep features, several Conv2D layers can be replaced by a single SwinT.

3. Since the input and output of SwinT are exactly the same as Conv2D's, it can also be used in complex tasks such as semantic segmentation and object detection.

4. SwinT and Conv2D can be used together when building a model: use SwinT where high-level global features are needed and Conv2D where local information matters. This is very flexible.

Summary

We wrapped the core module of Swin Transformer into a SwinT interface that behaves like Conv2D.
First, this greatly helps developers write network models, especially when customizing architectures that mix Conv2D and SwinT. Second, we believe the interface is simple and efficient, so its content will not become outdated in the short term. Finally, we actually tested the interface, demonstrating both its ease of use and its accuracy.

Swin and semantic segmentation

Swin-Transformer for Semantic Segmentation Algorithm Sharing

Swin, which continues to evolve from ViT, has just won the ICCV 2021 best paper award. In actual use the results are indeed good: for semantic segmentation, Swin not only reached SOTA on ADE20K but also performs excellently on various other scene datasets, with accuracy greatly improved over CNN-based segmentation algorithms such as PSPNet and DeepLabv3+. (Advantages: high accuracy. Disadvantages: poor real-time performance and heavy reliance on pre-trained models; also, because transformers are new, deployment on embedded targets may be problematic, since current embedded inference frameworks are still accelerated around conventional convolution.)

Finally, the segmentation head is UperNet. Built as an improvement on PSPNet, the PPM-fused features are then fused with conv2-conv5 respectively, four fusions in total, in a manner similar to FPN, finally producing one feature map fused that many times.

Specifically, there are 4 stages; after each stage, the output feature map is appended to a list. The four feature maps, in (N, C, H, W) order, are:

1x128x128x128

1x256x64x64

1x512x32x32

1x1024x16x16

[Figure: the four stage outputs fed into UperNet-style feature fusion, similar to FPN]
The four groups of feature maps are fed into the corresponding UperNet branches for the feature fusion shown in the figure, which is very similar to the FPN used in detection; a minimal sketch follows below. Training: like the rest of the transformer family, it is extremely dependent on the pre-trained model and basically cannot be trained from scratch.
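A minimal sketch of FPN-style top-down fusion over the four stage outputs, using the channel counts listed above; this is a simplification for illustration, not the exact UperNet head:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 1x1 lateral convs project each stage to a common channel width (256 here).
laterals = nn.ModuleList([nn.Conv2d(c, 256, 1) for c in (128, 256, 512, 1024)])

def fuse(feats):
    """feats: list of (B, C_i, H_i, W_i), highest resolution first."""
    outs = [l(f) for l, f in zip(laterals, feats)]
    for i in range(len(outs) - 1, 0, -1):          # top-down: add upsampled coarser map
        outs[i - 1] = outs[i - 1] + F.interpolate(outs[i], scale_factor=2, mode="nearest")
    return outs[0]                                 # finest fused map

feats = [torch.randn(1, 128, 128, 128), torch.randn(1, 256, 64, 64),
         torch.randn(1, 512, 32, 32), torch.randn(1, 1024, 16, 16)]
fused = fuse(feats)                                # (1, 256, 128, 128)
```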
Overall, Swin follows a find-the-problem -> solve-the-problem pattern:

Problem: SETR's global attention is too computationally heavy.

Solution: window-based local attention (W-MSA).

Problem: no information exchange between different windows (SegFormer, from the same period, simply used overlapping patches).

Solution: improve W-MSA with shifted windows (SW-MSA).
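The window/shift mechanics fit in a few lines. A sketch with a toy 56x56 feature map and window size 7 (omitting the attention computation itself and the attention mask that Swin applies to windows straddling the old boundaries):

```python
import torch

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """(B, H, W, C) -> (num_windows*B, ws, ws, C); attention runs inside each window."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws, ws, C)

x = torch.randn(1, 56, 56, 96)                          # toy feature map
windows = window_partition(x, 7)                        # W-MSA: (64, 7, 7, 96)

# SW-MSA: cyclically shift by ws//2 before partitioning, so the new windows
# straddle the previous window borders and information can cross them.
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))
shifted_windows = window_partition(shifted, 7)
```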

Source: blog.csdn.net/zhe470719/article/details/124590790