How Spatial Transformer Networks (STNs) learn their parameters

The STN is a very simple building block. Most articles online only explain its forward pass, but I believe most readers are more curious about how a block that receives no direct supervision on its transformation parameters can learn them by itself.

1. Introduction to network structure


1. Localisation net

A multi-layer CNN whose input is an image or feature map and whose output is the matrix θ containing the six affine-transformation parameters.
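As a minimal sketch of this idea, the localisation net below is reduced to a single linear layer (standing in for the multi-layer CNN); all names here are illustrative. It also shows a common initialisation trick: zero the final weights and set the bias to the identity transform, so training starts from "do nothing".

```python
import numpy as np

def localisation_net(feature_map, W, b):
    """Stand-in for the multi-layer CNN: one linear layer producing
    the six affine parameters, reshaped into the 2x3 theta matrix."""
    x = feature_map.reshape(-1)
    theta = W @ x + b
    return theta.reshape(2, 3)

# Initialise so that theta starts as the identity transform.
H, W_img = 8, 8
W = np.zeros((6, H * W_img))
b = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
theta = localisation_net(np.random.rand(H, W_img), W, b)
print(theta)  # [[1. 0. 0.], [0. 1. 0.]]
```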

2. Grid generator

Using the matrix θ, the grid generator maps the regular coordinate grid of the output image V through the affine transformation, producing, for each pixel of V, the sampling coordinates in the original image U.
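A small NumPy sketch of this step, using the normalised [-1, 1] coordinate convention from the STN paper (the grid shape and the example θ are illustrative):

```python
import numpy as np

def grid_generator(theta, H, W):
    """For each output pixel of V, apply the affine map theta to a
    regular grid in normalised [-1, 1] coordinates, yielding the
    sampling coordinates in U."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H),
                         np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return coords @ theta.T                                  # (H, W, 2)

# Pure translation: shift all sampling points 0.5 to the right.
theta = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.0]])
grid = grid_generator(theta, 4, 4)
print(grid[0, 0])  # the top-left output pixel samples U at (-0.5, -1.0)
```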

3. Sampler

Using those sampling coordinates and the pixel values of the original image U, the sampler reads from U (typically with bilinear interpolation) and assigns the interpolated pixels to the new image V.
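A sketch of the bilinear sampler, assuming the same normalised [-1, 1] grid convention as above; the identity-grid check at the end is illustrative:

```python
import numpy as np

def sampler(U, grid):
    """Bilinearly sample U at the (x, y) coordinates in grid,
    given in normalised [-1, 1] units."""
    H, W = U.shape
    x = (grid[..., 0] + 1) * (W - 1) / 2   # back to pixel units
    y = (grid[..., 1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    # Weighted sum of the four neighbouring pixels.
    return ((1 - wx) * (1 - wy) * U[y0, x0]
            + wx * (1 - wy) * U[y0, x0 + 1]
            + (1 - wx) * wy * U[y0 + 1, x0]
            + wx * wy * U[y0 + 1, x0 + 1])

# Sanity check: the identity grid copies U unchanged.
H = W = 4
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
identity_grid = np.stack([xs, ys], axis=-1)
U = np.arange(16.0).reshape(4, 4)
V = sampler(U, identity_grid)
print(np.allclose(V, U))  # True
```

Bilinear interpolation matters here: it makes the sampler (almost everywhere) differentiable with respect to the sampling coordinates, which is exactly what the next section relies on.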

2. Overall behaviour

Feed an image or feature map into the module, and it outputs an affine-transformed image or feature map.

3. How the weights are learned

This is the part that experts rarely bother to explain and beginners do not understand. I was one of the latter.
First of all, note that every part of this module is differentiable.
Suppose the task is classifying handwritten digits. At the start the weights are random, which means θ is random, so the image may look completely different after this random transformation, and the loss of the downstream classification network will be very high. During training, the classification loss decreases along the gradient direction and all weights W are updated, including the weights in the localisation net. Updating those weights is equivalent to updating θ, so the STN is updated as the classification network itself is updated; no separate supervision for θ is ever needed.
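The argument above can be demonstrated end to end with a toy example. Everything here is illustrative: a fixed target image and an MSE loss stand in for the classification network's loss, and numerical central differences stand in for autodiff backprop. Gradient descent on that loss alone is enough to pull θ toward the transformation the downstream task "wants":

```python
import numpy as np

def transform(U, theta):
    """Grid generator + bilinear sampler in one step."""
    H, W = U.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H),
                         np.linspace(-1, 1, W), indexing="ij")
    src = np.stack([xs, ys, np.ones_like(xs)], axis=-1) @ theta.T
    x = np.clip((src[..., 0] + 1) * (W - 1) / 2, 0, W - 1)
    y = np.clip((src[..., 1] + 1) * (H - 1) / 2, 0, H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * U[y0, x0] + wx * (1 - wy) * U[y0, x0 + 1]
            + (1 - wx) * wy * U[y0 + 1, x0] + wx * wy * U[y0 + 1, x0 + 1])

rng = np.random.default_rng(0)
U = rng.random((8, 8))
true_theta = np.array([[1.0, 0.0, 0.3],
                       [0.0, 1.0, 0.0]])
target = transform(U, true_theta)        # what the downstream task "wants"

def loss(theta):
    # MSE to the target image stands in for the classification loss.
    return np.mean((transform(U, theta) - target) ** 2)

theta = np.array([[1.0, 0.0, 0.0],       # start at the identity transform
                  [0.0, 1.0, 0.0]])
eps, lr = 1e-4, 0.1
start = loss(theta)
for _ in range(100):
    grad = np.zeros_like(theta)
    for i in range(2):                   # numerical gradient stands in
        for j in range(3):               # for autodiff backprop
            d = np.zeros_like(theta)
            d[i, j] = eps
            grad[i, j] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
    theta = theta - lr * grad
print(start, loss(theta))  # the loss drops as theta moves toward true_theta
```

The key point is that the gradient signal reaching θ comes entirely from the downstream loss; θ itself is never labelled, which is why the STN can be trained "unsupervised" inside a supervised network.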


Origin blog.csdn.net/xiufan1/article/details/122953143