inductive bias

Inductive Bias:
Prior knowledge, i.e. assumptions built into the model before it sees any data.
CNN: There are two inductive biases:

  1. Locality: a CNN slides a convolution window across the image patch by patch, so it assumes that neighboring regions of the image carry related features.

  2. Translation equivariance
    f(g(x)) = g(f(x)), where f can be understood as the convolution and g as a translation. The convolution kernel in a CNN works like a template: no matter where the template is moved, the same input patch meeting the same kernel always produces the same output (see the sketch after this list).
    Once a CNN has these two inductive biases, it already carries a lot of prior information, so it needs relatively little data to learn a good model. A transformer has none of this prior information, so it has to learn these visual regularities from the data itself.
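
Below is a minimal sketch (PyTorch assumed; not part of the original post) that checks translation equivariance numerically: translating the input and then convolving gives the same result as convolving and then translating. Circular padding is used so the shift wraps around and boundary effects do not break the equality; the 3x3 kernel also illustrates locality, since each output value depends only on a 3x3 neighborhood of the input.

```python
# Sketch: f(g(x)) == g(f(x)) for f = convolution, g = translation.
import torch
import torch.nn as nn

torch.manual_seed(0)

# 3x3 kernel -> locality: each output pixel depends only on a 3x3 neighborhood.
# Circular padding keeps the equality exact for wrap-around shifts.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

x = torch.randn(1, 1, 8, 8)                      # a random 8x8 "image"
g = lambda t: torch.roll(t, shifts=2, dims=-1)   # g: translate 2 pixels to the right

out1 = conv(g(x))   # f(g(x)): translate first, then convolve
out2 = g(conv(x))   # g(f(x)): convolve first, then translate

print(torch.allclose(out1, out2, atol=1e-6))     # True: the two orders agree
```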

So the drawback of the transformer (ViT) is: when the training set is relatively small, it generalizes worse than a CNN.

Because ViT involves no decoder, there is no cross-attention in which Q comes from the decoder while K and V come from the encoder; in ViT, Q, K and V are all generated from the same input sequence.
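
As a minimal sketch (using PyTorch's nn.MultiheadAttention; not from the original post), the contrast looks like this: in ViT-style self-attention Q, K and V are all projections of the same patch sequence, whereas encoder-decoder cross-attention would take Q from the decoder and K, V from the encoder.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

patches = torch.randn(1, 197, 64)   # e.g. 196 patch tokens + 1 [CLS] token

# ViT (encoder only): Q, K and V are all projections of the same sequence.
self_out, _ = attn(query=patches, key=patches, value=patches)

# Encoder-decoder cross-attention (not used in ViT): Q comes from the decoder,
# K and V come from the encoder output.
decoder_tokens = torch.randn(1, 10, 64)
cross_out, _ = attn(query=decoder_tokens, key=patches, value=patches)
```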

Origin: blog.csdn.net/weixin_43845922/article/details/130923876