Deep Learning - Linear Classifier Understanding

1. We develop a more powerful approach to image classification that extends naturally to neural networks and convolutional neural networks. The approach has two main components: a score function, which maps raw image data to class scores, and a loss function, which quantifies how well the predicted scores agree with the true labels. Classification is then cast as an optimization problem: minimize the loss by updating the parameters of the score function.
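The two ingredients can be sketched as follows. The multiclass SVM (hinge) loss used here is one common choice of loss function; the text above does not fix a particular loss, so treat it, and all shapes and names, as illustrative assumptions.

```python
import numpy as np

# Score function: maps a flattened image x to one score per class.
# Loss function: measures disagreement between scores and the true label y.
np.random.seed(0)
num_classes, dim = 10, 3072  # assumed CIFAR-10-style shapes
W = 0.001 * np.random.randn(num_classes, dim)
b = np.zeros(num_classes)

def scores(x):
    # f(x, W, b) = W x + b
    return W @ x + b

def svm_loss(x, y, delta=1.0):
    # One common loss (multiclass SVM / hinge): penalize classes whose
    # score comes within `delta` of the correct class's score.
    s = scores(x)
    margins = np.maximum(0.0, s - s[y] + delta)
    margins[y] = 0.0  # the correct class contributes no loss
    return margins.sum()

x_i = np.random.randn(dim)
loss = svm_loss(x_i, y=3)  # always non-negative
```

Training then amounts to adjusting W and b to drive this loss down over the training set.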

2. Linear mapping relationship

  f(x_i, W, b) = W x_i + b

  Each row of W is a classifier for one class. Geometrically, if you change the numbers in one of the rows, the corresponding classifier's line in space rotates in a different direction. The bias b allows that line to translate. Note that without the bias, the classification score for x_i = 0 is always 0 regardless of the weights, so every classifier's line would be forced to pass through the origin.
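A minimal NumPy sketch of this mapping, with assumed CIFAR-10-style shapes (10 classes, 3072-dimensional flattened 32x32x3 images):

```python
import numpy as np

np.random.seed(0)
num_classes, dim = 10, 3072

W = 0.001 * np.random.randn(num_classes, dim)  # one row (classifier) per class
b = np.zeros(num_classes)                      # bias: translates each line
x_i = np.random.randn(dim)                     # one flattened image

scores = W @ x_i + b           # f(x_i, W, b): one score per class
predicted = np.argmax(scores)  # predicted label = class with highest score

# Without the bias term, x_i = 0 yields all-zero scores for any W:
zero_scores = W @ np.zeros(dim)
```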

  Think of a linear classifier as template matching: another interpretation of the weights W is that each row corresponds to a template (sometimes called a prototype) for one class. An image's score for each class is obtained by comparing the image against each template with an inner product (also called a dot product), and the class whose template is most similar wins. From this point of view, the linear classifier performs template matching with learned templates. Alternatively, you can think of it as a highly efficient k-NN: instead of comparing against all the training images, we compare against a single image per class (a learned image, not one drawn from the training set), and we use the (negative) inner product as the distance between vectors instead of the L1 or L2 distance.
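The equivalence between the matrix-vector product and per-template inner products can be checked directly; the small shapes here are purely illustrative:

```python
import numpy as np

np.random.seed(1)
num_classes, dim = 3, 5
W = np.random.randn(num_classes, dim)  # each row W[k] is the class-k template
x = np.random.randn(dim)               # the image vector

scores = W @ x  # all class scores in one product

# The same scores, computed one template at a time as inner products:
scores_by_template = np.array([np.dot(W[k], x) for k in range(num_classes)])

# "Nearest template" under the (negative) inner-product distance:
best_class = np.argmax(scores)
```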

  When predicting, note that classifying a test example requires only one matrix multiplication and one addition, which is much faster than the k-NN approach of comparing the test image against all the training data.
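An entire test batch can be classified with a single matrix multiply plus an add, at a cost independent of the training-set size (unlike k-NN). Shapes are assumed for illustration:

```python
import numpy as np

np.random.seed(2)
num_test, dim, num_classes = 4, 3072, 10
X_test = np.random.randn(num_test, dim)          # each row is a test image
W = 0.001 * np.random.randn(num_classes, dim)
b = np.zeros(num_classes)

scores = X_test @ W.T + b          # (num_test, num_classes) in one matmul
predictions = scores.argmax(axis=1)  # one predicted label per test image
```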

 

Reference: Zhihu column https://zhuanlan.zhihu.com/p/20918580?refer=intelligentunit
