ReID DAY2

This is the second day of learning; this essay also covers some learning points related to Section 3.1 of the thesis. I had things to do today and only started studying at night. I need to keep up the effort and consolidate every day!

(1) Representation Learning: Representation learning can be regarded as learning the features of data. Like the rest of machine learning, feature learning divides into supervised and unsupervised feature learning. Supervised feature learning uses labeled data, for example training a neural network: with adequate training the network learns to recognize what things are, reaching a high recognition rate. Unsupervised feature learning means the data comes as an unlabeled pile and the network has not been trained on related data beforehand, yet that data is still used to train the network, as in various clustering and transformation methods. A good feature helps us extract information from data more effectively for classification or prediction. However, although deep neural networks can learn rich data features effectively, those features are difficult to interpret. And usually, the more layers a neural network has, the higher the cost of training.

Link: https://www.jiqizhixin.com/graph/technologies/64d4c374-6061-46cc-8d29-d0a582934876
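
To make "features" concrete, here is a minimal sketch (my own illustration with PyTorch/torchvision, not from the linked article) of pulling a learned feature vector out of a supervision-trained network:

import torch
import torchvision.models as models

# ResNet-50 pretrained with supervision on ImageNet; the output of its
# penultimate layer serves as a learned feature representation.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep the 2048-d features
backbone.eval()

with torch.no_grad():
    feat = backbone(torch.randn(1, 3, 224, 224))  # dummy image batch
print(feat.shape)  # torch.Size([1, 2048])

Feature vectors like this are exactly what a ReID model compares between pedestrian images.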

 

(2) The Softmax activation function (normalized exponential function):

The following explains the softmax function.

First, recall two properties of probabilities: 1) every predicted probability must be non-negative; 2) the probabilities of all possible predictions must sum to 1.

Softmax converts raw predictions, which can lie anywhere from minus infinity to plus infinity, into probabilities by following these two steps.

Step 1: make the predictions non-negative using the exponential function exp(x). This maps every prediction to a positive number, which guarantees the probabilities are non-negative.

Step 2: normalize the values, so that results of different magnitudes become comparable and the probabilities sum to 1.

Normalization method: first sum all the values produced in step 1, then divide each value by this sum to obtain its probability. This guarantees the probabilities add up to 1.


For example, suppose the raw predictions are x1 = -3, x2 = 1.5, x3 = 2.7.

① Make the predictions non-negative:

y1 = exp(x1) = exp(-3) = 0.05
y2 = exp(x2) = exp(1.5) = 4.48
y3 = exp(x3) = exp(2.7) = 14.88

② Normalize so the probabilities sum to 1:

z1 = y1 / (y1 + y2 + y3) = 0.05 / (0.05 + 4.48 + 14.88) = 0.0026
z2 = y2 / (y1 + y2 + y3) = 4.48 / (0.05 + 4.48 + 14.88) = 0.2308
z3 = y3 / (y1 + y2 + y3) = 14.88 / (0.05 + 4.48 + 14.88) = 0.7666

To summarize, the key points of Softmax are:
  <1> Numerator: the exponential function maps each output to a real number in (0, +∞).
  <2> Denominator: all the results are summed for normalization.
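
As a quick sanity check, here is a minimal Python/NumPy sketch (my own, not from the original text) that reproduces the arithmetic above:

import numpy as np

def softmax(x):
    # Shift by the max before exponentiating for numerical stability;
    # this does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([-3.0, 1.5, 2.7])))
# -> [0.0026 0.2308 0.7666] (rounded to four decimals)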
 
When computing pedestrian ID probabilities, Softmax gives the probability that image x belongs to the k-th pedestrian. The higher this probability, the smaller the loss, and vice versa.
In the fourth formula later on, for the ID label, reading "q(k) = 1, y = k" as "q(k) = 1 when y = k" makes it easier to understand.
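
For reference, the standard cross-entropy form this describes (my own restatement, not the thesis's exact notation) is:

L_ID = -∑_{k=1}^{K} q(k) · log p(k),  where q(k) = 1 if y = k and q(k) = 0 otherwise,

with p(k) the softmax probability that image x belongs to pedestrian k, and y the ground-truth ID.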

(3) Q: If the identity (ID) label alone is not enough to give a pedestrian ReID model good generalization, adding other pedestrian attributes as training labels can improve generalization. But doesn't that risk overfitting? If not (and it certainly does not), why does introducing many attributes enhance generalization in ReID, while adding more and more attributes when predicting house prices leads to overfitting? Or does ReID also use dropout? No need to rush: looking back, you will find that the total loss is composed of L_Att together with L_ID, and this is what addresses the overfitting problem. Doesn't that feel a bit like regularization? This question needs more thought QwQ.
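
A minimal PyTorch sketch of what such a combined objective could look like; the weight lam and the separate attribute heads are my own assumptions for illustration, not the thesis's exact formulation:

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def total_loss(id_logits, id_labels, att_logits, att_labels, lam=1.0):
    # L = L_ID + lam * L_Att; the weighting lam is an assumption.
    l_id = ce(id_logits, id_labels)
    # Average cross-entropy over the attribute heads (gender, backpack, ...).
    l_att = sum(ce(lg, lb) for lg, lb in zip(att_logits, att_labels)) / len(att_logits)
    return l_id + lam * l_att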

(4) ResNet (the pride of Chinese people!): It was proposed to solve the degradation problem that remains even after gradient vanishing and gradient explosion have been addressed by regularization (normalization) methods. I don't dare to explain residuals myself for fear of misleading everyone, so here are the two articles I found that explain them in the most detail:

https://blog.csdn.net/lanran2/article/details/79057994

https://blog.csdn.net/mao_feng/article/details/52734438
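
For intuition, here is a minimal PyTorch sketch of a basic residual block (a simplified version for illustration, not the exact code from those articles):

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    # Residual block: output = ReLU(F(x) + x), so the layers only
    # need to learn the residual F(x) = H(x) - x.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the identity shortcut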

 

(5) Q: We noted that both the verification sub-network (with its verification loss) and the classification sub-network apply dropout; what keep probability do they use? And why is dropout needed here, while the ID and attribute losses are combined through an FC layer? The answer may, emm, appear in the next essay.
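
As a reminder of how dropout works, here is a generic PyTorch sketch (the drop probability p = 0.5, i.e. keep probability 0.5, is a placeholder, not the thesis's actual setting):

import torch
import torch.nn as nn

# PyTorch specifies the *drop* probability p; keep_prob = 1 - p.
head = nn.Sequential(
    nn.Linear(2048, 512),  # FC layer producing the embedding
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),     # randomly zeroes activations during training
    nn.Linear(512, 751),   # e.g. 751 identities, as in Market-1501
)

head.train()                      # dropout active during training
y = head(torch.randn(4, 2048))
head.eval()                       # dropout disabled at inference time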

There were quite a few extracurricular things during the day, so I seized the night to study Section 3.1. If nothing comes up tomorrow, I'll learn a bit more and share it with everyone T uT
