The true meaning of l1 and l2 number of Fan

It has been a long time did not understand the true meaning, add during November this knowledge.

l0 is the norm || x || 0 = xi (xi is not equal to 0) represents the number of non-0 digits, [1,2,3,4,5] non-5 is the number 0, [0,1,2 , 0,3] non-3 is the number 0

l1 is the norm || x || 1 = Σ | xi | Manhattan distance between 0 and x, [1,2,3, -2, -1] = 1 + 2 + 3 + 2 + 1 = 9, for the absolute values ​​of the digits.

is the l2 norm || x || 2 = Σ | xi | ^ 2 is the Euclidean distance between x and 0, [1, -3] = 1 + 2 ^ 2 + 2 ^ (- 3) ^ 2 = 1 + 4 + 9 = 14, for each digital square and a square root.

lp is the norm || x || p = √Σ (xi) ^ p. Reduce the complexity of the control model over-fitting. Adding a penalty term in the general loss of function.

Why l1 and l2 can reduce the over-fitting. Because the model is more complex w parameters, it is more complex models. w = [w1, w2, w3, w4, w5, ...., wn] let some 0, some not zero, that is, l0 norm

Objective function: min J (wxi, y) st | w | 0 <= C optimization problem can not be solved. | W | 1 and | w | 2 can be limited to less than the constant C

Constructor Lagrangian function L (w, α) = J (wxi, y) + α (| w | 1-C) L (w, α) = J (wxi, y) + α (| w | 2- C) = J (wxi; y) + α | w | 2-αC = minJ (wxi, y) + α | w | 2   

If it is two-dimensional, then it is necessary to minimize the loss of function, but also to simplify the penalty term later, when the norm of the time, when w1, w2 are two coordinates so that a positive slant down the square you can clearly see or w1 w2 is 0.

When the two-norm, that is, the intersection of circle and contour.

 

Guess you like

Origin www.cnblogs.com/limingqi/p/11621879.html