Gradient descent learning summary

1. Understanding the gradient

In machine learning, gradient descent is often used to find the minimum of the loss function. The gradient of a function at a point is the vector of its partial derivatives along each direction. The function decreases fastest in the direction opposite to the gradient, which makes it easier to find the function's minimum.
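
As a minimal one-variable sketch (the function, starting point, and step size below are made up for illustration), repeatedly stepping against the derivative of f(x) = x^2 drives the point toward the minimum at x = 0:

def f(x):
    return x ** 2              # function to minimize

def grad_f(x):
    return 2 * x               # derivative of f

x = 5.0                        # arbitrary starting point
alpha = 0.1                    # step size (learning rate)
for _ in range(100):
    x = x - alpha * grad_f(x)  # step in the direction opposite to the gradient
print(x)                       # ends up very close to 0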

2. The matrix representation of gradient descent

The hypothesis function is $h_\theta(X) = X\theta$, and the loss function is $J(\theta) = \frac{1}{2}\sum_{i=1}^m (h_\theta(x_i) - y_i)^2$, whose matrix form is $J(\theta) = \frac{1}{2}(X\theta - Y)^T(X\theta - Y)$. The partial derivative of the loss function with respect to the vector $\theta$ is $\frac{\partial}{\partial\theta} J(\theta) = X^T(X\theta - Y)$.
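
A minimal NumPy sketch of these matrix formulas (the data X, Y, the learning rate, and the iteration count are made-up illustration values, not from the original post):

import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # m x n design matrix
Y = np.array([[1.0], [2.0], [3.0]])                   # m x 1 target vector
theta = np.zeros((2, 1))                              # n x 1 parameter vector
alpha = 0.05                                          # learning rate

for _ in range(1000):
    grad = X.T @ (X @ theta - Y)          # gradient of J(theta) in matrix form
    theta = theta - alpha * grad          # gradient descent step

r = X @ theta - Y
loss = 0.5 * (r.T @ r).item()             # J(theta) in matrix form
print(theta, loss)                        # theta approaches the least-squares solution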

3. Computing the weights: using all samples vs. using a single sample

Weight update using all samples:

h = sigmoid(dataMatrix * weights)                              # predictions for all samples
error = labelMat - h                                           # error between labels and predictions
weights = weights + alpha * dataMatrix.transpose() * error     # update all weights at once
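
A self-contained sketch of this full-batch version (the sigmoid helper, the toy data, the learning rate, and the iteration count are assumptions added for illustration; the three update lines are the ones above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

dataMatrix = np.mat([[1.0, 0.5], [1.0, 1.5], [1.0, -0.5], [1.0, 2.0]])   # m x n features
labelMat = np.mat([[0.0], [1.0], [0.0], [1.0]])                          # m x 1 labels
weights = np.mat(np.ones((2, 1)))   # n x 1 weight vector
alpha = 0.01                        # learning rate

for _ in range(500):
    h = sigmoid(dataMatrix * weights)                             # predictions for all samples
    error = labelMat - h                                          # errors for all samples
    weights = weights + alpha * dataMatrix.transpose() * error    # update with the whole batch

print(weights)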

Weight update using a single sample:

h = sigmoid(sum(dataMatrix[i] * weights))           # prediction for sample i
error = classLabel[i] - h                           # error for sample i
weights = weights + alpha * error * dataMatrix[i]   # update using only sample i
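
A sketch that applies this single-sample update across the whole data set in turn (stochastic gradient ascent; the data, helper function, and pass count below are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

dataMatrix = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, -0.5], [1.0, 2.0]])   # m x n array
classLabel = np.array([0.0, 1.0, 0.0, 1.0])                                # m labels
weights = np.ones(2)    # n weights
alpha = 0.01            # learning rate

for _ in range(200):                         # several passes over the data
    for i in range(dataMatrix.shape[0]):     # one sample at a time
        h = sigmoid(sum(dataMatrix[i] * weights))           # prediction for sample i
        error = classLabel[i] - h                           # error for sample i
        weights = weights + alpha * error * dataMatrix[i]   # update using only this sample

print(weights)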

Here h is the value of the sigmoid function, error is the difference between the label and the calculated value, and the input weights are then updated using the error scaled by the factor alpha.

This method does not use an explicitly computed gradient value; instead it updates the weights using alpha, the error, and the input values.

4. Analytical gradient and numerical gradient

The analytical gradient is obtained by theoretical derivation; in engineering computation, the numerical gradient is often used for the actual calculation.
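
As one way to compare the two (an illustrative sketch; the loss, data, and epsilon below are assumptions), the analytical gradient from section 2 can be checked against a numerical gradient computed with central differences:

import numpy as np

def J(theta, X, Y):
    r = X @ theta - Y
    return 0.5 * float(r @ r)          # loss in matrix form, as in section 2

def analytical_grad(theta, X, Y):
    return X.T @ (X @ theta - Y)       # gradient derived analytically

def numerical_grad(theta, X, Y, eps=1e-5):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (J(theta + e, X, Y) - J(theta - e, X, Y)) / (2 * eps)   # central difference
    return g

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.5, -0.2])

print(analytical_grad(theta, X, Y))    # the two results should agree closely
print(numerical_grad(theta, X, Y))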

