1. Understanding the gradient
In machine learning, gradient descent is commonly used to find the minimum of a loss function. The gradient of a function at a point is the vector of its partial derivatives along each coordinate direction. The function decreases fastest in the direction opposite to the gradient, which makes it easier to find the function's minimum.
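To make this concrete, here is a minimal sketch of gradient descent on a one-dimensional function. The function $f(x) = (x-3)^2$, the learning rate, and the step count are all illustrative choices, not from the original text.

```python
# Gradient descent on f(x) = (x - 3)^2, whose derivative is 2(x - 3).
# Starting point, learning rate, and step count are arbitrary illustrative values.
def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)   # derivative of (x - 3)^2 at the current point
        x = x - lr * grad    # move in the direction opposite the gradient
    return x

x_min = gradient_descent(0.0)  # converges near the minimizer x = 3
```

Each step moves against the gradient, so the iterate approaches the minimizer $x = 3$.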
2. The matrix representation of gradient descent
Suppose the hypothesis is $h_\theta(X) = X\theta$ and the loss function is $J(\theta) = \frac{1}{2}\sum_{i=1}^m (h_\theta(x_i) - y_i)^2$, whose matrix form is $J(\theta) = \frac{1}{2}(X\theta - Y)^T(X\theta - Y)$. The partial derivative of the loss function with respect to the vector $\theta$ is $\frac{\partial}{\partial \theta} J(\theta) = X^T(X\theta - Y)$.
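The matrix gradient above leads directly to a batch gradient-descent loop. The following is a sketch with synthetic data; the learning rate, iteration count, and data values are illustrative assumptions.

```python
import numpy as np

# Batch gradient descent for linear regression using the matrix
# gradient X^T (X theta - Y) of J(theta) = (1/2)||X theta - Y||^2.
def fit(X, Y, alpha=0.01, iters=5000):
    theta = np.zeros((X.shape[1], 1))
    for _ in range(iters):
        grad = X.T @ (X @ theta - Y)  # gradient of the loss at theta
        theta = theta - alpha * grad  # descend along the negative gradient
    return theta

# Synthetic data for y = 1 + 2x; the first column of ones is the intercept term.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([[3.0], [5.0], [7.0]])
theta = fit(X, Y)  # approaches [1, 2]
```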
3. Computing the weights: using all samples vs. a single sample
Updating the weights with all samples:
```python
h = sigmoid(dataMatrix * weights)
error = labelMat - h
weights = weights + alpha * dataMatrix.transpose() * error
```
Updating the weights with a single sample:
```python
h = sigmoid(sum(dataMatrix[i] * weights))
error = classLabel[i] - h
weights = weights + alpha * error * dataMatrix[i]
```
Here h is the output of the sigmoid function, error is the difference between the label and h, and the weights are then updated from alpha, the error, and the input values.
This method does not compute a gradient value explicitly; instead it updates the weights directly using alpha, the error, and the inputs.
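The batch update above can be turned into a runnable sketch with NumPy. The toy data, learning rate, and iteration count below are illustrative assumptions, not from the original text.

```python
import numpy as np

# Batch logistic-regression weight update: h = sigmoid(X w),
# error = labels - h, w <- w + alpha * X^T error.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(dataMatrix, labelMat, alpha=0.1, iters=500):
    weights = np.zeros((dataMatrix.shape[1], 1))
    for _ in range(iters):
        h = sigmoid(dataMatrix @ weights)                 # predicted probabilities
        error = labelMat - h                              # label minus prediction
        weights = weights + alpha * (dataMatrix.T @ error)  # weight update
    return weights

# Toy linearly separable data: label is 1 when the second feature is positive.
# The first column of ones acts as the intercept term.
dataMatrix = np.array([[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]])
labelMat = np.array([[1.0], [1.0], [0.0], [0.0]])
w = train(dataMatrix, labelMat)
preds = sigmoid(dataMatrix @ w) > 0.5  # predicted classes
```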
4. Analytical gradient and numerical gradient
The analytical gradient is obtained by theoretical derivation; in engineering practice, a numerical gradient is often computed as well, typically to verify the analytical one.
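A common way to compute a numerical gradient is the central-difference approximation, which can then be compared against the analytical gradient. The function $f(x) = \|x\|^2$ and the test point below are illustrative choices.

```python
import numpy as np

# Central-difference numerical gradient: grad_i ~ (f(x + e_i*eps) - f(x - e_i*eps)) / (2*eps).
def numerical_gradient(f, x, eps=1e-5):
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

f = lambda x: np.sum(x ** 2)   # f(x) = ||x||^2
analytic = lambda x: 2 * x     # its analytical gradient

x = np.array([1.0, -2.0, 3.0])
num_grad = numerical_gradient(f, x)  # should closely match analytic(x)
```

If the two gradients disagree beyond small floating-point error, the analytical derivation likely contains a mistake.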