[Mathematics in Artificial Intelligence] Differential Calculus of Multivariate Functions
Series Article Directory
[Artificial Intelligence Study Notes] Mathematics in Artificial Intelligence - Overview
[Mathematics in Artificial Intelligence] Differential Calculus of One Variable Function
[Mathematics in Artificial Intelligence] Basic Linear Algebra
[Mathematics in Artificial Intelligence] Differential Calculus of Multivariate Functions
Article directory
Partial Derivatives
The partial derivative can be seen as the generalization of the derivative to multivariate functions: hold all the other independent variables fixed as constants and differentiate with respect to a single variable, and the result is the partial derivative with respect to that variable.
Geometrically, taking a partial derivative means slicing the surface of the original function along one coordinate direction and differentiating the resulting one-variable curve; that slope is the partial derivative.
The partial derivative of f with respect to x_i is written ∂f/∂x_i; sometimes we also write it more concisely as f_{x_i}.
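The definition above — perturb one variable while freezing the others — can be sketched numerically with a central difference. The function f(x, y) = x² + xy + y² below is a hypothetical example chosen only for illustration.

```python
def partial_derivative(f, point, var_index, h=1e-6):
    """Approximate df/dx_i at `point` by a central difference:
    every other variable is held fixed; only x_i is perturbed."""
    forward = list(point)
    backward = list(point)
    forward[var_index] += h
    backward[var_index] -= h
    return (f(*forward) - f(*backward)) / (2 * h)

def f(x, y):
    return x**2 + x*y + y**2

# Analytically: df/dx = 2x + y, df/dy = x + 2y
print(partial_derivative(f, (2.0, 1.0), 0))  # close to 5.0
print(partial_derivative(f, (2.0, 1.0), 1))  # close to 4.0
```

The central difference is exact for quadratics up to rounding, so the printed values match the analytic partials almost perfectly.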
Higher-Order Partial Derivatives
Just as one-variable functions have higher-order derivatives, multivariate functions have higher-order partial derivatives. The situation is more complicated than for higher-order ordinary derivatives because there are several variables to differentiate with respect to. For example,
the mixed partial derivative with respect to x and y is obtained by first taking the partial derivative with respect to x and then taking the partial derivative with respect to y. In essence this is the same as higher-order derivatives of one-variable functions: we differentiate repeatedly, one variable at a time. Taking the formula above as an example again:
differentiating with respect to x, and then with respect to x again, gives 2.
An important conclusion: as long as the mixed partial derivatives are continuous, higher-order partial derivatives do not depend on the order of differentiation.
Gradient
This concept is used in many places in machine learning, for example in gradient descent and in Newton's method.
The gradient can be regarded as the generalization of the derivative of a one-variable function to multivariate functions.
For a multivariate function with n independent variables x_1, x_2, ..., x_n,
its gradient is a vector: the vector composed of the partial derivatives with respect to x_1, x_2, and so on,
∇f(x) = (∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n)^T. We use the inverted-triangle symbol ∇ acting on f(x) to denote this vector; the T in the formula means that we usually transpose it and treat the gradient as a column vector.
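Stacking the partial derivatives into a vector can be sketched numerically. The quadratic f(x) = x₁² + 2x₂² below is a hypothetical example, not from the original article.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Gradient of scalar f at x: the vector of all partial derivatives,
    each approximated by a central difference."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Hypothetical example: f(x) = x1^2 + 2*x2^2, so grad f = (2*x1, 4*x2)
f = lambda x: x[0]**2 + 2 * x[1]**2
print(numerical_gradient(f, [1.0, 1.0]))  # approximately [2. 4.]
```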
Jacobian Matrix
Many students may not have met this in their algebra courses, but it is fairly easy to understand: it is the matrix composed of first-order partial derivatives. It was invented mainly to simplify derivative formulas, in particular the chain rule for multivariate composite functions; written with the Jacobian matrix, these formulas become very concise. It appears frequently in the backpropagation of artificial neural networks.
Suppose there is a function that maps an n-dimensional vector x to a k-dimensional vector y.
Each y_i is related to every x_j; that is, each y_i is itself a function of x_1, x_2, ..., x_n.
The Jacobian matrix is the matrix formed by the partial derivatives of each y_i with respect to each x_j.
Its first row contains the partial derivatives of y_1 with respect to x_1, x_2, ..., x_n; the second row the partial derivatives of y_2 with respect to x_1, x_2, ..., x_n; and the k-th row the partial derivatives of y_k with respect to x_1, x_2, ..., x_n.
So if x is an n-dimensional vector and y has k components, the Jacobian is a k×n matrix.
For example, if x_1, x_2, x_3 are mapped to y_1, y_2, where y_1 and y_2 are each functions of x_1, x_2, x_3, what does the Jacobian matrix look like? It is the 2×3 matrix whose first row holds the partial derivatives of y_1 and whose second row holds those of y_2.
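A sketch of that 2×3 case, using a hypothetical map from R³ to R² (the specific functions are illustrative assumptions, not from the article):

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """k x n Jacobian of f: R^n -> R^k; row i holds the partials of y_i,
    column j is a central difference in x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h)
    return J

# Hypothetical map: y1 = x1 + x2*x3, y2 = x1*x2 - x3
f = lambda x: np.array([x[0] + x[1] * x[2], x[0] * x[1] - x[2]])
print(numerical_jacobian(f, [1.0, 2.0, 3.0]))
# Analytic Jacobian: [[1, x3, x2], [x2, x1, -1]] = [[1, 3, 2], [2, 1, -1]]
```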
Hessian Matrix
It is defined for a multivariate function and plays the role of the second derivative of a one-variable function.
How is it defined? Take an n-variable function of x_1, x_2, up to x_n.
Its Hessian matrix is an n×n matrix. What are the elements of this matrix?
All of its elements are second-order partial derivatives: the first element is the second-order partial derivative with respect to x_1, the second element is the mixed partial derivative with respect to x_1 and x_2, and so on. Because, as we said earlier, higher-order partial derivatives of a multivariate function do not depend on the order of differentiation, the Hessian matrix is symmetric.
In the following example, we first compute the first-order partial derivatives,
and then assemble them into the Hessian matrix.
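The assembly can be sketched numerically; the quadratic f(x) = x₁² + x₁x₂ + 2x₂² is a hypothetical example. Note how the result comes out symmetric, as the order-independence of mixed partials predicts.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """n x n Hessian of scalar f: entry (i, j) approximates d^2 f / dx_i dx_j
    with a second-order central difference."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

# Hypothetical example: f(x) = x1^2 + x1*x2 + 2*x2^2
# Analytic Hessian: [[2, 1], [1, 4]] -- symmetric, as expected
f = lambda x: x[0]**2 + x[0]*x[1] + 2*x[1]**2
print(numerical_hessian(f, [0.0, 0.0]))
```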
The Hessian matrix is closely related to the convexity of the function: if the Hessian is positive semidefinite everywhere, f(x) is a convex function (positive definite everywhere gives strict convexity); if it is negative semidefinite everywhere, f(x) is concave. How is positive definiteness of a matrix defined?
Extreme Value Test
For a one-variable function, as we said earlier, an extreme value can only occur where f'(x) = 0. If f''(x) > 0 there, it is a local minimum; if f''(x) < 0, it is a local maximum. You can use f(x) = x² as a reference example.
Extreme Value Test for Multivariate Functions
First, find where the gradient of f(x) equals 0; such a point is called a stationary point and may be an extreme point. How do we determine whether it is a maximum, a minimum, or not an extremum at all?
Look at the Hessian matrix at the stationary point. If the Hessian matrix is positive definite there, the function has a local minimum at that point;
if the Hessian matrix is negative definite, a local maximum; if the Hessian is indefinite (it has both positive and negative eigenvalues), the point is a saddle point; and if the test is inconclusive (some eigenvalues are zero), we need to look at higher-order derivatives.
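These cases can be sketched by checking the signs of the Hessian's eigenvalues; the Hessian matrices passed in below are hypothetical examples.

```python
import numpy as np

def classify_stationary_point(H):
    """Classify a stationary point from its Hessian's eigenvalue signs:
    all > 0 -> local minimum, all < 0 -> local maximum,
    mixed signs -> saddle point, some zero -> inconclusive."""
    eigvals = np.linalg.eigvalsh(H)  # H is symmetric, so use eigvalsh
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "saddle point"
    return "inconclusive"

# Hypothetical Hessians evaluated at a stationary point:
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 4.0]])))   # local minimum
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -4.0]])))  # saddle point
```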
A matrix A is positive definite if for any vector x ≠ 0 we have xᵀAx > 0; if the inequality is only ≥ 0, A is positive semidefinite.
How do we judge whether a matrix is positive definite? We could try to prove this inequality directly from the definition,
but that is not easy. In practice we usually judge by one of several equivalent criteria:
all the eigenvalues of the matrix are greater than 0 (eigenvalues and eigenvectors of a matrix will be covered later); all the leading principal minors of the matrix are greater than 0 (this criterion is used less often); or the matrix is congruent to the identity matrix.
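A practical sketch of the eigenvalue criterion: numerically, the cheapest test for positive definiteness of a symmetric matrix is attempting a Cholesky factorization, which exists if and only if all eigenvalues are positive. The matrices A and B are hypothetical examples.

```python
import numpy as np

def is_positive_definite(A):
    """A symmetric matrix is positive definite iff a Cholesky
    factorization exists (equivalently, all eigenvalues > 0)."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, 1.0], [1.0, 4.0]])   # eigenvalues 3 +/- sqrt(2), both > 0
B = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues 3 and -1
print(is_positive_definite(A))  # True
print(is_positive_definite(B))  # False
```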