[Mathematics in Artificial Intelligence] Differential Calculus of Multivariate Functions

Series Article Directory

[Artificial Intelligence Study Notes] Mathematics in Artificial Intelligence - Overview
[Mathematics in Artificial Intelligence] Differential Calculus of One Variable Function
[Mathematics in Artificial Intelligence] Basic Linear Algebra
[Mathematics in Artificial Intelligence] Multivariate Function Differential


Partial Derivatives


Partial derivatives can be seen as the generalization of the ordinary derivative. For a multivariate function, we hold all of the other independent variables fixed as constants and differentiate with respect to a single variable; the result is the partial derivative with respect to that variable.

$$\frac{\partial f}{\partial x_i} = \lim_{\Delta x_i \to 0} \frac{f(x_1, \dots, x_i + \Delta x_i, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{\Delta x_i}$$

Geometrically, this means slicing the graph of the original function along one coordinate direction and taking the ordinary derivative of the resulting one-variable curve; that derivative is the partial derivative.


A more concise notation is also common: $\frac{\partial f}{\partial x_i}$ is often abbreviated as $f_{x_i}$ (or $f_x$, $f_y$ for a function of two variables).
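As a quick numerical illustration (a sketch of my own, not from the original post), a partial derivative can be approximated by perturbing only one coordinate with a central difference while leaving the others untouched; the function `f` below is an arbitrary example chosen for this purpose.

```python
import numpy as np

def partial_derivative(f, x, i, h=1e-6):
    """Central-difference approximation of df/dx_i at the point x,
    holding every other coordinate fixed."""
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[i] += h
    x_minus[i] -= h
    return (f(x_plus) - f(x_minus)) / (2 * h)

# Arbitrary example function: f(x, y) = x^2 * y + y^3
f = lambda v: v[0] ** 2 * v[1] + v[1] ** 3

point = np.array([1.0, 2.0])
print(partial_derivative(f, point, 0))  # df/dx = 2*x*y       -> about 4.0
print(partial_derivative(f, point, 1))  # df/dy = x^2 + 3*y^2 -> about 13.0
```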

Higher-Order Partial Derivatives

Just as one-variable functions have higher-order derivatives, multivariate functions have higher-order partial derivatives. The situation is more involved than in the one-variable case because there are several variables we can differentiate with respect to.


To take a higher-order partial derivative with respect to x and then y, we first take the partial derivative with respect to x and then differentiate that result with respect to y. This works exactly like higher-order derivatives of one-variable functions: we simply differentiate repeatedly, one variable at a time. For instance, taking the partial derivative with respect to x, and then with respect to x again, gives the second-order partial derivative $\frac{\partial^2 f}{\partial x^2}$; in the worked example below this value is 2.

An important conclusion is that mixed higher-order partial derivatives do not depend on the order of differentiation (provided the second-order partial derivatives are continuous): $\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$.

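The example function in the original post is only available as an image, so as a stand-in consistent with the value quoted above, take $f(x, y) = x^2 + 3xy + y^2$, whose second-order partial derivative with respect to $x$ is indeed 2. A small SymPy check (my own sketch) confirms this and the order-independence of the mixed partials:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 3*x*y + y**2   # illustrative stand-in, not necessarily the original example

f_xx = sp.diff(f, x, x)   # differentiate twice with respect to x
f_xy = sp.diff(f, x, y)   # first with respect to x, then y
f_yx = sp.diff(f, y, x)   # first with respect to y, then x

print(f_xx)               # 2
print(f_xy, f_yx)         # 3 3 -> the mixed partials agree regardless of order
```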

Gradient

The gradient descent method and Newton's method, used throughout machine learning, rely on this concept in many places.


The gradient can be regarded as the generalization of the one-variable derivative to multivariate functions.

Consider a multivariate function whose independent variables are $x_1, x_2, \dots, x_n$.

$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)^T$$

Its gradient is the vector formed from the partial derivatives with respect to $x_1, x_2, \dots, x_n$; this vector is what we call the gradient.

We use the nabla (inverted triangle) symbol $\nabla$ acting on $f(x)$ to denote this vector. The $T$ in the formula is a transpose: the gradient is usually regarded as a column vector.
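To connect this with gradient descent, here is a minimal sketch (my own example) that writes out the gradient of a simple quadratic by hand as the vector of its partial derivatives and takes a few descent steps along the negative gradient:

```python
import numpy as np

def grad_f(v):
    """Gradient of f(x1, x2) = x1^2 + 2*x2^2, written out by hand
    as the vector of partial derivatives (df/dx1, df/dx2)."""
    return np.array([2 * v[0], 4 * v[1]])

v = np.array([3.0, -2.0])   # arbitrary starting point
lr = 0.1                    # learning rate (step size)

for _ in range(50):
    v = v - lr * grad_f(v)  # move against the gradient

print(v)  # approaches the minimizer (0, 0)
```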

Jacobian Matrix

Many students may not have met this in their advanced algebra course, but it is fairly easy to understand: it is the matrix formed from the first-order partial derivatives. It was invented mainly to simplify derivative formulas, in particular the derivative of a multivariate composite function; written with Jacobian matrices, such formulas become very concise. It appears frequently in the backward (reverse-mode) differentiation of artificial neural networks.

$$y = f(x), \qquad f: \mathbb{R}^n \to \mathbb{R}^k$$

Suppose there is such a function, mapping an n-dimensional vector $x$ to a k-dimensional vector $y$. Written componentwise,

$$y_i = f_i(x_1, x_2, \dots, x_n), \qquad i = 1, 2, \dots, k$$

Every $y_i$ is related to every $x_j$; that is, each $y_i$ is a function of all of $x_1, x_2, \dots, x_n$.

The Jacobian matrix is formed by taking the partial derivative of each $y_i$ with respect to each $x_j$; the resulting matrix is called the Jacobian matrix.

The first row contains the partial derivatives of $y_1$ with respect to $x_1, x_2, \dots, x_n$; the second row contains the partial derivatives of $y_2$ with respect to $x_1, x_2, \dots, x_n$; and the k-th row contains the partial derivatives of $y_k$ with respect to $x_1, x_2, \dots, x_n$:

$$J = \frac{\partial y}{\partial x} =
\begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_k}{\partial x_1} & \dfrac{\partial y_k}{\partial x_2} & \cdots & \dfrac{\partial y_k}{\partial x_n}
\end{pmatrix}$$

If $x$ is an n-dimensional vector and $y$ has k components, the Jacobian is a $k \times n$ matrix.


For example, suppose $x_1, x_2, x_3$ are mapped to $y_1, y_2$, where $y_1$ is a function of $x_1, x_2, x_3$ and $y_2$ is also a function of $x_1, x_2, x_3$. What does the Jacobian matrix look like?

$$J =
\begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_1}{\partial x_3} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \dfrac{\partial y_2}{\partial x_3}
\end{pmatrix}$$
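A quick way to see this k×n structure is to let a computer algebra system build the matrix. The mapping below (two outputs from three inputs) is my own arbitrary illustration, not the one from the original images; SymPy's `jacobian` method fills in exactly the 2×3 grid of partial derivatives shown above.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')

# Arbitrary mapping from (x1, x2, x3) to (y1, y2), purely for illustration
y1 = x1**2 + x2 * x3
y2 = sp.sin(x1) + x3

Y = sp.Matrix([y1, y2])
J = Y.jacobian([x1, x2, x3])   # 2 x 3 matrix of partial derivatives dy_i/dx_j
print(J)
# Matrix([[2*x1, x3, x2], [cos(x1), 0, 1]])
```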

Hessian Matrix

For a multivariate function, the Hessian matrix plays the role of the second derivative of a one-variable function.

How is it defined? Take an n-variable function $f(x_1, x_2, \dots, x_n)$.

Its Hessian matrix is an $n \times n$ matrix. What are the elements of this matrix?

All of its elements are second-order partial derivatives: the (1,1) entry is the second-order partial derivative with respect to $x_1$, the (1,2) entry is the mixed partial derivative with respect to $x_1$ and $x_2$, and so on. Because, as we said earlier, higher-order partial derivatives of multivariate functions do not depend on the order of differentiation, the Hessian matrix is symmetric:

$$H =
\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$

To build the Hessian of a concrete function, first compute its first-order partial derivatives, then differentiate each of them again with respect to every variable and arrange the results as above; a worked example is sketched below.
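As a concrete sketch (my own choice of function, not the one in the original images), take $f(x, y) = x^2 + xy + y^2$: its first-order partial derivatives are $2x + y$ and $x + 2y$, and differentiating those again gives a constant, symmetric Hessian. SymPy's `hessian` helper reproduces the same matrix:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + x*y + y**2          # illustrative function

grad = [sp.diff(f, v) for v in (x, y)]
H = sp.hessian(f, (x, y))      # matrix of all second-order partial derivatives

print(grad)   # [2*x + y, x + 2*y]
print(H)      # Matrix([[2, 1], [1, 2]]) -- symmetric, as expected
```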

The Hessian matrix is closely related to the convexity of the function: if the Hessian matrix is positive definite, $f(x)$ is a convex function; if it is negative definite, $f(x)$ is concave. How is positive definiteness of a matrix defined? We come back to that below.

Extreme Value Discrimination Rule

For a one-variable function, as we said earlier, an extremum can occur where the first derivative of $f(x)$ equals 0. If the second derivative of $f(x)$ is greater than 0 there, the point is a minimum; if the second derivative of $f(x)$ is less than 0, it is a maximum. The function $x^2$ is a good reference example.

Extremum Discrimination Rules for Multivariate Functions

First, the first derivative (gradient) of $f(x)$ must equal 0. Such a point is called a stationary point, and it is only a candidate extremum. How do we decide whether it is a maximum, a minimum, or not an extremum at all?

We look at the Hessian matrix at the stationary point, i.e. where the gradient of $f(x)$ equals 0. If the Hessian matrix is positive definite there, the function has a minimum at that point.

If the Hessian matrix is negative definite, the function has a maximum at that point. If the Hessian matrix is indefinite, the stationary point is a saddle point rather than an extremum; if it is only semi-definite, higher-order derivatives have to be examined.
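Putting the rule into code (a sketch with a function of my own choosing): solve gradient = 0 to find the stationary point, then inspect the eigenvalues of the Hessian there. All positive means positive definite (a minimum), all negative means negative definite (a maximum), and mixed signs mean a saddle point.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + x*y + y**2 - 4*x    # illustrative function

grad = [sp.diff(f, v) for v in (x, y)]
stationary = sp.solve(grad, [x, y], dict=True)[0]   # point where the gradient is 0

H = sp.hessian(f, (x, y)).subs(stationary)
eigs = list(H.eigenvals())

print(stationary)   # {x: 8/3, y: -4/3}
print(eigs)         # [1, 3] -> all positive: Hessian is positive definite, so a minimum
```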

A matrix $A$ is positive definite if for every vector $x \neq 0$ we have $x^T A x > 0$; if only $x^T A x \geq 0$ holds, $A$ is positive semi-definite.

How do we check whether a matrix is positive definite? In principle we can prove the inequality $x^T A x > 0$ directly from this definition.

But this is not always easy. In practice we usually rely on one of the following criteria:

1. All eigenvalues of the matrix are greater than 0 (eigenvalues and eigenvectors will be covered later in the series).
2. All leading principal minors of the matrix are greater than 0 (this criterion is used less often).
3. The matrix is congruent to the identity matrix.
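Numerically, the eigenvalue criterion is easy to apply, and the Cholesky factorization (which only succeeds for positive definite matrices) gives a practical shortcut. A small NumPy sketch, reusing the symmetric matrix from the Hessian example above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])    # symmetric matrix to test (e.g. a Hessian)

# Criterion 1: all eigenvalues greater than 0
eigenvalues = np.linalg.eigvalsh(A)           # eigvalsh is meant for symmetric matrices
print(eigenvalues, np.all(eigenvalues > 0))   # [1. 3.] True

# Criterion 2: all leading principal minors greater than 0
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(minors)                                 # both positive (2.0 and about 3.0)

# Practical shortcut: Cholesky succeeds only for positive definite matrices
try:
    np.linalg.cholesky(A)
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")
```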


Origin blog.csdn.net/guigenyi/article/details/131195519