Foreword
AI (artificial intelligence) is booming right now. Machine learning in particular is already widely used in search engines, natural language processing, computer vision, biometric recognition, medical diagnosis, stock-market analysis, and many other fields; it has become part of the infrastructure of the big Internet companies and is no longer a novel technology. But when you actually start to learn machine learning, you will find that the barrier to entry is quite high, mainly because machine learning is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and several other disciplines.
This article introduces the mathematical knowledge most commonly used in machine learning, so that you can clear away some of the basic obstacles when starting to learn it.
Scalar
A scalar is a single number, usually denoted by an ordinary lowercase Latin or Greek letter, such as $a$, $x$, or $\alpha$.
Vectors
Definition of Vector
Numbers arranged in a column form a vector, for example:
$$\boldsymbol{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Vectors are generally denoted by bold lowercase Latin or Greek letters, such as $\boldsymbol{x}$ or $\boldsymbol{\alpha}$ (sometimes marked with an arrow, such as $\vec{x}$), and their elements are written $x_1, x_2, \dots, x_n$.
By default a vector is a column vector; a row vector is written as the transpose of a column vector, such as $\boldsymbol{x}^{\mathrm{T}} = (x_1, x_2, \dots, x_n)$.
- Physics perspective: a vector is an arrow in space, determined by its length and direction
- Computer science perspective: a vector is an ordered list of numbers
- Mathematical perspective: a vector can be anything, as long as adding two vectors and multiplying a vector by a number both make sense
Vector operations
Definitions of vector addition and scalar multiplication:
Addition of two vectors of the same dimension:
$$\boldsymbol{a} + \boldsymbol{b} = (a_1 + b_1, a_2 + b_2, \dots, a_n + b_n)^{\mathrm{T}}$$
Scalar multiplication of an arbitrary constant $k$ and a vector:
$$k\boldsymbol{a} = (ka_1, ka_2, \dots, ka_n)^{\mathrm{T}}$$
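Both operations can be seen directly in code. A minimal sketch using NumPy (a library the article does not assume; the example values are made up):

```python
import numpy as np

# Two 3-dimensional vectors (hypothetical example values)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Addition is element-wise: (a1+b1, a2+b2, a3+b3)
print(a + b)          # [5. 7. 9.]

# Scalar multiplication scales every component
k = 2.0
print(k * a)          # [2. 4. 6.]
```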
Span
The span of the vectors $\boldsymbol{v}$ and $\boldsymbol{w}$ is the set of all their linear combinations, namely:
$$a\boldsymbol{v} + b\boldsymbol{w}$$
(where $a$ and $b$ range over the real numbers)
basis of vector space
A set of basis in a vector space is a set of linearly independent vectors that span the space.
A set of vectors $\boldsymbol{e}_1, \boldsymbol{e}_2, \dots, \boldsymbol{e}_n$ is a basis if:
- Any vector $\boldsymbol{v}$ in the current space can be represented in the form $\boldsymbol{v} = a_1\boldsymbol{e}_1 + a_2\boldsymbol{e}_2 + \dots + a_n\boldsymbol{e}_n$ (for some numbers $a_1, \dots, a_n$)
- and this representation is unique
Dimension of a vector space
The dimension of a space is defined as the number of vectors in a basis.
Dimension = number of basis vectors = number of components in a coordinate representation
Linear independence
The vectors $\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_n$ are linearly independent if and only if $c_1\boldsymbol{v}_1 + c_2\boldsymbol{v}_2 + \dots + c_n\boldsymbol{v}_n = \boldsymbol{0}$ holds only when $c_1 = c_2 = \dots = c_n = 0$.
Put another way, linear independence means that none of the vectors lies in the span of the others: for every $i$ and all scalars $a_j$, the equation $\boldsymbol{v}_i = \sum_{j \neq i} a_j \boldsymbol{v}_j$ does not hold.
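One practical way to test linear independence (a NumPy sketch, not part of the original article): stack the vectors as columns of a matrix and compare its rank to the number of vectors.

```python
import numpy as np

# Vectors are linearly independent iff the rank of the matrix
# whose columns they form equals the number of vectors.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = v1 + v2          # deliberately dependent on v1 and v2

M = np.column_stack([v1, v2, v3])
rank = np.linalg.matrix_rank(M)
print(rank)           # 2 -> the three vectors are linearly dependent
```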
Linear transformation
Informally, linearity requires two things: straight lines remain straight lines, and the origin stays fixed.
The strict definition of a linear transformation $L$:
$$L(\boldsymbol{v} + \boldsymbol{w}) = L(\boldsymbol{v}) + L(\boldsymbol{w}), \qquad L(c\boldsymbol{v}) = cL(\boldsymbol{v})$$
Geometrically, a linear transformation keeps grid lines parallel and evenly spaced, and keeps the origin fixed.
A linear transformation is completely determined by its effect on the basis vectors of the space. In two-dimensional space the basis vectors are $\hat{\imath}$ and $\hat{\jmath}$, and any other vector is a linear combination of them: the vector with coordinates $(x, y)$ is $x$ times $\hat{\imath}$ plus $y$ times $\hat{\jmath}$. Because a linear transformation keeps grid lines parallel and evenly spaced, a wonderful corollary follows: the image of $(x, y)$ is $x$ times the image of $\hat{\imath}$ plus $y$ times the image of $\hat{\jmath}$.
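This corollary is exactly why a 2×2 matrix encodes a linear transformation: its columns are the images of the basis vectors. A small NumPy check (the rotation matrix here is just an illustrative choice, not from the article):

```python
import numpy as np

# The columns of A are the images of the basis vectors i-hat and j-hat.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # example: 90-degree rotation

x, y = 3.0, 2.0
v = np.array([x, y])

# A @ v equals x * (first column) + y * (second column)
direct = A @ v
by_columns = x * A[:, 0] + y * A[:, 1]
print(direct, by_columns)     # both [-2.  3.]
```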
Dot Product of Vectors
The dot product is also called the inner product or scalar product of vectors; as the name suggests, the result is a number. For two vectors of the same dimension, it is defined as:
$$\boldsymbol{a} \cdot \boldsymbol{b} = a_1b_1 + a_2b_2 + \dots + a_nb_n$$
- The dot product is independent of order: $\boldsymbol{a} \cdot \boldsymbol{b} = \boldsymbol{b} \cdot \boldsymbol{a}$
- When two vectors are perpendicular to each other, their dot product is 0
- When two vectors point in roughly the same direction (angle less than 90°), the dot product is positive; when they point in roughly opposite directions, it is negative
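The three bullet points can be verified numerically. A NumPy sketch with made-up vectors:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([2.0, -1.0])    # chosen perpendicular to a

print(np.dot(a, b))          # 0.0 -> perpendicular vectors
print(np.dot(a, a))          # 5.0 -> same direction, positive
print(np.dot(a, -a))         # -5.0 -> opposite direction, negative
print(np.dot(a, b) == np.dot(b, a))   # True: order-independent
```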
cross product of vectors
The cross product is also called the outer product or vector product of vectors; as the name suggests, the result is a vector, one perpendicular to both operands.
- The cross product does not satisfy the commutative law; instead $\boldsymbol{a} \times \boldsymbol{b} = -(\boldsymbol{b} \times \boldsymbol{a})$
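Both properties can be seen in a short NumPy sketch (example vectors are the standard basis, chosen for readability):

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

print(np.cross(a, b))        # [0. 0. 1.] -> perpendicular to both a and b
print(np.cross(b, a))        # [ 0.  0. -1.] -> a x b = -(b x a)
```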
dual vector
Given a vector space, a mapping that sends each vector to a real number (linearly) is called a dual vector. For example, an n-dimensional row vector $(a_1, a_2, \dots, a_n)$ can be understood either as a row vector or as a mapping that sends a given n-dimensional column vector $(b_1, b_2, \dots, b_n)^{\mathrm{T}}$ to the real number $k = a_1b_1 + a_2b_2 + \dots + a_nb_n$, i.e. the matrix product of the two. This mapping satisfies the definition of a dual vector, so the row vector $(a_1, a_2, \dots, a_n)$ acts as a dual vector on column vectors such as $(b_1, b_2, \dots, b_n)^{\mathrm{T}}$.
Matrices
Definition of Matrix
A matrix is a two-dimensional array in which each element is identified by two indices (rather than one). Matrices are usually denoted by bold uppercase letters, such as $\boldsymbol{A}$.
The value in row $i$ and column $j$ of $\boldsymbol{A}$ is called the $(i, j)$ element of $\boldsymbol{A}$; when the numbers of rows and columns are equal, the matrix is called a square matrix.
A matrix is a map, or a description of the motion of a vector.
Multiplying an $n$-dimensional vector $\boldsymbol{x}$ by an $m \times n$ matrix $\boldsymbol{A}$ yields an $m$-dimensional vector $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}$. That is, specifying a matrix $\boldsymbol{A}$ determines a mapping from one vector to another. The geometric meaning of multiplying two matrices is applying two linear transformations one after the other.
Matrix Operations
addition:
Two matrices can be added as long as they have the same shape; addition is element-wise, e.g. $\boldsymbol{C} = \boldsymbol{A} + \boldsymbol{B}$, where $C_{i,j} = A_{i,j} + B_{i,j}$.
multiplication:
The matrix product of two matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ is a third matrix $\boldsymbol{C}$. For the multiplication to be defined, the number of columns of $\boldsymbol{A}$ must equal the number of rows of $\boldsymbol{B}$. If the shape of $\boldsymbol{A}$ is $m \times n$ and the shape of $\boldsymbol{B}$ is $n \times p$, then $\boldsymbol{C}$ is $m \times p$.
Specifically, the multiplication operation is defined as:
$$C_{i,j} = \sum_{k} A_{i,k} B_{k,j}$$
The matrix product obeys the distributive law: $\boldsymbol{A}(\boldsymbol{B} + \boldsymbol{C}) = \boldsymbol{A}\boldsymbol{B} + \boldsymbol{A}\boldsymbol{C}$
The matrix product also obeys the associative law: $\boldsymbol{A}(\boldsymbol{B}\boldsymbol{C}) = (\boldsymbol{A}\boldsymbol{B})\boldsymbol{C}$
The matrix product does not satisfy the commutative law: $\boldsymbol{A}\boldsymbol{B} = \boldsymbol{B}\boldsymbol{A}$ does not always hold.
The transpose of a matrix product has the simple form: $(\boldsymbol{A}\boldsymbol{B})^{\mathrm{T}} = \boldsymbol{B}^{\mathrm{T}}\boldsymbol{A}^{\mathrm{T}}$
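These shape rules and properties can be checked with NumPy (the matrices below are arbitrary examples, not from the article):

```python
import numpy as np

# Shapes: A is 2x3, B is 3x2, so A @ B is 2x2 (columns of A = rows of B).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

C = A @ B
print(C.shape)                              # (2, 2)

# Transpose rule: (AB)^T = B^T A^T
print(np.allclose(C.T, B.T @ A.T))          # True

# Not commutative: here B @ A even has a different shape
print((B @ A).shape)                        # (3, 3)
```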
Rank of a matrix
The rank of a matrix is the dimension of the space after the transformation, i.e. the dimension of its image.
Kernel and range
Kernel: The set of all vectors that have been transformed into zero vectors after the transformation matrix, usually represented by Ker(A).
Range: the set of all vectors obtained by applying the transformation matrix to every vector in the space, usually denoted R(A).
Dimension Theorem
For an $m \times n$ matrix $\boldsymbol{A}$, we have
$$\dim \operatorname{Ker}(\boldsymbol{A}) + \dim R(\boldsymbol{A}) = n$$
where $\dim X$ denotes the dimension of $X$.
Column space
The column space of a matrix $\boldsymbol{A}$ is the set of all possible output vectors $\boldsymbol{A}\boldsymbol{v}$; in other words, the column space is the space spanned by the columns of the matrix.
Therefore, the more precise definition of rank is the dimension of the column space; when the rank reaches its maximum value, the rank equals the number of columns, and the matrix is said to have full rank.
Null space
The set of vectors that land on the origin after the transformation is called the "null space" or "kernel" of the matrix.
- The zero vector must be in the column space
- For a full rank transformation, the only thing that can fall at the origin after the transformation is the zero vector itself
- A non-full-rank matrix compresses space into a lower dimension, so some nonzero vectors are mapped to the zero vector; the "null space" is exactly the space formed by these vectors
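A null-space basis can be computed from the SVD: the right singular vectors whose singular values are (numerically) zero span Ker(A). A NumPy sketch with a deliberately rank-deficient example matrix:

```python
import numpy as np

# A rank-1 matrix: it compresses the plane onto a line,
# so its null space is one-dimensional.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Right singular vectors with (numerically) zero singular
# values span the null space Ker(A).
U, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-10]
print(null_basis.shape[0])                 # 1 -> dim Ker(A) = 1
print(np.allclose(A @ null_basis[0], 0))   # True: it maps to the zero vector
```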
determinant
The determinant of a linear transformation is the factor by which the transformation scales areas (or volumes).
- By checking whether the determinant of a matrix is 0, you can see whether the transformation represented by the matrix compresses the space into a smaller dimension
- In three-dimensional space, the determinant can be simply regarded as the volume of this parallelepiped, and a determinant of 0 means that the entire space is compressed to something with zero volume, that is, a plane or a straight line, or a point in more extreme cases
- The value of the determinant can be negative, which indicates that the orientation of space has been flipped; the absolute value of the determinant still gives the area-scaling factor
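All three bullet points can be demonstrated with NumPy (the matrices below are illustrative examples):

```python
import numpy as np

# Scaling matrix: stretches x by 2 and y by 3, so areas scale by 6.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
print(np.linalg.det(A))      # 6.0

# A reflection flips orientation: the determinant is negative.
F = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.linalg.det(F))      # -1.0

# A rank-deficient matrix squashes area to zero.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(S))      # 0 (up to floating-point error)
```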
Singular matrix
A singular matrix is a matrix whose determinant is zero.
Eigenvalues and Eigenvectors
Eigen decomposition
If a vector $\boldsymbol{v}$ is an eigenvector of a square matrix $\boldsymbol{A}$, it satisfies the following form:
$$\boldsymbol{A}\boldsymbol{v} = \lambda\boldsymbol{v}$$
where $\lambda$ is the eigenvalue corresponding to the eigenvector $\boldsymbol{v}$. Eigenvalue decomposition factors a matrix into the form:
$$\boldsymbol{A} = \boldsymbol{Q}\boldsymbol{\Sigma}\boldsymbol{Q}^{-1}$$
where $\boldsymbol{Q}$ is the matrix whose columns are the eigenvectors of $\boldsymbol{A}$, and $\boldsymbol{\Sigma}$ is a diagonal matrix whose diagonal elements are the eigenvalues, arranged from largest to smallest. The eigenvectors corresponding to these eigenvalues describe the principal directions of change of the matrix (ordered from major changes to minor changes). In other words, the information in matrix $\boldsymbol{A}$ can be represented by its eigenvalues and eigenvectors.
When the matrix is high-dimensional, then the matrix is a linear transformation in the high-dimensional space. It is conceivable that this transformation also has many transformation directions. The first N eigenvectors obtained by eigenvalue decomposition correspond to the most important N change directions of this matrix. We can approximate this matrix (transformation) by using the first N change directions.
To sum up, eigenvalue decomposition can obtain eigenvalues and eigenvectors. The eigenvalues indicate how important the feature is, and the eigenvectors indicate what the feature is. However, eigenvalue decomposition also has many limitations. For example, the transformed matrix must be a square matrix.
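The decomposition above can be computed and verified with NumPy (the symmetric example matrix is arbitrary):

```python
import numpy as np

# np.linalg.eig returns the eigenvalues and a matrix Q whose
# columns are the corresponding eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eig(A)

# Check A v = lambda v for each eigenpair
for lam, v in zip(eigvals, Q.T):
    print(np.allclose(A @ v, lam * v))   # True for each pair

# Reconstruct A = Q diag(lambda) Q^{-1}
print(np.allclose(Q @ np.diag(eigvals) @ np.linalg.inv(Q), A))   # True
```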
singular value decomposition
Eigenvalue decomposition is a very good way to extract the features of a matrix, but it only applies to square matrices. In the real world, most of the matrices we encounter are not square. For example, with N students each taking M subjects, the resulting N*M matrix is generally not square. How can we describe the important characteristics of such an ordinary matrix? Singular value decomposition does exactly that: it is a decomposition method that applies to any matrix.
Decomposition form:
$$\boldsymbol{A} = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathrm{T}}$$
Assuming A is an M*N matrix, the resulting U is an M*M square matrix (its columns are the left singular vectors), Σ is an M*N matrix whose off-diagonal elements are all 0 and whose diagonal elements are the singular values, and VT (the transpose of V) is an N*N matrix (the columns of V are the right singular vectors).
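A NumPy sketch of the decomposition, using a made-up 3×2 "students × subjects" matrix to mirror the example above:

```python
import numpy as np

# A non-square matrix: 3 "students" x 2 "subjects" (made-up numbers).
A = np.array([[90.0, 80.0],
              [70.0, 60.0],
              [50.0, 40.0]])

U, s, Vt = np.linalg.svd(A)
print(U.shape, s.shape, Vt.shape)   # (3, 3) (2,) (2, 2)

# np.linalg.svd returns only the diagonal of Sigma;
# rebuild the full M x N Sigma and verify A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))   # True
```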
LU decomposition
Given a matrix A, expressing A as the product of the lower triangular matrix L and the upper triangular matrix U is called LU decomposition.
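A minimal sketch of LU decomposition via Doolittle elimination in NumPy, assuming the matrix needs no row exchanges (nonzero leading pivots); the helper name `lu_decompose` and the example matrix are hypothetical:

```python
import numpy as np

def lu_decompose(A):
    """Factor A = L @ U with L unit lower triangular (no pivoting)."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]       # elimination multiplier
            U[i, :] -= L[i, k] * U[k, :]      # zero out below the pivot
    return L, U

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
L, U = lu_decompose(A)
print(L)                      # [[1.  0. ] [1.5 1. ]]
print(U)                      # [[ 4.   3. ] [ 0.  -1.5]]
print(np.allclose(L @ U, A))  # True
```

Production code would use a pivoted factorization (e.g. `scipy.linalg.lu`), which returns A = P L U with a permutation matrix P for numerical stability.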
transpose matrix
For a matrix $\boldsymbol{A}$, the matrix obtained by exchanging its rows and columns is called the transpose of $\boldsymbol{A}$, denoted $\boldsymbol{A}^{\mathrm{T}}$.
Transposition mirrors the matrix across its main diagonal, the diagonal running from the upper left to the lower right.
identity matrix
In a square matrix, if the elements on the diagonal (from upper left to lower right) are all 1 and the remaining elements are all 0, the matrix is called the identity matrix, denoted $\boldsymbol{I}$ ($\boldsymbol{I}_n$ denotes the identity matrix of order $n$).
The mapping represented by the identity matrix is a "do nothing" mapping.
Inverse matrix
The inverse of $\boldsymbol{A}$, denoted $\boldsymbol{A}^{-1}$, is the matrix satisfying $\boldsymbol{A}^{-1}\boldsymbol{A} = \boldsymbol{I}$, the "do nothing" matrix.
- Once the inverse of $\boldsymbol{A}$ is found, the vector equation $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{v}$ can be solved by multiplying both sides by the inverse: $\boldsymbol{x} = \boldsymbol{A}^{-1}\boldsymbol{v}$
- If the determinant is not zero, the inverse of the matrix exists
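Both points can be illustrated with NumPy (the system below is a made-up example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([3.0, 5.0])

# det != 0, so the inverse exists
print(np.linalg.det(A))            # 5.0 (approximately)

A_inv = np.linalg.inv(A)
x = A_inv @ v                      # solve A x = v via the inverse
print(np.allclose(A @ x, v))       # True

# In practice np.linalg.solve is preferred: it avoids forming A^{-1}
print(np.allclose(np.linalg.solve(A, v), x))   # True
```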
zero matrix
A matrix in which all elements are 0 is called a zero matrix, denoted $\boldsymbol{O}$.
The mapping represented by the zero matrix is the mapping that maps all the points to the origin.
diagonal matrix
In a square matrix, the values on the diagonal (from top left to bottom right) are called diagonal elements.
A matrix whose off-diagonal elements are all zeros is called a diagonal matrix.
The mapping represented by the diagonal matrix is expansion and contraction along the coordinate axis, where the diagonal elements are the magnifications of the expansion and contraction of each coordinate axis.
Tensor
In some cases we need to discuss arrays with more than two axes. In general, an array whose elements are arranged on a regular grid with a variable number of axes is called a tensor.
First-order tensors can be represented by vectors, and second-order tensors can be represented by matrices.