Machine Learning Mathematical Fundamentals - Linear Algebra

 

 

foreword

AI (artificial intelligence) is extremely popular right now. In fact, machine learning has been widely applied in search engines, natural language processing, computer vision, biometric recognition, medical diagnosis, stock market analysis, and other fields; it has become part of the infrastructure of large Internet companies and is no longer a novel technology. But when you actually start to learn machine learning, you will find that the entry threshold is quite high, mainly because machine learning is an interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and many other disciplines.

This article introduces some of the mathematical knowledge most commonly used in machine learning, to help remove some basic obstacles for readers learning the subject.

Scalar

A scalar is a single number, usually represented by an ordinary lowercase Latin or Greek letter, such as a, \alpha, etc.

Vectors

Definition of Vector

Arranging numbers in a column gives a vector, for example:

\textbf{x} = \begin{equation} \left( \begin{array}{c} x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{array} \right) \end{equation}

Vectors are generally represented by bold lowercase Latin or Greek letters, such as \textbf{x} (sometimes marked with an arrow, such as \pmb{\vec x}), and their elements are denoted x_{i}.

A vector is a column vector by default; a row vector is written as the transpose of a column vector, such as \textbf{x}^{T}.

  • Physics perspective: a vector is an arrow in space, determined by its length and direction
  • Computer science perspective: a vector is an ordered list of numbers
  • Mathematics perspective: a vector can be anything, as long as adding two vectors and multiplying a vector by a number make sense

Vector operations

Definitions of vector addition and scalar multiplication:

Addition between vectors of the same dimension is element-wise:

\textbf{x} + \textbf{y} = (x_{1}+y_{1}, \ldots, x_{n}+y_{n})^{T}

Scalar multiplication of an arbitrary constant c and a vector is:

c\textbf{x} = (cx_{1}, \ldots, cx_{n})^{T}

Given numbers c, c' and vectors \textbf{x}, \textbf{y}, these operations satisfy c(\textbf{x} + \textbf{y}) = c\textbf{x} + c\textbf{y} and (c + c')\textbf{x} = c\textbf{x} + c'\textbf{x}.
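As a quick check of these rules, here is a minimal sketch in Python with NumPy; the vectors x, y and the scalars c, c2 are just illustrative values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
c, c2 = 2.0, 3.0

# Element-wise addition of same-dimension vectors
print(x + y)                                        # [5. 7. 9.]

# Scalar multiplication scales every component
print(c * x)                                        # [2. 4. 6.]

# The distributive laws hold element-wise
print(np.allclose(c * (x + y), c * x + c * y))      # True
print(np.allclose((c + c2) * x, c * x + c2 * x))    # True
```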

Span

The span of vectors v and w is the set of all linear combinations of the two vectors, namely:

av + bw (where a and b vary over all real numbers)

 

basis of vector space

A basis of a vector space is a set of linearly independent vectors that span the space.

A set of vectors (\vec e_{1}, \ldots, \vec e_{n}) is a basis if:

  1. Any vector \vec v in the current space can be represented in the form \vec v = x_{1}\vec e_{1} + \ldots + x_{n}\vec e_{n} (for some numbers x_{1}, \ldots, x_{n})
  2. and this representation is unique

the dimension of the vector space

The dimension of the space can be defined by the number of basis vectors

Dimension = number of basis vectors = number of components of coordinates

 

Linear independence

Vectors v, w, u are linearly independent if and only if av + bw + cu = 0 holds only when a = b = c = 0.

Put another way, linear independence means that no vector lies in the span of the others; that is, u = av + bw does not hold for any a and b.
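One way to test linear independence numerically is to stack the vectors as columns of a matrix and compare its rank with the number of vectors. A sketch with NumPy (the sample vectors are made up for illustration):

```python
import numpy as np

def are_independent(*vectors):
    # Vectors are independent iff the matrix with them as columns has full column rank
    m = np.column_stack(vectors)
    return np.linalg.matrix_rank(m) == len(vectors)

v = np.array([1.0, 0.0, 0.0])
w = np.array([0.0, 1.0, 0.0])
u = np.array([1.0, 1.0, 0.0])    # u = 1*v + 1*w, so it lies in span{v, w}

print(are_independent(v, w))     # True
print(are_independent(v, w, u))  # False: u is a combination of v and w
```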

Linear transformation

Two conditions for linearity: straight lines remain straight lines, and the origin remains fixed.

A strict definition of linearity:

Linear transformations keep grid lines parallel and equidistant, and keep the origin fixed.

A linear transformation is completely determined by its effect on the basis vectors of the space. In two-dimensional space the basis vectors are i and j; since any other vector is expressed as a linear combination of the basis vectors, the vector with coordinates (x, y) is x times i plus y times j. Because a linear transformation keeps grid lines parallel and evenly spaced, there is a wonderful inference: the result of transforming the vector (x, y) is x times the transformed i plus y times the transformed j.
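This inference can be verified directly: the columns of a transformation matrix are the images of the basis vectors. A small NumPy sketch, where the matrix A and the point (x, y) are arbitrary examples:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])      # columns = images of the basis vectors i and j

x, y = 4.0, 5.0
v = np.array([x, y])

# Transforming v directly...
direct = A @ v

# ...equals x * (transformed i) + y * (transformed j)
from_basis = x * A[:, 0] + y * A[:, 1]

print(direct, from_basis, np.allclose(direct, from_basis))  # True
```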

 

Dot Product of Vectors

The dot product is also known as the inner product or scalar product of vectors. As the name suggests, the result is a number. For two vectors of the same dimension, the dot product is defined as:

\textbf{x} \cdot \textbf{y} = \sum_{i} x_{i} y_{i}

  • The dot product is independent of order (commutative)
  • When two vectors are perpendicular, the dot product is 0
  • When two vectors point in the same direction, the dot product is positive; when they point in opposite directions, it is negative
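A minimal sketch of these dot-product properties in NumPy (the vectors are chosen only for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

# The dot product is a single number and is order-independent
print(np.dot(x, y), np.dot(y, x))                           # 11.0 11.0

# Perpendicular vectors have dot product 0
print(np.dot(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0

# Same direction -> positive; opposite direction -> negative
print(np.dot(x, 2 * x) > 0, np.dot(x, -x) < 0)              # True True
```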

cross product of vectors

The cross product is also known as the outer product or vector product. As the name suggests, the result is a vector.

  • The cross product of vectors does not satisfy the commutative law; it is anti-commutative: v \times w = -(w \times v)
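For 3-dimensional vectors, the cross product and its anti-commutativity can be checked with NumPy (example vectors only):

```python
import numpy as np

v = np.array([1.0, 0.0, 0.0])
w = np.array([0.0, 1.0, 0.0])

c = np.cross(v, w)
print(c)                                  # [0. 0. 1.] -- a vector, not a number

# Not commutative: v x w = -(w x v)
print(np.allclose(np.cross(w, v), -c))    # True

# The result is perpendicular to both inputs
print(np.dot(c, v), np.dot(c, w))         # 0.0 0.0
```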

dual vector

Given a vector space, a mapping that sends each vector to a real number is called a dual vector. For example, an n-dimensional row vector (a_1, a_2, \ldots, a_n) can be understood either as a row vector or as a mapping that sends a given n-dimensional column vector (b_1, b_2, \ldots, b_n) to the real number k = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n, i.e., a matrix product. This mapping satisfies the definition of a dual vector, so the row vector (a_1, a_2, \ldots, a_n) is a dual vector of the column vectors (b_1, b_2, \ldots, b_n).

 

Matrices

Definition of Matrix

A matrix is a two-dimensional array in which each element is identified by two indices (rather than one), usually represented by a bold uppercase letter, such as: A = \begin{equation} \left( \begin{array}{ccc} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ \end{array} \right) \end{equation}.

The element in row i and column j of matrix A is called A(i,j); when the number of rows and columns of a matrix is the same, it is called a square matrix.

A matrix is a map, or a description of the motion of vectors.
Multiplying an n-dimensional vector x by an m\ast n matrix A gives an m-dimensional vector y = Ax. That is, by specifying a matrix A, the mapping from one vector to another vector is determined. The geometric meaning of multiplying two matrices is that two linear transformations act one after the other.

Matrix Operations

addition:

Two matrices can be added as long as they have the same shape. Adding two matrices means adding the elements at corresponding positions, e.g., C=A+B, where C_{i,j}=A_{i,j}+B_{i,j}.

multiplication:

The matrix product of two matrices A and B is a third matrix C. For the multiplication to be defined, the number of columns of matrix A must equal the number of rows of matrix B. If the shape of A is m\ast n and the shape of B is n\ast p, then the shape of C is m\ast p. E.g.

C=AB

Specifically, the multiplication operation is defined as:

C_{i,j}=\sum_{k}A_{i,k}B_{k,j}

The matrix product obeys the distributive law: A(B + C) = AB + AC.
The matrix product also obeys the associative law: A(BC) = (AB)C.
The matrix product does not satisfy the commutative law: AB = BA does not always hold.
The transpose of a matrix product has the simple form: (AB)^T = B^T A^T.
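These laws are easy to confirm numerically; a sketch with NumPy using small example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = np.array([[1.0, 0.0], [0.0, 2.0]])

print(np.allclose(A @ (B + C), A @ B + A @ C))   # True: distributive law
print(np.allclose(A @ (B @ C), (A @ B) @ C))     # True: associative law
print(np.allclose(A @ B, B @ A))                 # False: not commutative in general
print(np.allclose((A @ B).T, B.T @ A.T))         # True: transpose of a product
```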

 

Rank of the matrix

The rank of a matrix is the dimension of the space after the transformation.

Kernel and range

Kernel: the set of all vectors that are mapped to the zero vector by the transformation matrix, usually denoted Ker(A).

Range: the set of all vectors obtained by applying the transformation matrix to the vectors of the space, usually denoted R(A).

 

Dimension Theorem

For an m\times n matrix A, we have dim Ker(A) + dim R(A) = n

where dim X denotes the dimension of X.
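The dimension theorem can be checked numerically: dim R(A) is the rank, and dim Ker(A) is n minus the rank. A sketch with NumPy, where the example matrix is arbitrary:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # 2 x 3 matrix with linearly dependent rows

n = A.shape[1]
rank = np.linalg.matrix_rank(A)   # dim R(A)
nullity = n - rank                # dim Ker(A), by the dimension theorem

print(rank, nullity, rank + nullity == n)   # 1 2 True
```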

column space

The column space of matrix A is the set of all possible output vectors Av; in other words, the column space is the space spanned by the columns of the matrix.

Therefore, a more precise definition of rank is the dimension of the column space; when the rank reaches its maximum value, the rank equals the number of columns, which is called full rank.

zero vector

The set of transformed vectors that fall at the origin is called the 'null space' or 'kernel' of the matrix.

  • The zero vector must be in the column space
  • For a full-rank transformation, the only vector that lands at the origin after the transformation is the zero vector itself
  • A non-full-rank matrix compresses space to a lower dimension, so a whole set of vectors lands on the zero vector after the transformation; the "null space" is the space formed by these vectors

determinant

The determinant of a linear transformation is the factor by which the transformation scales areas.

det(M_1M_2) = det(M_1)det(M_2)

  • By checking whether the determinant of a matrix is 0, you can see whether the transformation represented by the matrix compresses space into a smaller dimension
  • In three-dimensional space, the determinant can simply be regarded as the volume of the parallelepiped spanned by the transformed basis vectors; a determinant of 0 means the entire space is compressed to something with zero volume, that is, a plane or a straight line, or, in more extreme cases, a point
  • The value of the determinant can be negative, indicating that the orientation of space has changed (flipped); the absolute value of the determinant still represents the area scaling factor
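A minimal NumPy sketch of these determinant facts (the matrices are illustrative):

```python
import numpy as np

M1 = np.array([[2.0, 0.0], [0.0, 3.0]])   # scales areas by 6
M2 = np.array([[0.0, 1.0], [1.0, 0.0]])   # a flip: determinant is -1

# det(M1 M2) = det(M1) * det(M2)
print(np.isclose(np.linalg.det(M1 @ M2),
                 np.linalg.det(M1) * np.linalg.det(M2)))    # True

# Zero determinant: the transformation compresses space to a lower dimension
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
print(np.linalg.det(singular))            # 0.0 (up to floating-point error)

# Negative determinant: orientation flips; |det| still gives the area scaling
print(np.linalg.det(M2))                  # -1.0
```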

 

 

singular matrix

A matrix with zero determinant.

Eigenvalues and Eigenvectors

Eigen decomposition

If a vector v is an eigenvector of a square matrix A, it can be expressed in the following form:

Av = \lambda v

\lambda is the eigenvalue corresponding to the eigenvector v. Eigenvalue decomposition decomposes a matrix into the following form:

A=Q\Sigma Q^{-1}

where Q is the matrix composed of the eigenvectors of A, and \Sigma is a diagonal matrix whose diagonal elements are the eigenvalues, arranged from large to small. The eigenvectors corresponding to these eigenvalues describe the directions of change of the matrix (ordered from the most significant change to the least). In other words, the information of matrix A can be represented by its eigenvalues and eigenvectors.

When the matrix is high-dimensional, it is a linear transformation of a high-dimensional space, and this transformation has many directions of change. The first N eigenvectors obtained by eigenvalue decomposition correspond to the N most important directions of change of the matrix, and we can approximate the matrix (transformation) using these first N directions.

To sum up, eigenvalue decomposition yields eigenvalues and eigenvectors: the eigenvalues indicate how important each feature is, and the eigenvectors indicate what the feature is. However, eigenvalue decomposition has limitations; for example, the matrix to be decomposed must be a square matrix.
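A sketch of eigenvalue decomposition with NumPy; note that np.linalg.eig does not sort the eigenvalues, so we order them from large to small ourselves, and the symmetric example matrix is chosen so the eigenvalues are real:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])        # symmetric, so eigenvalues are real

eigvals, Q = np.linalg.eig(A)     # columns of Q are eigenvectors

# Sort eigenvalues (and matching eigenvectors) from large to small
order = np.argsort(eigvals)[::-1]
eigvals, Q = eigvals[order], Q[:, order]

Sigma = np.diag(eigvals)

# A v = lambda v for each eigenpair
print(np.allclose(A @ Q[:, 0], eigvals[0] * Q[:, 0]))    # True

# Reconstruction: A = Q Sigma Q^{-1}
print(np.allclose(A, Q @ Sigma @ np.linalg.inv(Q)))      # True
```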

singular value decomposition

Eigenvalue decomposition is a very good method for extracting matrix features, but it only applies to square matrices. In the real world, most matrices we encounter are not square. For example, if there are N students and each student takes M subjects, the resulting N*M matrix generally cannot be square. How can we describe the important features of such an ordinary matrix? Singular value decomposition does exactly this. Singular value decomposition is a decomposition method that can be applied to any matrix:

Decomposition form:

A = U \Sigma V^{T}

Assuming A is an M*N matrix, the resulting U is an M*M square matrix (its columns are called left singular vectors), \Sigma is an M*N matrix whose off-diagonal elements are all 0 and whose diagonal elements are called singular values, and V^{T} (the transpose of V) is an N*N matrix (its columns are called right singular vectors).
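A sketch of the decomposition with NumPy; np.linalg.svd returns the singular values as a 1-D array, so we rebuild the M*N matrix \Sigma before checking A = U\Sigma V^{T} (the 3*2 example matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # a 3 x 2 (non-square) matrix

U, s, Vt = np.linalg.svd(A)       # U: 3x3, s: singular values, Vt: 2x2

# Rebuild the 3 x 2 Sigma with the singular values on the diagonal
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))   # True: A = U Sigma V^T
```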

 

LU decomposition

Given a matrix A, expressing A as the product of a lower triangular matrix L and an upper triangular matrix U is called LU decomposition.
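A sketch using SciPy's LU routine; note that scipy.linalg.lu also returns a permutation matrix P (row exchanges for numerical stability), so the reconstruction it provides is A = PLU:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

P, L, U = lu(A)                   # P: permutation, L: lower, U: upper triangular

print(L)                          # lower triangular with unit diagonal
print(U)                          # upper triangular
print(np.allclose(A, P @ L @ U))  # True
```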

transpose matrix

For matrix A, the matrix obtained by exchanging its rows and columns is called the transpose matrix of A, denoted as A^{T}.

The transpose of a matrix is its mirror image across the main diagonal, the diagonal that runs from the upper left to the lower right.

(A^T)_{i,j}=A_{j,i}

identity matrix

In a square matrix, if the elements on the diagonal (from upper left to lower right) are all 1 and the remaining elements are all 0, the matrix is called the identity matrix, denoted I. I_{n} denotes the identity matrix of order n.

The mapping represented by the identity matrix is a "do nothing" mapping.

Inverse matrix

The inverse of A multiplied by A equals the "do nothing" matrix: A^{-1}A = \begin{equation} \left( \begin{array}{ccc} 1 & 0\\ 0 & 1 \\ \end{array} \right) \end{equation}

  • Once the inverse of A is found, the vector equation Ax = v can be solved by multiplying both sides by the inverse of A
  • If the determinant is not zero, the inverse of the matrix exists
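A sketch of both points with NumPy; in practice np.linalg.solve is preferred over forming the inverse explicitly, but the inverse route shown here matches the text (the system Ax = v is illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([3.0, 5.0])

print(np.linalg.det(A))                     # 5.0, nonzero -> the inverse exists

A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(2)))    # True: A^{-1} A = I

# Solve A x = v by multiplying both sides by A^{-1}
x = A_inv @ v
print(x, np.allclose(A @ x, v))             # True
```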

zero matrix

A matrix in which all elements are 0 is called a zero matrix, denoted as O.

The mapping represented by the zero matrix maps all points to the origin.

diagonal matrix

In a square matrix, the values on the diagonal (from top left to bottom right) are called diagonal elements.

A matrix whose off-diagonal elements are all zeros is called a diagonal matrix.

The mapping represented by a diagonal matrix is a scaling along the coordinate axes, where the diagonal elements are the scaling factors of the respective axes.
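A small NumPy sketch of a diagonal matrix acting as per-axis scaling (the factors are chosen for illustration):

```python
import numpy as np

D = np.diag([2.0, 3.0])   # scale the x-axis by 2, the y-axis by 3

v = np.array([1.0, 1.0])
print(D @ v)              # [2. 3.]: each axis stretched by its diagonal element
```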

Tensor

In some cases, we need to discuss arrays with more than two axes. In general, an array whose elements are arranged in a regular grid with a variable number of axes is called a tensor.

First-order tensors can be represented by vectors, and second-order tensors can be represented by matrices.
