Linear algebra and probability theory: the foundations of machine learning

Part One: Linear algebra

Everything can be abstracted into a combination of essential features. Linear algebra is the discipline that studies such abstract mathematical objects and describes their static and dynamic characteristics.

Common concepts

  • Scalar (scalar)
    A scalar a may be an integer, a real number, or a complex number.
  • Vector (vector)
    A sequence of scalars a1, a2, ⋯, an arranged in a fixed order. Usually represented by a one-dimensional array; an example is a speech signal.
  • Matrix (matrix)
    A matrix is composed of vectors: an m × n matrix can be viewed as n column vectors of dimension m, or as m row vectors of dimension n. Represented by a two-dimensional array; an example is a grayscale image.
  • Tensor (tensor)
    A tensor is a higher-order generalization of a matrix. If every small cell of a third-order Rubik's Cube is treated as a number, the whole cube is a 3 × 3 × 3 tensor, and each 3 × 3 face of the cube is a matrix, i.e., a slice of the tensor. Represented by three- or higher-dimensional arrays; an example is an RGB image.
  • Norm (norm)
    Measures the size of a single vector; it describes a property of the vector itself, mapping the vector to a non-negative value.
  • Inner product (inner product)
    Describes the relative position of two vectors, i.e., the angle between them; it is a relationship computed between a pair of vectors.
  • Linear space (linear space)
    A set of vector elements of the same dimension (finite or infinite), on which addition and scalar multiplication are defined.
  • Inner product space (inner product space)
    A linear space on which an inner product operation is defined.
  • Orthogonal basis (orthogonal basis)
    In an inner product space, a set of vectors that are mutually orthogonal. An orthogonal basis acts like the latitude and longitude lines of an inner product space: once an orthogonal basis is fixed, the correspondence between vectors and points is determined along with it.
  • Orthonormal basis (orthonormal basis)
    An orthogonal basis whose basis vectors all have norm 1.
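The norm, inner product, and orthonormal-basis definitions above can be sketched in a few lines of NumPy (the vectors here are arbitrary illustrations, not taken from the text):

```python
import numpy as np

# Two illustrative vectors in R^3
x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 0.0])

# Norm: maps a single vector to a non-negative value measuring its size
norm_x = np.linalg.norm(x)          # sqrt(1 + 4 + 4) = 3.0

# Inner product: computed between two vectors; determines the angle between them
dot = np.dot(x, y)                  # 1*2 + 2*0 + 2*0 = 2.0
cos_angle = dot / (np.linalg.norm(x) * np.linalg.norm(y))   # cos of the angle

# The standard basis of R^3 is an orthonormal basis:
# the vectors are mutually orthogonal and each has norm 1
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
assert np.dot(e1, e2) == 0.0        # orthogonal
assert np.linalg.norm(e1) == 1.0    # unit length
```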

Linear transformation

A linear transformation describes the change of a vector, a reference frame, or a coordinate system, and can be expressed by a matrix.
In a linear space, a change can be realized in two ways:

  1. Changing the point
    Ax = y
    means that the vector x, transformed by the matrix A, becomes the vector y.
  2. Changing the reference frame
    The important parameters describing a matrix are its eigenvalues λ and eigenvectors x.
    For a given matrix A, an eigenvalue λ and its eigenvector x satisfy:
    Ax = λx
    Eigenvalues and eigenvectors describe the speed and direction of the change.
    If the change represented by the matrix is thought of as a person running, then the eigenvalue λ represents the running speed, and the eigenvector x represents the running direction.
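Both views of a linear transformation can be checked numerically; the matrix below is an arbitrary illustration, not from the text:

```python
import numpy as np

# A simple transformation matrix (illustrative values)
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# "Changing the point": the matrix A transforms the vector x into y
x = np.array([1.0, 1.0])
y = A @ x                                # [2.0, 3.0]

# "Changing the reference frame": eigenvalues and eigenvectors of A
eigvals, eigvecs = np.linalg.eig(A)

# Each eigenvector keeps its direction; it is only scaled by its eigenvalue
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)   # A x = λ x
```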

Part Two: Probability theory

Like linear algebra, probability theory represents a way of looking at the world; its focus is the uncertainty and possibility in life.
Besides linear algebra, probability theory is the other theoretical foundation of artificial intelligence: most machine learning methods are based on it.
Because the training data available in practical tasks is limited, the parameters of the probability distribution must be estimated, and this estimation is the core task of machine learning.

Two schools

  1. Frequentist school (frequentists)
    Frequentists hold that a parameter is an objective, unchanging reality: although unknown, it is a fixed value. We, as observers, simply do not know it, so when computing the probability of a specific event, we should first determine the type and parameters of the distribution and use them as the basis for probabilistic inference.
  2. Bayesian school (Bayesians)
    Bayesians hold that a parameter is a random variable with no fixed value, following a prior distribution. The assumptions themselves depend on the observations, and the role of the data is to continually revise those assumptions, bringing the observer's subjective knowledge of the probability closer to objective reality.

Frequentists care most about the likelihood function, while Bayesians care most about the posterior distribution.

Two probabilistic estimation methods

  1. Maximum likelihood estimation (maximum likelihood estimation)
    The idea is to maximize the probability of the training data occurring and thereby determine the unknown parameters of the probability distribution, so that the estimated distribution matches the distribution of the training data.
    Maximum likelihood estimation uses only the training data to estimate the parameters.
  2. Maximum a posteriori estimation (maximum a posteriori estimation)
    The idea is to maximize the probability of the unknown parameters appearing, given the training data and other known conditions, and to select the most likely parameter values as the estimate.
    Besides the training data, maximum a posteriori estimation also needs additional information: the prior probability from Bayes' theorem.
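The difference between the two estimators can be sketched with a hypothetical coin-flip experiment (the counts and the Beta(5, 5) prior below are assumptions for illustration, not from the text): MLE uses only the observed flips, while MAP also folds in a prior belief that the coin is probably fair.

```python
# Hypothetical data: 7 heads observed in 10 flips
heads, flips = 7, 10

# Maximum likelihood: uses only the training data
theta_mle = heads / flips                           # 0.7

# Maximum a posteriori with an assumed Beta(a, b) prior on theta:
# the posterior mode pulls the estimate toward the prior belief (fair coin)
a, b = 5, 5
theta_map = (heads + a - 1) / (flips + a + b - 2)   # 11/18 ≈ 0.611

# The prior moves the MAP estimate from 0.7 back toward 0.5
assert theta_map < theta_mle
```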

An example

A good student and a poor student get into a fight:

  1. Maximum likelihood estimation: the teacher concludes the poor student must be at fault, because poor students love to stir up trouble.
  2. Maximum a posteriori estimation: if the teacher knows there is a grudge (prior information) between the good student and the poor student and takes this factor into account, he will not simply assume the poor student started it.
    Maximum likelihood finds the set of parameters that maximizes the probability of the observed data appearing; maximum a posteriori finds the set of parameters that is most probable given the observed data.

Two kinds of random variables

  1. Discrete random variable (discrete random variable)
    Takes finitely or countably many values within a certain range, for example, the number of people born in a certain area.
  2. Continuous random variable (continuous random variable)
    Takes infinitely many values within a certain range, which cannot be listed one by one, for example, … in a certain area.
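The distinction can be sketched by sampling from each kind of variable; the Poisson (a count, discrete) and uniform (continuous) distributions below are assumed examples, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete random variable: countably many values, e.g. a count of births
counts = rng.poisson(lam=3.0, size=1000)   # non-negative integers
assert counts.dtype.kind == "i"            # integer-valued outcomes
assert counts.min() >= 0

# Continuous random variable: uncountably many values in a range
measurements = rng.uniform(0.0, 1.0, size=1000)
# With probability 1 no two draws coincide: the outcomes cannot be listed
assert len(np.unique(measurements)) == len(measurements)
```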


Source: www.cnblogs.com/chenqionghe/p/12557966.html