Introduction to Machine Learning:
Feature vector
The objective function
Categories of machine learning:
Supervised learning: classification problems (such as face recognition, character recognition, speech recognition) and regression.
Unsupervised learning: clustering, dimensionality reduction.
Reinforcement learning: makes a decision based on the current state, with delayed feedback, so as to maximize the cumulative reward; examples include autonomous driving and chess playing.
Mathematics for deep learning: calculus, linear algebra, probability theory, and optimization methods.
Univariate calculus:
Taylor expansion of a univariate function: approximating the function locally by a polynomial.
A Taylor expansion must be taken in the neighborhood of some point.
Univariate differential calculus: derivatives, Taylor expansion, the discriminant rule for extrema.
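As a sketch of the polynomial-approximation idea, here is the Taylor polynomial of e^x about 0 (the choice of e^x as the example function is mine, not from the notes):

```python
import math

def taylor_exp(x, n_terms):
    """Approximate e^x by its Taylor polynomial about 0: sum of x^k / k!."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))

# with 10 terms, the approximation of e^1 is already accurate to roughly 1e-7
approx = taylor_exp(1.0, 10)
```

Adding more terms shrinks the error, but only near the expansion point, which is why the expansion must be taken in the neighborhood of some point.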
Multivariable calculus:
Partial derivative: treat the other variables as constants and differentiate with respect to one variable.
Higher-order partial derivatives: in general (when they are continuous), mixed second-order partial derivatives do not depend on the order of differentiation.
Gradient: the vector formed by the first-order partial derivatives of a multivariate function with respect to each variable.
Taylor expansion of multivariate functions.
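A numerical sketch of partial derivatives and the gradient, using central finite differences (the example function below is my own illustration):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Gradient of f at x: for each variable, hold the others fixed
    and take a central-difference partial derivative."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# f(x, y) = x^2 + 3xy has gradient (2x + 3y, 3x); at (1, 2) that is (8, 3)
g = numerical_gradient(lambda v: v[0]**2 + 3 * v[0] * v[1],
                       np.array([1.0, 2.0]))
```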
Linear Algebra:
Vector: a point in n-dimensional space. In mathematics usually a column vector; in programming often a row vector (row-major storage).
Vector operations: addition, subtraction, scalar multiplication, inner product, transposition, and the norm of a vector (which maps the vector to a non-negative real number).
Norm of a vector: the Lp norm is the p-th root of the sum of the p-th powers of the absolute values of the components. L1 norm: sum of the absolute values of the components. L2 norm: the length (magnitude) of the vector.
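The norms above, computed with NumPy:

```python
import numpy as np

v = np.array([3.0, -4.0])
l1 = np.linalg.norm(v, ord=1)      # sum of absolute components: |3| + |-4| = 7
l2 = np.linalg.norm(v)             # length of the vector: sqrt(9 + 16) = 5
lp = (np.abs(v)**3).sum()**(1/3)   # general Lp norm written out, here p = 3
```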
Matrix: a two-dimensional array. Inverse matrix; eigenvalues; quadratic forms.
Tensor: the equivalent of a multidimensional array in a programming language; a tensor of order n.
A matrix is an order-2 tensor and a vector is an order-1 tensor. An RGB color image, for example, is an order-3 tensor.
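In NumPy the tensor order is simply the number of array dimensions (`ndim`); the image shape below is a hypothetical example:

```python
import numpy as np

vector = np.zeros(3)                  # order-1 tensor
matrix = np.zeros((3, 3))             # order-2 tensor
rgb_image = np.zeros((224, 224, 3))   # order-3 tensor: height x width x color channel
orders = (vector.ndim, matrix.ndim, rgb_image.ndim)
```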
Jacobian matrix: the matrix formed by all first-order partial derivatives of the dependent variables with respect to all independent variables; each row is the gradient of one of the multivariate component functions.
Hessian matrix: the matrix formed by the second-order partial derivatives of a multivariate function; it is symmetric (when the second partials are continuous) and corresponds to the second derivative of a univariate function.
Discriminant rule for extrema of a multivariate function: the Hessian matrix plays the role of f''(x).
If the Hessian is positive definite at a critical point, the function has a local minimum there; if the Hessian is negative definite, a local maximum; if the Hessian is indefinite, the point is a saddle point, not an extremum.
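The rule can be sketched by testing the eigenvalue signs of the Hessian (for a symmetric matrix, all eigenvalues positive is equivalent to positive definite, and so on); the two example functions are my own:

```python
import numpy as np

def classify_critical_point(H):
    """Classify a critical point from the eigenvalue signs of its Hessian."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > 0):
        return "minimum"        # Hessian positive definite
    if np.all(eig < 0):
        return "maximum"        # Hessian negative definite
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point"   # Hessian indefinite
    return "inconclusive"       # semidefinite: the test does not decide

# f(x, y) = x^2 + y^2 has Hessian diag(2, 2) at its critical point (0, 0)
kind_min = classify_critical_point(np.diag([2.0, 2.0]))
# f(x, y) = x^2 - y^2 has Hessian diag(2, -2): a saddle at (0, 0)
kind_saddle = classify_critical_point(np.diag([2.0, -2.0]))
```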
Definition of a positive definite matrix: x^T A x > 0 for every nonzero vector x.
Criteria for a positive definite matrix: all eigenvalues of the matrix are greater than 0; all leading principal minors are greater than 0; the matrix is congruent to the identity matrix.
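Two of these equivalent criteria can be checked numerically (the example matrices are my own):

```python
import numpy as np

def is_positive_definite(A):
    """Check positive definiteness via two equivalent criteria:
    all eigenvalues > 0, and all leading principal minors > 0."""
    eigs_ok = bool(np.all(np.linalg.eigvalsh(A) > 0))
    # leading principal minors: determinants of the upper-left k x k blocks
    minors_ok = all(np.linalg.det(A[:k, :k]) > 0
                    for k in range(1, A.shape[0] + 1))
    assert eigs_ok == minors_ok   # the two criteria agree
    return eigs_ok

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # eigenvalues 1 and 3: positive definite
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])   # eigenvalues -1 and 3: not positive definite
```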
Differentiation with respect to matrices and vectors:
Gradient of w^T x with respect to x: w
Gradient of x^T A x with respect to x: (A + A^T) x
Hessian of x^T A x with respect to x: A + A^T
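The quadratic-form identity above can be checked numerically with a random A and x and a finite-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda v: v @ A @ v          # the quadratic form x^T A x
h = 1e-6
numeric_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                         for e in np.eye(3)])
analytic_grad = (A + A.T) @ x    # the gradient identity from the notes
# the Hessian of the quadratic form is the constant matrix A + A^T
```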
Probability:
Random events and the probability of a random event.
Conditional Probability:
p(a,b) = p(b|a) p(a) and p(a,b) = p(a|b) p(b), hence p(b|a) p(a) = p(a|b) p(b)
Bayesian formula:
Dividing both sides of the formula by p(b) gives: p(a|b) = p(a) * p(b|a) / p(b). If a is viewed as the cause and b as the effect, then p(a) is called the prior probability and p(a|b) the posterior probability; Bayes' formula relates the prior probability to the posterior probability.
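A worked example of the formula with hypothetical numbers (the disease/test rates below are my own illustration, not from the notes):

```python
# a = "has the disease" (cause), b = "test is positive" (effect)
p_a = 0.01              # prior p(a): assumed base rate of the disease
p_b_given_a = 0.90      # assumed sensitivity of the test
p_b_given_not_a = 0.05  # assumed false-positive rate

# total probability: p(b) = p(b|a) p(a) + p(b|not a) p(not a)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' formula: posterior p(a|b) = p(a) * p(b|a) / p(b)
p_a_given_b = p_a * p_b_given_a / p_b
# despite a positive test, the posterior is only about 0.15,
# because the prior p(a) is small
```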
Independent random events: p(a,b,c) = p(a) p(b) p(c)
Random Variables:
A random variable is the quantification of a random event: a variable each of whose values has an associated probability.
Discrete random variables:
Takes only finitely many values, or countably infinitely many (for example, the integers from 0 to positive infinity are countably infinite, while the real numbers between 0 and 1 are not countable). A discrete random variable is described by its probability distribution: p(x = xi) ≥ 0 and Σ p(x = xi) = 1.
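A minimal check of the two conditions for a discrete distribution, using a fair die as the example:

```python
import numpy as np

# pmf of a fair six-sided die: p(x = i) = 1/6 for i = 1..6
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

nonnegative = bool(np.all(probs >= 0))           # p(x = xi) >= 0
sums_to_one = bool(np.isclose(probs.sum(), 1.0)) # sum of p(x = xi) = 1
```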
Continuous random variables:
Takes uncountably many values, i.e., real numbers in some interval. A continuous random variable is described by a probability density function and a distribution function. The density satisfies f(x) ≥ 0 and ∫ f(x) dx = 1; the distribution function is defined as F(y) = p(x ≤ y) = ∫ f(x) dx taken over x ≤ y.
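The density and distribution-function conditions, checked with a crude Riemann sum for a uniform density on [0, 2] (the example distribution is my own):

```python
import numpy as np

# pdf of the uniform distribution on [0, 2]: f(x) = 1/2 on [0, 2], else 0
f = lambda x: np.where((x >= 0) & (x <= 2), 0.5, 0.0)

xs = np.linspace(-1.0, 3.0, 4001)
dx = xs[1] - xs[0]
total = float(np.sum(f(xs)) * dx)         # ∫ f(x) dx, should be close to 1

# distribution function F(y) = p(x <= y): integrate the density up to y
y = 1.0
F_y = float(np.sum(f(xs[xs <= y])) * dx)  # close to 0.5 for y = 1
```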