Machine Learning Knowledge Review (Part 1)

1. A Summary of Statistics

    1. To study the relationship between X and Y, the ideal is to find a mapping such that Y = f(X), but such a mapping is hard to find in practice; that is, the quantitative relationship between X and Y is unclear, and the macroscopic symptom of this is uncertainty in the results. Although the Y obtained for a given input X is uncertain on any single trial, when the trial is repeated enough times the probability of each possible result can be calculated. The discipline that studies this microscopic uncertainty but macroscopic regularity is statistics.

    2. The law of large numbers: when the number of trials is large enough, the observed frequency can be used in place of the probability.
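A minimal simulation sketch of this, assuming a coin with heads probability p = 0.3 (the probability and trial counts are illustrative choices, not from the original notes):

```python
# Law of large numbers: the frequency of heads approaches the true
# probability p as the number of flips grows.
import random

p = 0.3  # assumed true probability of "heads"
for n in (10, 1_000, 100_000):
    heads = sum(random.random() < p for _ in range(n))
    print(f"n = {n:>6}: frequency = {heads / n:.4f}  (true p = {p})")
```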

    3. The central limit theorem: when a result is jointly affected by many independent random factors, the result generally follows a normal distribution; the normal distribution is the limit of the binomial distribution.

Suppose a sample of size n is drawn from any population with mean μ and finite variance σ². When n is sufficiently large, the sampling distribution of the sample mean is approximately normal with mean μ and variance σ²/n, as the simulation sketched below illustrates.
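A minimal simulation sketch, assuming an exponential population with μ = 1 and σ² = 1 and a sample size of n = 50 (all illustrative choices):

```python
# Central limit theorem: sample means of a clearly non-normal population
# (exponential) have mean close to mu and variance close to sigma^2 / n.
import numpy as np

rng = np.random.default_rng(0)
n, repetitions = 50, 10_000
means = rng.exponential(scale=1.0, size=(repetitions, n)).mean(axis=1)

print("mean of sample means:    ", means.mean())  # close to mu = 1
print("variance of sample means:", means.var())   # close to sigma^2 / n = 0.02
```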

    4. Sample (each observation is a random variable, so the sample as a whole has a probability): observe n times; the result of each observation is a random variable, and these n random variables are independent and identically distributed. Together they are called the sample X = (X1, X2, ..., Xn), and the joint distribution of the sample is the product of the individual distributions, as written out below.
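In symbols, writing f(x; θ) for the distribution of a single observation (a standard notation, not fixed by the original notes), the joint distribution of the sample is

```latex
L(\theta) = f(x_1, x_2, \dots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)
```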

    5. Maximum likelihood estimation: a type of point estimation. The basic idea is that the sample values actually observed must have had a relatively high probability of occurring, so the parameter θ (theta) that maximizes the probability of the whole sample is taken as the estimate. That is, the sample is held fixed, θ is varied, and whichever θ makes the likelihood function attain its maximum is taken as the estimated parameter; this value depends on the sample. When solving, one usually takes the logarithm first and then sets the derivative to 0. A numerical sketch follows.
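A minimal sketch for a Bernoulli parameter, maximizing the log-likelihood over a grid of θ values (the sample below is a hypothetical illustration; analytically the maximizer is simply the sample mean):

```python
# Maximum likelihood: fix the sample, vary theta, keep the theta that
# maximizes log L(theta) = sum_i [x_i*log(theta) + (1 - x_i)*log(1 - theta)].
import numpy as np

sample = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # hypothetical observations
thetas = np.linspace(0.01, 0.99, 99)

log_likelihood = (sample.sum() * np.log(thetas)
                  + (len(sample) - sample.sum()) * np.log(1 - thetas))

print("MLE of theta:", thetas[np.argmax(log_likelihood)])  # 0.7, the sample mean
```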

    6. Regression analysis: analyzes the relationship between variables. A deterministic relationship is a functional relationship; an uncertain relationship means the dependent variable is a random variable, and its mean is used as the reference point.

The form of the relationship is roughly guessed from a scatter plot; then, because the sample values are assumed to follow a normal distribution, the joint distribution of the whole sample is determined, the likelihood function is written down, and finally the parameters can be calculated, as sketched below.
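A minimal sketch under the usual assumption y = a·x + b + ε with normally distributed noise ε, in which case maximizing the likelihood is equivalent to minimizing the squared error (the synthetic data and true coefficients are illustrative assumptions):

```python
# Linear regression as maximum likelihood under normal errors:
# least squares recovers the coefficients of the assumed line.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)  # true a = 2, b = 1

A = np.column_stack([x, np.ones_like(x)])          # design matrix [x, 1]
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"estimated a = {a_hat:.2f}, b = {b_hat:.2f}")
```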

    7. Evaluation of estimators (which estimator is more reliable to use?): unbiasedness, efficiency, consistency.
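Stated informally for an estimator θ̂ of a parameter θ (standard definitions, added here for completeness):

```latex
\text{Unbiasedness: } E[\hat{\theta}] = \theta \qquad
\text{Efficiency: smallest } \operatorname{Var}(\hat{\theta}) \text{ among unbiased estimators} \qquad
\text{Consistency: } \hat{\theta}_n \xrightarrow{P} \theta \text{ as } n \to \infty
```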

 

2. Basic Concepts

    1. Artificial intelligence: making machines as smart as humans. History of AI development: machines can reason (symbolism and logic), machines need knowledge to support reasoning (expert systems), machines acquire knowledge automatically (machine learning).

    2. Machine learning: a branch of artificial intelligence that focuses on making machines learn by summarizing experience (there is no creation or "epiphany", only induction). The input is training data, and the output is the joint distribution of X and Y or the mapping from X to Y. Machines can handle problems not because programmers wrote explicit rules, but because machines learn the knowledge themselves; that is, the rules the program applies are learned by the machine itself.

The development history of machine learning: symbolism, typified by decision trees (simulating human judgments about concepts); connectionism, typified by neural networks (the main work is adjusting parameters, and the whole reasoning process is a black box); statistical learning (support vector machines and kernel methods); deep learning (automatically discovering and describing features).

Disadvantages of deep learning: it lacks theory, parameter tuning relies on skill, and it demands large amounts of data and computing power; however, it is easy to get started with and to learn.

    3. Problems solved:

    1" Classification problems (results are limited possibilities) such as spam, whether the stock has gone up, is the picture a dog, a cat, or a person?

Common algorithms: logistic regression (the most commonly used in industry), support vector machines, random forests, naive Bayes (common in NLP), deep neural networks (used for multimedia data such as video, images, and speech). A minimal logistic-regression sketch follows.
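A minimal sketch of logistic regression trained by gradient descent on toy one-dimensional data (the data, learning rate, and iteration count are illustrative assumptions, not a production setup):

```python
# Logistic regression: model P(y=1|x) with a sigmoid and fit w, b by
# gradient descent on the log loss.
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])   # class labels 0 / 1

w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))        # predicted P(y = 1 | x)
    w -= lr * np.mean((p - y) * x)                # gradient of the log loss w.r.t. w
    b -= lr * np.mean(p - y)                      # gradient w.r.t. b

print("P(y=1 | x=1.5) =", 1.0 / (1.0 + np.exp(-(w * 1.5 + b))))
```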

    2" regression problem (there are infinite possibilities for the result) such as the housing price in Beijing after 2 months

Common algorithms: linear regression, ordinary least squares regression, stepwise regression, multivariate adaptive regression splines.

    3" clustering problem (finding similar data) such as user group division, also called unsupervised learning

Common algorithms: K-means, density-based clustering, LDA. A minimal K-means sketch follows.
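A minimal sketch of K-means with k = 2 on toy one-dimensional data (the data, k, and the fixed iteration count are illustrative assumptions; a real implementation would test for convergence):

```python
# K-means: alternate between assigning points to the nearest center and
# moving each center to the mean of its assigned points.
import numpy as np

rng = np.random.default_rng(3)
points = np.concatenate([rng.normal(0, 0.5, 50), rng.normal(5, 0.5, 50)])
centers = rng.choice(points, size=2, replace=False)   # random initial centers

for _ in range(10):
    # assignment step: each point goes to its nearest center
    labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
    # update step: each center moves to the mean of its assigned points
    centers = np.array([points[labels == k].mean() for k in range(2)])

print("cluster centers:", centers)   # close to 0 and 5
```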

    4. Three elements

     Model: the joint distribution of input and output, or the corresponding family of functions (with parameters to be determined)

    Strategy: Criteria for model evaluation

    Algorithm: the computational method used to find the model selected by the strategy

Loss function: measures the difference between the predicted value and the true value

Risk function: the expectation of the loss function; when the sample is large enough, it can be replaced by the empirical risk (maximum likelihood estimation)

Empirical risk: the average loss over the training set

Common strategies: empirical risk minimization (equivalent to maximum likelihood estimation when the loss is the log-likelihood loss) and structural risk minimization.

Structural risk (structural risk minimization, SRM): the empirical risk plus a penalty term J(f), as written out below.
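In formulas (using L for the loss, P for the joint distribution of (X, Y), and λ ≥ 0 for the weight of the penalty; standard notation rather than notation fixed by the original notes):

```latex
R(f) = E_P\bigl[L(Y, f(X))\bigr], \qquad
R_{\mathrm{emp}}(f) = \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr), \qquad
R_{\mathrm{srm}}(f) = R_{\mathrm{emp}}(f) + \lambda J(f)
```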

    5. Generalization: how accurately the model predicts new data

Overfitting: the fitted model has more parameters than the true relationship needs (it captures too many details rather than the overall pattern), that is, the level of induction is too low. Some overfitting is unavoidable; the key is how to reduce it.

Solution: add a penalty term to the empirical risk, as in the regularization sketch below.
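A minimal sketch of one such penalty, an L2 (ridge) term added to least squares; the synthetic data and the penalty weight lam are illustrative assumptions:

```python
# Ridge regression: minimize squared error + lam * ||w||^2; the closed form
# is w = (X^T X + lam * I)^{-1} X^T y, which shrinks the weights toward zero.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))                     # 30 samples, 5 features
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=30)

lam = 1.0                                        # weight of the penalty term
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
w_ols   = np.linalg.solve(X.T @ X, X.T @ y)      # no penalty, for comparison

print("OLS weights:  ", np.round(w_ols, 2))
print("ridge weights:", np.round(w_ridge, 2))    # shrunk toward zero
```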
