Probability and Statistics: Expectation, Variance, and Least Squares

This article originally appeared on the personal WeChat public account TechFlow.


In today's article, let's talk about expectation and variance.


Expectation


We encounter the concept of expectation very early on. The definition on Wikipedia is: the expected value of a random variable is the sum, over every possible outcome of an experiment, of that outcome's value multiplied by its probability. In other words, the expectation is the average result we would obtain over a large number of experiments, weighted across all possible states.


Let's look at two simple examples. The first is rolling a die.


We all know a die has six faces, numbered 1, 2, 3, 4, 5, 6. On each throw, every face comes up with the same probability, 1/6. So the expectation of a single die roll is:

\[E(X) = 1 * \frac{1}{6} + 2 * \frac{1}{6} + \cdots + 6 * \frac{1}{6} = 3.5\]

That is, if we throw the die many times, the average result approaches 3.5, even though 3.5 itself can never be rolled.
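The die calculation above can be checked directly by summing face times probability; this is a minimal sketch using exact fractions:

```python
from fractions import Fraction

# Expected value of a fair six-sided die:
# sum of each face value times its probability 1/6.
faces = range(1, 7)
expectation = sum(face * Fraction(1, 6) for face in faces)

print(float(expectation))  # 3.5
```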


Another classic example is gambling. Gamblers vary widely in skill, but all of them are familiar with the concept of expectation. Take American roulette as a simple example: the wheel has 38 slots, and on each spin you can bet on a single number. If the ball lands on your number, you win 35 times your stake; if not, your stake is lost. Let's work out the expectation:

\[E(X) = -1 * \frac{37}{38} + 35 * \frac{1}{38} = -\frac{2}{38} = -\frac{1}{19}\]

We can see that this expectation is negative: we may well profit in the short term, but if we play many times, we are bound to lose.
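The roulette expectation works out the same way; a short sketch with exact fractions:

```python
from fractions import Fraction

# Single-number bet on American roulette (38 slots):
# lose the 1-unit stake with probability 37/38,
# win 35 units with probability 1/38.
p_win = Fraction(1, 38)
p_lose = Fraction(37, 38)
expected_profit = 35 * p_win + (-1) * p_lose

print(expected_profit)  # -1/19, about -0.053 units lost per bet
```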


Variance


The second concept is variance, which measures how dispersed a variable is. Its formula is \(V(X) = E((X - \mu)^2)\), where \(\mu\) is the expected value of the variable X. In other words, the variance is the expectation of the squared difference between X and its expected value. The larger the variance, the more dispersed X is; the smaller the variance, the narrower the range over which X fluctuates.


Since \((X - \mu)^2\) is always non-negative, the variance of a variable must also be non-negative. Let's use gambling as an example again. Suppose we have a coin-flip game: each toss, if the coin lands heads you win 10,000 yuan; if it lands tails you lose 9,000. It is easy to see that the expectation of this game is 500 yuan, i.e. on average we win 500 yuan per round.
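Both the expectation and the variance of this coin-flip game follow directly from the definitions above; a minimal sketch:

```python
# Coin-flip game: heads wins 10,000, tails loses 9,000,
# each with probability 1/2.
outcomes = [10_000, -9_000]
probs = [0.5, 0.5]

# Expectation: sum of probability * value.
mu = sum(p * x for p, x in zip(probs, outcomes))

# Variance: V(X) = E((X - mu)^2).
var = sum(p * (x - mu) ** 2 for p, x in zip(probs, outcomes))

print(mu)   # 500.0
print(var)  # 90250000.0, i.e. a standard deviation of 9500
```

A standard deviation of 9,500 against an expectation of only 500 is exactly the "huge swings" problem discussed next.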


However, even without calculating it, we can see that the variance of this game is huge. If we actually played it, we would very likely swing back and forth between big wins and big losses, making it hard to earn steadily. We might even go bankrupt before we ever get the chance to come out ahead.


With the concept of variance, it is easy to understand why the doubling-bet strategy is not viable in gambling.


The doubling-bet strategy works like this: in a game with a 50% win rate, if we lose the current round, we bet double the amount we just lost in the next round, and keep doubling after every loss until we finally win. In theory, this strategy withstands losing streaks: as long as we eventually win once, we recover all the money lost before.


Once we understand variance, it is easy to see why this strategy fails: its variance is enormous. Before any profit arrives, the bankroll can easily swing beyond what we can afford; in other words, we go broke before the next doubled bet can be placed.
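The failure mode can be sketched with a small simulation. Everything here (the function name, the bankroll size, the round count) is an illustrative assumption, not from the original article:

```python
import random

def martingale_session(bankroll, base_bet=1, rounds=1000, seed=0):
    """Play a fair 50/50 game with the doubling strategy until
    the rounds run out or the bankroll cannot cover the next bet."""
    rng = random.Random(seed)
    bet = base_bet
    for _ in range(rounds):
        if bet > bankroll:
            return bankroll, True   # ruined: cannot place the doubled bet
        if rng.random() < 0.5:
            bankroll += bet         # win: recover losses plus base_bet
            bet = base_bet
        else:
            bankroll -= bet         # lose: double the next stake
            bet *= 2
    return bankroll, False

final, ruined = martingale_session(bankroll=100)
print(final, ruined)
```

With a 100-unit bankroll and a 1-unit base bet, only about seven consecutive losses (1 + 2 + 4 + ... + 64 = 127 units) are needed to exceed the bankroll, and a streak that long is far from rare over many rounds.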


Standard deviation


The next concept is the standard deviation. Once you understand variance, the standard deviation is easy: it is simply the square root of the variance, and like the variance it reflects how dispersed the samples are.


Since variance and standard deviation are very similar in definition and use, in most situations the variance appears more often. So we will only introduce the standard deviation briefly; knowing the concept and how to compute it is enough.


Least squares method


The least squares method is very famous and is widely used in many of today's machine learning and deep learning models. The "squares" here are simply squared errors; the method, also called the method of least squares, is used to evaluate the error between predicted and actual results.


The "least" part is easy to understand, but what exactly is being squared?


The square refers to the squared error. Writing out the formula makes it clear:

\[SE = \sum (y_{pred} - y)^2\]

Here \(y_{pred}\) is the predicted value and y is the sample (true) value. From the formula we can see that the squared error over all samples is the sum of the squared differences between the predicted values and the true values. The least squares method is simply the method of optimizing \(y_{pred}\) so that this squared error is as small as possible.
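The SE formula above translates directly into code; this is a minimal sketch with made-up toy numbers:

```python
# Sum of squared errors between predictions and true sample values,
# SE = sum((y_pred - y)^2). The data points are illustrative.
def squared_error(y_pred, y):
    return sum((p - t) ** 2 for p, t in zip(y_pred, y))

se = squared_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])
print(se)  # 0.25 + 0.25 + 0.0 = 0.5
```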


This method is mainly used in regression models.


Let's briefly explain what a regression model is. In machine learning, the most commonly used models fall into two categories: regression models and classification models. The difference between the two lies in what they predict: a classification model predicts which category a sample belongs to, while a regression model predicts a specific numerical value.


Here is a simple example. Suppose I want to design a model that predicts whether a stock will go up or down tomorrow. A stock can only go up or down, so there are just two cases, and this is a classification model. But if I want to predict tomorrow's exact index value, the result is a specific number, so that is a regression model.


We usually use the squared error to reflect the predictive power of a regression model: by reducing the error, we improve the model and obtain more accurate results. The question is, how do we reduce the error, and why does reducing the error improve the model?


First, although we abbreviate the model's prediction as \(y_{pred}\), this value does not fall from the sky; behind it, the model computes it from the input x and a number of parameters. As a simple example, if we use a linear function as our regression model, then \(y_{pred} = wx + b\), where w and b are the parameters.


Reducing the squared error means finding better values of w and b, so that the computed \(y_{pred}\) is more accurate and the error smaller.


So, how do we actually reduce the error?


Looking at the formula for the sum of squared errors, we can see that it is a quadratic function. As we learned in high school, the extremum of a quadratic function can be found by taking the derivative. Besides differentiation there are other optimization methods, but they are not the focus of this article; a future article on linear regression will cover them.
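For the one-variable model \(y_{pred} = wx + b\), setting the partial derivatives of the sum of squared errors to zero gives a closed-form solution; here is a minimal sketch with made-up data points that lie exactly on y = 2x + 1:

```python
# Fit y_pred = w*x + b by minimizing SE(w, b) = sum((w*x + b - y)^2).
# Setting dSE/dw = 0 and dSE/db = 0 yields the formulas below.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # illustrative data: exactly y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# w is the ratio of the x-y covariance to the variance of x.
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

print(w, b)  # 2.0 1.0
```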


Finally, let's look at the formulas for the sum of squared errors and the variance side by side; do you notice anything? If we treat the true sample value as playing the role of the expectation, aren't the variance and the sum of squared errors essentially the same thing?


Personally, I think of it this way: the variance measures how dispersed the samples are around the expected value, while the sum of squared errors measures how dispersed the predictions are around the true values. The less dispersed the predictions are around the truth, the better the model. So at their core, the two concepts are connected.


Most of us are already very familiar with the concepts of expectation and variance, but somewhat less so with the sum of squared errors and the least squares method. I hope that through this article you can transfer your understanding of expectation and variance to the squared error and the least squares method, because transferring existing knowledge is one of the fastest paths to learning.


That's all for today's article; I hope you got something out of it. If you enjoyed it, please give it a like and a follow.


Origin www.cnblogs.com/techflow/p/12232364.html