Getting Started with Machine Learning: Linear Regression

Xiaowen | WeChat official account: Xiaowen's Data Journey

Last time we briefly introduced some machine learning concepts, including loss functions and model evaluation. Today we start on our first machine learning algorithm: linear regression. In fact, most of us probably already learned it in high school. Why do I say that?

Take the function $y = kx + b$. I'm sure it looks familiar! Remember how to find the slope $k$ and the intercept $b$? Given two data points, we substitute them into the function and solve for $k$ and $b$. With more data points, $k$ and $b$ have to be obtained by the least squares method, using the following formulas:

$$k = \frac{\sum_{i=1}^{m}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{m}(x_i - \bar{x})^2}$$

$$b = \bar{y} - k\bar{x}$$
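
As a minimal sketch of the least squares fit for a slope and an intercept, here is the usual textbook form (slope = covariance of x and y over variance of x, intercept from the means). The function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (k, b) minimizing the sum of squared residuals of y = k*x + b."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    k = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = y_mean - k * x_mean
    return k, b


# Hypothetical data lying exactly on y = 2x + 1.
k, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(k, b)  # prints 2.0 1.0
```

With only two points this reduces to the "substitute and solve" approach from high school; with more points it gives the best-fitting line in the squared-error sense.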

This is a machine learning algorithm we have all already learned: linear regression! Who would have thought that we studied a machine learning model long before college? Give yourself a pat on the back!!! Since we've seen it before, let's make sure we get a solid grasp of today's content.

--------------------------------------

Linear regression is a statistical analysis method that determines the quantitative relationship between two or more interdependent variables. It is a form of supervised learning built on regression analysis from mathematical statistics. A linear regression with one independent variable and one dependent variable is called simple linear regression, our familiar $y = kx + b$. One with several independent variables and one dependent variable is called multiple linear regression, $y = k_1 x_1 + k_2 x_2 + \cdots + k_n x_n + b$. Because linear regression is supervised learning, the data set contains both feature values and label values, such as $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$.

For convenience in writing and calculation, the intercept is usually folded into the parameter vector: $k$ is written as $(b, k_1, \ldots, k_n)^T$ and $x$ as $(1, x_1, \ldots, x_n)^T$, i.e., $y = k^T x$.
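
A small sketch of this trick, assuming NumPy is available: prepending a column of ones to the features lets a single parameter vector carry both the intercept and the slopes. The numbers and variable names are hypothetical.

```python
import numpy as np

# Hypothetical features (two samples, two features), intercept, and slopes.
features = np.array([[2.0, 3.0],
                     [4.0, 5.0]])
b = 1.0
weights = np.array([0.5, -1.0])

# Augment: x becomes (1, x1, x2) and k becomes (b, k1, k2).
X_aug = np.column_stack([np.ones(len(features)), features])
k = np.concatenate([[b], weights])

pred_aug = X_aug @ k                # y = k^T x with the intercept folded in
pred_plain = features @ weights + b  # the same prediction written the long way
print(pred_aug, pred_plain)
```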

No matter how accurate the model is, there is always some error between the predicted $y$ and the true $y$, so the expression above is written as:

$$y_i = k^T x_i + \varepsilon_i$$

where the errors $\varepsilon_i$ are independent and identically distributed, following a normal distribution with mean 0 and variance $\sigma^2$.

Why is the error $\varepsilon$ assumed to be independent, identically distributed, and normally distributed?

For example: Xiaowen takes a walk every day after dinner, and the walking distance can be estimated from the walking time. Because my walking speed differs from day to day, the distance differs each day even if the walking time is fixed. If I fit a line to my data to estimate the distance, the difference between the predicted value and the actual value fluctuates, but small deviations are far more likely than large ones, which is what a normal distribution describes. Why? My walking speed isn't identical every day, but given my walking habits it doesn't vary much. I'm hardly going to take 20 cm steps today and 100 cm steps tomorrow, right? Overstride like that and you'll pull something!! So we can say the error follows a normal distribution.

Also, whether I walk a bit slower today or a bit faster tomorrow has no bearing on any other day: one day's error tells us nothing about another's. That is independence.

And why identically distributed? Because the data is all mine, collected from my own walks rather than someone else's. You can't use someone else's data to estimate my distance, right? After all, not everyone has long legs like mine!!

OK! Now that we understand why the errors are assumed to be independent, identically distributed, and normal, we can write:

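$$p(\varepsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\varepsilon_i^2}{2\sigma^2}\right)$$

$$p(y_i \mid x_i; k) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y_i - k^T x_i)^2}{2\sigma^2}\right)$$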

That is, given $x$ and the slope $k$, we can compute how probable it is that the predicted $y$ is close to the true value. Usually, though, we don't know $k$. The problem therefore becomes one about the conditional probability $p(y_i \mid x_i; k)$: the larger this probability, the closer the computed $y$ is to the true value. To find the $k$ that maximizes it, we turn to maximum likelihood estimation, so the expression above can be written as:

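$$L(k) = \prod_{i=1}^{m} p(y_i \mid x_i; k) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y_i - k^T x_i)^2}{2\sigma^2}\right)$$
$$\ell(k) = \log L(k)$$
$$= \sum_{i=1}^{m} \log\left[\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y_i - k^T x_i)^2}{2\sigma^2}\right)\right]$$

$$= m\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^{m}(y_i - k^T x_i)^2$$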

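The first term is a constant, so maximizing the log-likelihood $\ell(k)$ requires minimizing $\frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^{m}(y_i - k^T x_i)^2$, which is equivalent to minimizing $J(k) = \frac{1}{2}\sum_{i=1}^{m}(k^T x_i - y_i)^2$, i.e., minimizing the squared error.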
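
To make the equivalence concrete, here is a small numerical check, a sketch under the assumption of Gaussian noise with a fixed $\sigma$. Among a few candidate slopes, the one with the highest log-likelihood is also the one with the smallest squared error. The data, the candidate slopes, and the helper names are invented for illustration.

```python
import math


def log_likelihood(k, b, xs, ys, sigma=1.0):
    """Gaussian log-likelihood of y = k*x + b + noise, noise ~ N(0, sigma^2)."""
    m = len(xs)
    sse = sum((y - (k * x + b)) ** 2 for x, y in zip(xs, ys))
    # Constant term (independent of k) minus the scaled squared error.
    const = m * math.log(1.0 / (math.sqrt(2 * math.pi) * sigma))
    return const - sse / (2 * sigma ** 2)


def half_squared_error(k, b, xs, ys):
    """J(k) = 1/2 * sum of squared residuals."""
    return 0.5 * sum((k * x + b - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical data scattered around y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
candidates = [1.0, 1.5, 2.0, 2.5]

best_by_likelihood = max(candidates, key=lambda k: log_likelihood(k, 0.0, xs, ys))
best_by_error = min(candidates, key=lambda k: half_squared_error(k, 0.0, xs, ys))
print(best_by_likelihood, best_by_error)  # both pick the same slope
```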

Put more simply, we want the model's predictions to be as close as possible to the real data, and isn't error exactly how we measure closeness? Does this remind you of a performance metric for regression tasks, the mean squared error $MSE = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$?

Now that we know the problem is to minimize $J(k)$, how do we compute it? There are two methods: gradient descent and least squares.

----------------------------------------

Least squares:

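$$J(k) = \frac{1}{2}\sum_{i=1}^{m}(k^T x_i - y_i)^2$$

$$J(k) = \frac{1}{2}(Xk - y)^T(Xk - y)$$ (expressed in matrix form, where row $i$ of $X$ is $x_i^T$ and $y$ is the vector of labels; matrix operations are more efficient)

$$= \frac{1}{2}(k^T X^T - y^T)(Xk - y)$$

$$= \frac{1}{2}(k^T X^T X k - k^T X^T y - y^T X k + y^T y)$$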

Differentiating $J(k)$ with respect to $k$:

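$$\frac{\partial J(k)}{\partial k} = \frac{1}{2}(2X^T X k - X^T y - X^T y) = X^T X k - X^T y$$

$$X^T X k - X^T y = 0$$
$$k = (X^T X)^{-1} X^T y$$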
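
A minimal sketch of the closed-form least squares solution $k = (X^T X)^{-1} X^T y$, assuming NumPy is available. Rather than forming the inverse explicitly, it solves the linear system $X^T X k = X^T y$, which gives the same $k$. The function name `fit_normal_equation` and the data are hypothetical.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Solve (X^T X) k = X^T y. X must already contain a column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the system is equivalent to multiplying by the inverse of X^T X.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
features = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0],
                     [2.0, 1.0]])
X = np.column_stack([np.ones(len(features)), features])
y = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

k_hat = fit_normal_equation(X, y)
print(k_hat)  # approximately [1. 2. 3.]
```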

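Least squares has one problem: when the independent variables are highly collinear, the square matrix $X^T X$ is singular or nearly singular, so its inverse cannot be computed reliably. To avoid this, there is another way to solve for the parameter $k$, one that is also widely used in other algorithms: gradient descent.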
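
The collinearity problem can be seen in a tiny example, sketched here with NumPy. One feature is exactly twice another, so there is a nonzero vector $v$ with $X^T X v = 0$, which means $X^T X$ has no inverse. As a stand-in that the article does not discuss, `np.linalg.lstsq` still returns a (minimum-norm) least squares solution in this case. The data is invented.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
# The third column is exactly 2 * the second: perfect collinearity.
X = np.column_stack([np.ones(4), x1, 2.0 * x1])
y = 1.0 + 3.0 * x1

gram = X.T @ X
# v is nonzero, yet gram @ v is the zero vector, so gram is not invertible.
v = np.array([0.0, 2.0, -1.0])
print(gram @ v)
print(np.linalg.matrix_rank(gram))  # fewer than 3 independent columns

# lstsq (SVD-based) copes with rank deficiency and returns the
# minimum-norm solution among all least squares solutions.
k_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ k_min_norm
print(pred)
```

The individual coefficients are not unique here (weight can be moved freely between the two collinear columns), but the predictions are.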

---------------------------------------

Gradient descent:

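The idea of gradient descent is to iterate: at each step the parameter $k$ moves once in the direction of steepest descent (the negative gradient), until the parameters stop changing, which gives the best $k$. Following the steepest-descent direction is how we search for the minimum of the objective function.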

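$$J(k) = \frac{1}{2}\sum_{i=1}^{m}(k^T x_i - y_i)^2$$
$$\frac{\partial J(k)}{\partial k_j} = \sum_{i=1}^{m}(k^T x_i - y_i)\,x_{ij}$$

$$k_j := k_j - \alpha\frac{\partial J(k)}{\partial k_j}$$
$$k_j := k_j - \alpha\sum_{i=1}^{m}(k^T x_i - y_i)\,x_{ij}$$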

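Here $\alpha$ is the step size, also called the learning rate. If the learning rate is too small, a very large number of iterations is needed and the run becomes much less efficient. If it is too large, fewer iterations are needed, but the update may jump straight over the minimum and fail to converge.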
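
Here is a rough sketch of batch gradient descent on $J(k) = \frac{1}{2}\sum (k^T x_i - y_i)^2$, assuming NumPy, run with three learning rates to show the effects described above. The function name, the data, and the specific learning rates are chosen purely for illustration; suitable values depend on the data.

```python
import numpy as np


def gradient_descent(X, y, alpha, n_iters):
    """Batch gradient descent on J(k) = 0.5 * sum((X k - y)^2)."""
    k = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ k - y)  # gradient of J with respect to k
        k = k - alpha * grad      # step against the gradient
    return k


# Hypothetical noise-free data from y = 1 + 2x, with a column of ones added.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

k_good = gradient_descent(X, y, alpha=0.05, n_iters=2000)   # converges
k_slow = gradient_descent(X, y, alpha=0.0001, n_iters=100)  # barely moves
k_big = gradient_descent(X, y, alpha=0.5, n_iters=50)       # overshoots, diverges

print(k_good)  # close to [1, 2]
print(k_slow)  # still far from [1, 2]
print(k_big)   # enormous values
```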

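Because gradient descent uses all of the data in every iteration, it can be very inefficient, so an optimized variant appeared: stochastic gradient descent, which performs each update using one randomly chosen sample. More iterations are needed, but each one is far cheaper, so learning is much more efficient overall. The price is that the result tends to hover near the optimum rather than land on it exactly, and for non-convex objectives it may be a local optimum rather than the global one.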
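
A minimal sketch of stochastic gradient descent for the same squared-error objective, assuming NumPy. Each update uses a single randomly drawn sample. The function name `sgd`, the fixed seed, the learning rate, and the data are all assumptions made for the example.

```python
import numpy as np


def sgd(X, y, alpha, n_steps, seed=0):
    """Stochastic gradient descent: one randomly chosen sample per update."""
    rng = np.random.default_rng(seed)
    k = np.zeros(X.shape[1])
    for _ in range(n_steps):
        i = rng.integers(len(y))      # pick one sample at random
        error = X[i] @ k - y[i]       # residual on that sample only
        k = k - alpha * error * X[i]  # gradient step for that sample
    return k


# Hypothetical noise-free data from y = 1 + 2x, with a column of ones added.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

k_sgd = sgd(X, y, alpha=0.05, n_steps=5000)
print(k_sgd)  # close to [1, 2]
```

On noise-free data like this the iterates settle on the exact solution. With noisy data and a fixed learning rate they keep fluctuating around the least squares solution, which is why the learning rate is often decreased over time.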

---------------------------------------

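At this point the linear regression model is basically built. But how well does it perform, and are its parameters reasonable? How do we judge that? We can use hypothesis tests from statistics: the F-test judges whether the model as a whole is good or bad, the t-test judges whether the parameters are reasonable, and the coefficient of determination $R^2$ measures how much of the dependent variable is explained by the independent variables. Only when all of these checks pass is the model truly finished. So why can we use the F-test, the t-test, and $R^2$?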

Baidu Baike:

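The chi-square test (closely related to the F-test, whose statistic is a ratio of two chi-square variables, each divided by its degrees of freedom) measures how far the actual observed values of a sample deviate from the theoretically inferred values. That deviation determines the size of the chi-square value: the larger the chi-square value, the larger the deviation between the two, and the smaller the value, the smaller the deviation. If the two are exactly equal, the chi-square value is 0, meaning the theoretical values fit perfectly.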

The t-test is mainly used when the sample size is small and the data come from a normal distribution whose population standard deviation $\sigma$ is unknown. It uses the t-distribution to infer the probability that a difference arises, and so to judge whether the difference between two means is significant.

I believe that by this point there's no need to explain why the F-test can judge whether the model is good or bad; if it's still unclear, scroll back up and look at the derivation!! Then why the t-test? Because the error $\varepsilon$ in our squared-error objective follows a normal distribution whose variance is unknown. Doesn't that match the definition of the t-test?! Yes, so we can use the t-test to check whether our parameters are 0 (that is, whether their true values could plausibly be zero). If a parameter is 0, the corresponding independent variable is unimportant for the dependent variable. $R^2$ represents the joint effect of all the independent variables in the model on the dependent variable, that is, how much of the dependent variable the independent variables explain.
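
As a rough sketch of how these three quantities can be computed for an ordinary least squares fit, the following uses the standard textbook formulas with NumPy. It computes only the statistics; turning them into p-values requires the F and t distributions (for example from a statistics library), which is left out here. The function name and the data are hypothetical.

```python
import numpy as np


def regression_diagnostics(X, y):
    """X includes a leading column of ones. Returns (k, r2, f_stat, t_stats)."""
    n, n_cols = X.shape
    p = n_cols - 1                        # number of independent variables
    k = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = y - X @ k
    sse = residuals @ residuals           # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)     # total variation
    r2 = 1.0 - sse / sst                  # coefficient of determination
    dof = n - p - 1                       # residual degrees of freedom
    f_stat = ((sst - sse) / p) / (sse / dof)
    sigma2 = sse / dof                    # estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stats = k / np.sqrt(np.diag(cov))   # one t statistic per parameter
    return k, r2, f_stat, t_stats


# Hypothetical data: y = 1 + 2x plus small hand-picked deviations.
x = np.arange(6, dtype=float)
noise = np.array([0.1, -0.2, 0.15, -0.05, 0.1, -0.1])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + noise

k, r2, f_stat, t_stats = regression_diagnostics(X, y)
print(k, r2, f_stat, t_stats)
```

With a single independent variable, the F statistic equals the square of the slope's t statistic, which makes a handy sanity check.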

Origin blog.csdn.net/d345389812/article/details/93207078