[04-00] single-variable linear regression

This blog series is maintained by the original author on GitHub: https://aka.ms/beginnerAI .
Don't hesitate to click the star button: the more stars, the harder the author works.

Chapter 4 Single-input single-output single-layer neural network

4.0 Univariate linear regression problem

4.0.1 Posing the problem

In the early days of building the Internet, a major problem operators needed to address was keeping the temperature of the server room at around 23 degrees Celsius all year round. In a new room, if we plan to deploy 346 servers, how should we choose the maximum air-conditioning power?

Although this problem can be solved with thermodynamic equations, there will always be errors. So people tend to install a thermostat in the machine room and use it to switch the air conditioner on and off, or to control the fan speed or cooling capacity, where the maximum cooling capacity is a critical value. A more advanced approach is to build the machine room directly on the seabed, keeping it isolated from the air and dissipating the heat into the circulating seawater.

From some statistics (called sample data), we obtain Table 4-1.

Table 4-1 Sample data

|Sample number|Number of servers (thousand units) X|Air conditioning power (kW) Y|
|---|---|---|
|1|0.928|4.824|
|2|0.469|2.950|
|3|0.855|4.643|
|...|...|...|

In the samples above, we generally call the independent variable X the sample feature value, and the dependent variable Y the sample label value.

This data is two-dimensional, so we can visualize it: the horizontal axis is the number of servers and the vertical axis is the air-conditioning power, as shown in Figure 4-1.

Figure 4-1 Sample Data Visualization
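As a rough sketch of how such a scatter plot could be drawn with matplotlib, using only the three samples listed in Table 4-1 (the full dataset from the repository is not reproduced here):

```python
import matplotlib.pyplot as plt

# The three samples shown in Table 4-1 (illustrative subset only)
x = [0.928, 0.469, 0.855]   # number of servers (thousand units)
y = [4.824, 2.950, 4.643]   # air conditioning power (kW)

plt.scatter(x, y)
plt.xlabel("Number of servers (thousand units)")
plt.ylabel("Air conditioning power (kW)")
plt.show()
```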

By observing the figure, we can decide that this is a linear regression problem, and the simplest kind at that. We have thus converted a thermodynamic calculation problem into a statistical one, because it is impossible to compute accurately how much heat each board or each machine will actually produce.

A shrewd reader may think of an approach: in the sample data, find a case very similar to 346 servers and use it as a reference to pick an appropriate air-conditioning power.

Admittedly, this is completely scientific and reasonable; in fact, this is exactly the idea behind solving linear regression problems: use existing values to predict unknown values. In other words, such readers have inadvertently used a linear regression model. This example is very simple, with only one independent variable and one dependent variable, so a simple, direct method works. However, when there are multiple independent variables, this direct approach may fail. Suppose there are three independent variables: it is quite possible that no combination of these three variables in the sample data is very close to the case we care about, so we need a more systematic approach.

4.0.2 Linear regression model

Regression analysis is a kind of mathematical model. When the dependent variable has a linear relationship with the independent variables, it is a special kind of linear model.

The simplest case is univariate linear regression, composed of one independent variable and one dependent variable in an approximately linear relationship. The model is:

\[Y = a + bX + \varepsilon \tag{1}\]

X is the independent variable, Y is the dependent variable, ε is the random error, and a and b are the parameters of the linear regression model; a and b are what we will learn through the algorithm.

What is a model? On first contact with this concept, it may seem impressive yet obscure. In conventional terms, a model is a description of objective things constructed by subjective consciousness through physical or virtual representation, and this description usually takes the form of an abstract expression with some logical or mathematical meaning.

For example, a model of a car might be described as: a steel shell with four wheels, driven by an engine. And Einstein's famous inference from special relativity models the concept of energy: \(E = mc^2\).

Modeling data, then, means finding one or more equations to describe how the data is generated, or the relationships within it. For example, if a set of data approximately satisfies \(y = 3x + 2\), then this formula is the model. Why do we say "approximately"? Because in the real world there is generally noise (error), so the data cannot satisfy the formula exactly; as long as the points lie near the line, on either side of it, they can be counted as satisfying the condition.
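A minimal sketch of this "approximately satisfies" idea, assuming we synthesize the data ourselves by adding Gaussian noise to the line \(y = 3x + 2\):

```python
import numpy as np

np.random.seed(1)
x = np.random.rand(100)                  # 100 random feature values
noise = np.random.normal(0, 0.2, 100)    # random error with mean 0
y = 3 * x + 2 + noise                    # points scattered around the line

# The points do not fall exactly on y = 3x + 2, only in its vicinity
print(np.max(np.abs(y - (3 * x + 2))))   # largest deviation from the line
```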

For the linear regression model, there are a few concepts we need to know:

  • The random error is usually assumed to have mean 0 and variance σ^2 (σ^2 > 0, and σ^2 does not depend on the value of X)
  • If the random error is further assumed to follow a normal distribution, the model is called a normal linear model
  • In general, with k independent variables and one dependent variable (the Y in Formula 1), the value of the dependent variable can be split into two parts: one part is the influence of the independent variables, i.e., a function of them whose form is known but whose parameters are unknown; the other part comes from randomness and factors not considered, i.e., the random error
  • When the function is a linear function of the unknown parameters, the model is called a linear regression model
  • When the function is a nonlinear function of the unknown parameters, the model is called a nonlinear regression model
  • When the number of independent variables is greater than 1, it is called multiple regression
  • When the number of dependent variables is greater than 1, it is called multivariate regression

Observing the data, we can roughly consider it to fit the linear regression model, so we write down Formula 1 without considering the random error. Our task is then to find appropriate values of a and b, which is exactly the task of linear regression.

Figure 4-2 The distinction between linear regression and nonlinear regression

As shown in Figure 4-2, the left side is a linear model: a straight line passes through the center of the region formed by a set of triangles; the line does not need to pass through every triangle. The right side is a nonlinear model: a curve passes through the center of the region formed by a set of rectangles. In this chapter, we first learn how to solve the linear regression problem on the left.

We will use the following methods to solve this problem (a minimal preview of the first is sketched after the list):

  1. Least squares;
  2. Gradient descent method;
  3. Simple neural network method;
  4. A more general neural network algorithm.
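As that preview of method 1, here is a minimal sketch of the closed-form least squares solution for \(Y = a + bX\), i.e. \(b = \sum(x_i-\bar x)(y_i-\bar y) / \sum(x_i-\bar x)^2\) and \(a = \bar y - b\bar x\). Only the three samples from Table 4-1 are used here, so the fitted values are purely illustrative:

```python
import numpy as np

# The three samples from Table 4-1 (illustrative only; the real fit uses the full dataset)
x = np.array([0.928, 0.469, 0.855])
y = np.array([4.824, 2.950, 4.643])

# Closed-form least squares estimates for Y = a + bX
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print(a, b)
# Predict the power needed for 346 servers, i.e. x = 0.346 thousand units
print(a + b * 0.346)
```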

4.0.3 Forms of the formula

Here we explain the order of w and x in the linear equation. In many textbooks, we see the following formula:

\[y = w^Tx+b \tag{1}\]

or:

\[y = w \cdot x + b \tag{2}\]

while in this book we use:

\[y = x \cdot w + b \tag{3}\]

The main difference among the three is how the shape of the sample data x is defined, which in turn determines the shape of w. For example, if x has three feature values, then w must have three weight values corresponding to those features. Then:

The matrix form of Formula 1

x is a column vector:

\[ x= \begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \end{pmatrix} \]

w is a column vector:

\[ w= \begin{pmatrix} w_{1} \\ w_{2} \\ w_{3} \end{pmatrix} \]
\[ y=w^Tx+b= \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \end{pmatrix} +b \]
\[ =w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3 + b \tag{4} \]

Both w and x are column vectors, so w needs to be transposed first before it can be matrix-multiplied with x.

The matrix form of Formula 2

The difference between Formula 2 and Formula 1 is the shape of w. In Formula 2, w is simply a row vector:

\[ w= \begin{pmatrix} w_{1} & w_{2} & w_{3} \end{pmatrix} \]

while the shape of x is still a column vector:

\[ x= \begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \end{pmatrix} \]

This way, no transpose is needed before the multiplication:

\[ y=wx+b= \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \end{pmatrix} +b \]
\[ =w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3 + b \tag{5} \]

The matrix form of Formula 3

x is a row vector:

\[ x= \begin{pmatrix} x_{1} & x_{2} & x_{3} \end{pmatrix} \]

w is a column vector:

\[ w= \begin{pmatrix} w_{1} \\ w_{2} \\ w_{3} \end{pmatrix} \]

So x comes first and w comes second:

\[ y=x \cdot w+b= \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} w_{1} \\ w_{2} \\ w_{3} \end{pmatrix} +b \]
\[ =x_1 \cdot w_1 + x_2 \cdot w_2 + x_3 \cdot w_3 + b \tag{6} \]

Comparing Formulas 4, 5, and 6, the final result of the computation is actually the same.
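A quick NumPy check of this equivalence (with arbitrary illustrative numbers, not values from the air-conditioning problem):

```python
import numpy as np

# Arbitrary illustrative values for three features and three weights
w = np.array([[0.1], [0.2], [0.3]])   # column vector, shape (3,1)
x = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3,1)
b = 0.5

y1 = w.T @ x + b               # Formula 4: y = w^T x + b, w and x both column vectors
y2 = w.reshape(1, 3) @ x + b   # Formula 5: w as a row vector, x as a column vector
y3 = x.reshape(1, 3) @ w + b   # Formula 6: y = x w + b, x as a row vector

print(y1, y2, y3)  # all three give [[1.9]]: 0.1*1 + 0.2*2 + 0.3*3 + 0.5
```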

Let us look more closely at the x matrix in the first two forms. Since x is a column vector, the features are laid out along the rows; when 2 samples participate in the computation at the same time, x gains an extra column and becomes:

\[ x= \begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{pmatrix} \]

The first subscript of x denotes the sample index and the second denotes the feature index, so \(x_{21}\) is the 1st feature of the 2nd sample. The index \(x_{21}\) looks awkward: we usually expect the row index first and the column index second, but \(x_{21}\) sits in row 1, column 2, which is exactly the opposite of that convention.

If we adopt the third form instead, the x matrix for two samples is:

\[ x= \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \end{pmatrix} \]

The 1st row contains the 3 features of the 1st sample, and the 2nd row contains the 3 features of the 2nd sample, which matches the usual reading habit: the 2nd feature of the 1st sample sits in row 1, column 2 of the matrix. For this reason, we always use the third form to describe linear equations in this book.

Another reason is that many deep learning library implementations do put x before w in the matrix computation, and the shape of w is then also read from left to right. For example, if the left side is the input of 2 samples with 3 features (2x3 means 2 samples with 3 feature values each) and the right side is 1 output, then the shape of w is 3x1. Otherwise everything has to be read backwards: w becomes 1x3 and x becomes 3x2, which is awkward.

As for b, it always has 1 row, and its number of columns equals that of w. For example, if w is a 3x1 matrix, then b is a 1x1 matrix. If w is a 3x2 matrix, meaning 3 features feed into 2 neurons, then b is a 1x2 matrix, with 1 bias assigned to each neuron.
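A minimal sketch of these shape conventions in NumPy, assuming 2 samples with 3 features feeding into 2 neurons (all values are placeholders):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],    # sample 1: its 3 features occupy one row
              [4.0, 5.0, 6.0]])   # sample 2
W = np.ones((3, 2))               # 3 input features -> 2 neurons
B = np.zeros((1, 2))              # 1 row, one bias per neuron

Z = X @ W + B                     # (2x3) @ (3x2) + (1x2) broadcasts to (2x2)
print(Z.shape)                    # (2, 2): one output per sample per neuron
```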
