Multiple regression analysis
In the linear regression analysis we fitted the age-and-salary data with a straight line, and the fit was not very good. This time we try a quadratic equation to fit the data.
First, let's read in the data.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
unrate = pd.read_csv('SD.csv')
unrate = unrate.sort_values('Year')
print(unrate)
Year Salary
0 1.0 39451
30 1.1 40343
1 1.2 46313
31 1.3 47605
2 1.4 37839
.. ... ...
85 12.0 106247
86 12.5 117634
87 12.6 113300
88 13.3 123056
89 13.5 122537
[90 rows x 2 columns]
This time we use a quadratic equation to fit the shape of these data.
We define the equation as:
\[ \hat{y}_i = w_1 x_i^2 + w_2 x_i + b \]
In this case we have three parameters: w_1, w_2, and b. We give each of these three parameters an initial value.
w_1 = 1000
w_2 = 1000
b = 1000
print(w_1)
print(w_2)
print(b)
y_pred = w_1* np.power(unrate['Year'],2) + w_2* unrate['Year'] + b
plt.scatter(unrate['Year'],unrate['Salary'])
plt.plot(unrate['Year'],y_pred)
plt.show()
1000
1000
1000
With the model above we obtain a predicted value \(\hat{y}\); we need a function to evaluate how good that prediction is.
\[ loss = \sum_{i=0}^{n} (y_i - \hat{y}_i)^2 \]
This loss function is the same as before, without any changes. Next, we need to find the derivatives of this function.
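Computed directly, this loss is just a sum of squared residuals. A minimal sketch on toy data (the values below are made up for illustration, not from SD.csv):

```python
import numpy as np

# Hypothetical sample data for illustration only
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 10.0])

def loss(w_1, w_2, b):
    # Sum of squared errors between predictions and targets
    y_pred = w_1 * x**2 + w_2 * x + b
    return np.sum((y - y_pred) ** 2)

print(loss(1, 0, 1))  # y = x^2 + 1 fits these points exactly, so the loss is 0.0
```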
\[\frac{dl}{dw_1} = \frac{dl}{d\hat{y}}*\frac{d\hat{y}}{dw_1} =-2\sum_{i=0}^{n}(y_i-\hat{y}_i)*x_i^2 \]
\[ \frac{dl}{dw_2} = \frac{dl}{d\hat{y}}*\frac{d\hat{y}}{dw_2}=-2\sum_{i=0}^{n}(y_i-\hat{y}_i)*x_i \]
\[ \frac{dl}{db}=\frac{dl}{d\hat{y}}*\frac{d\hat{y}}{db}=-2\sum_{i=0}^{n}(y_i-\hat{y}_i) \]
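To gain confidence in these derivatives, we can compare the analytic gradient against a numerical finite-difference estimate, a standard sanity check. The data and parameter values here are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 7.0, 13.0])

def loss(w_1, w_2, b):
    y_pred = w_1 * x**2 + w_2 * x + b
    return np.sum((y - y_pred) ** 2)

def analytic_dw_1(w_1, w_2, b):
    # dl/dw_1 = -2 * sum((y_i - y_hat_i) * x_i^2), as derived above
    y_pred = w_1 * x**2 + w_2 * x + b
    return -2 * np.sum((y - y_pred) * x**2)

# Central finite difference: (loss(w+h) - loss(w-h)) / (2h)
h = 1e-6
w_1, w_2, b = 0.5, 0.5, 0.5
numeric = (loss(w_1 + h, w_2, b) - loss(w_1 - h, w_2, b)) / (2 * h)
print(abs(numeric - analytic_dw_1(w_1, w_2, b)) < 1e-3)  # True: the two estimates agree
```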
Let's turn the equations above into code:
def train(w_1, w_2, b):
    learning_rate = 0.000001
    # Current predictions under the quadratic model
    y_pred = w_1 * np.power(unrate['Year'], 2) + w_2 * unrate['Year'] + b
    # Gradients from the derivative formulas above
    dw_1 = -2 * np.sum((unrate['Salary'] - y_pred) * np.power(unrate['Year'], 2))
    dw_2 = -2 * np.sum((unrate['Salary'] - y_pred) * unrate['Year'])
    db = -2 * np.sum(unrate['Salary'] - y_pred)
    # Gradient-descent update
    w_1 = w_1 - learning_rate * dw_1
    w_2 = w_2 - learning_rate * dw_2
    b = b - learning_rate * db
    return w_1, w_2, b
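The mechanics of a single update step can be traced by hand on toy data (again, illustrative values, not SD.csv):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 6.0])
w_1, w_2, b = 0.0, 0.0, 0.0
lr = 0.01

y_pred = w_1 * x**2 + w_2 * x + b           # all zeros at the start
dw_1 = -2 * np.sum((y - y_pred) * x**2)     # -2 * (3*1 + 6*4) = -54
dw_2 = -2 * np.sum((y - y_pred) * x)        # -2 * (3*1 + 6*2) = -30
db = -2 * np.sum(y - y_pred)                # -2 * (3 + 6)     = -18

w_1 -= lr * dw_1   # 0.54
w_2 -= lr * dw_2   # 0.30
b -= lr * db       # 0.18
print(w_1, w_2, b)
```

Each parameter moves opposite its gradient, scaled by the learning rate.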
Let's run it and see the result:
for i in range(10000):
    w_1, w_2, b = train(w_1, w_2, b)
print(w_1)
print(w_2)
print(b)
y_pred = w_1 * np.power(unrate['Year'], 2) + w_2 * unrate['Year'] + b
loss = np.power(y_pred - unrate['Salary'], 2).sum()
print(loss)
plt.scatter(unrate['Year'], unrate['Salary'])
plt.plot(unrate['Year'], y_pred)
plt.show()
-695.3117280326662
17380.592541992835
8744.131370136933
8487947406.30475
The plot above shows the fitted curve.
We can see that it fits the data much better than the straight line from before.
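As a sanity check on the gradient-descent result, the same quadratic least-squares fit can be obtained in closed form with np.polyfit. A sketch on synthetic data standing in for SD.csv (the true coefficients and noise level below are invented for illustration):

```python
import numpy as np

# Synthetic data: a noisy quadratic over roughly the same Year range as SD.csv
rng = np.random.default_rng(0)
x = np.linspace(1.0, 13.5, 90)
y = -700 * x**2 + 17000 * x + 9000 + rng.normal(0, 2000, size=90)

# polyfit solves the least-squares problem directly; no learning rate or loop needed
w_1, w_2, b = np.polyfit(x, y, deg=2)
print(w_1, w_2, b)  # should recover roughly (-700, 17000, 9000)
```

If gradient descent has converged, its parameters should land close to the polyfit solution.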