Multiple regression analysis and derivation

Multiple regression analysis

In the linear regression analysis we fit the year and salary data with a straight line, and the fit was not very good. This time we try to fit the data with a multivariable (here, quadratic) equation.


First, we read in the data.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Read the year/salary data and sort it by year
unrate = pd.read_csv('SD.csv')
unrate = unrate.sort_values('Year')
print(unrate)
    Year  Salary
0    1.0   39451
30   1.1   40343
1    1.2   46313
31   1.3   47605
2    1.4   37839
..   ...     ...
85  12.0  106247
86  12.5  117634
87  12.6  113300
88  13.3  123056
89  13.5  122537

[90 rows x 2 columns]

This time we use a quadratic equation to fit the shape of these data.

We define the equation as:
\[\hat{y}_i = w_1 x_i^2 + w_2 x_i + b\]
In this case we have three parameters: w_1, w_2, and b. We give these three parameters initial values.
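For readability, the model can also be wrapped in a small helper function (a sketch; the code below keeps computing y_pred inline, as in the original):

def predict(x, w_1, w_2, b):
    # Quadratic model: y_hat = w_1 * x^2 + w_2 * x + b
    return w_1 * np.power(x, 2) + w_2 * x + b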

# Initial parameter values
w_1 = 1000
w_2 = 1000
b = 1000
print(w_1)
print(w_2)
print(b)

# Prediction with the initial parameters, plotted against the data
y_pred = w_1 * np.power(unrate['Year'], 2) + w_2 * unrate['Year'] + b
plt.scatter(unrate['Year'], unrate['Salary'])
plt.plot(unrate['Year'], y_pred)
plt.show()
1000
1000
1000

Following the above model we obtain a predicted value \(\hat{y}\); we need a function to evaluate how good that prediction is.
\[loss = \sum_{i=0}^{n}(y_i - \hat{y}_i)^2\]
This loss function is the same as in the linear case, unchanged.
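As a quick sketch, the loss can be computed directly with numpy (using the hypothetical predict helper above):

def compute_loss(x, y, w_1, w_2, b):
    # Sum of squared residuals between the data and the model
    return np.sum(np.power(y - predict(x, w_1, w_2, b), 2))

Next, we need to find the derivative of this loss with respect to each parameter.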

\[\frac{dl}{dw_1} = \frac{dl}{d\hat{y}}*\frac{d\hat{y}}{dw_1} =-2\sum_{i=0}^{n}(y_i-\hat{y}_i)*x_i^2 \]

\[ \frac{dl}{dw_2} = \frac{dl}{d\hat{y}}*\frac{d\hat{y}}{dw_2}=-2\sum_{i=0}^{n}(y_i-\hat{y}_i)*x_i \]

\[ \frac{dl}{db}=\frac{dl}{d\hat{y}}*\frac{d\hat{y}}{db}=-2\sum_{i=0}^{n}(y_i-\hat{y}_i) \]

Let's translate the above formulas into code:

def train(w_1, w_2, b):
    # Keep the learning rate small: the x^2 terms make the gradients large
    learning_rate = 0.000001

    y_pred = w_1 * np.power(unrate['Year'], 2) + w_2 * unrate['Year'] + b

    # Gradients from the formulas above
    dw_1 = -2 * np.sum((unrate['Salary'] - y_pred) * np.power(unrate['Year'], 2))
    dw_2 = -2 * np.sum((unrate['Salary'] - y_pred) * unrate['Year'])
    db = -2 * np.sum(unrate['Salary'] - y_pred)

    # One gradient-descent step on each parameter
    w_1 = w_1 - learning_rate * dw_1
    w_2 = w_2 - learning_rate * dw_2
    b = b - learning_rate * db
    return w_1, w_2, b
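Before running the loop, a quick finite-difference check (a standalone sketch, not part of the original post) can confirm that the analytic gradient formulas match a numerical derivative:

# Compare the analytic dw_1 with a central finite difference at the initial point
x = unrate['Year'].values
y = unrate['Salary'].values
eps = 1e-4
numeric = (compute_loss(x, y, 1000 + eps, 1000, 1000)
           - compute_loss(x, y, 1000 - eps, 1000, 1000)) / (2 * eps)
analytic = -2 * np.sum((y - (1000 * x**2 + 1000 * x + 1000)) * x**2)
print(numeric, analytic)  # the two values should agree closely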
 

    

Let's run it and see the effect:

# Run 10,000 gradient-descent steps
for i in range(10000):
    w_1, w_2, b = train(w_1, w_2, b)

print(w_1)
print(w_2)
print(b)
y_pred = w_1 * np.power(unrate['Year'], 2) + w_2 * unrate['Year'] + b
loss = np.power((y_pred - unrate['Salary']), 2).sum()
print(loss)


# Plot the fitted curve against the data
plt.scatter(unrate['Year'], unrate['Salary'])
plt.plot(unrate['Year'], y_pred)
plt.show()



-695.3117280326662
17380.592541992835
8744.131370136933
8487947406.30475

The above is the fitted result.

We can see that it fits the data much better than the straight line did before.
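As a cross-check (not part of the original post), numpy's built-in least-squares polynomial fit gives the optimal quadratic directly; gradient descent with this small learning rate should approach, but may not fully reach, these values:

# Closed-form least-squares quadratic fit for comparison
coeffs = np.polyfit(unrate['Year'], unrate['Salary'], 2)  # [w_1, w_2, b]
print(coeffs)
best_pred = np.polyval(coeffs, unrate['Year'])
print(np.sum((best_pred - unrate['Salary'])**2))  # minimum achievable loss

If the gradient-descent loss is far above this minimum, more iterations (or a per-parameter learning rate) would be needed.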

Origin www.cnblogs.com/bbird/p/11493266.html