1. Linear Model Construction
1. Initialize the model
import numpy as np
from utils.features import prepare_for_training  # import the preprocessing helper

class LinearRegression:
    def __init__(self, data, lables, polynomial_degree=0, sinusoid_degree=0, normalize_data=True):
        """
        1. Preprocess the data
        2. Store the feature statistics and preprocessing options
        3. Initialize the parameter matrix
        """
        # Preprocess using the options passed in, not hard-coded defaults
        data_processed, features_mean, features_deviation = prepare_for_training(
            data, polynomial_degree, sinusoid_degree, normalize_data)
        self.data = data_processed
        self.lables = lables
        self.features_mean = features_mean
        self.features_deviation = features_deviation
        # get_cost() and predict() reuse these to preprocess new data the same way
        self.polynomial_degree = polynomial_degree
        self.sinusoid_degree = sinusoid_degree
        self.normalize_data = normalize_data
        # One parameter per column of the processed data
        num_features = self.data.shape[1]
        self.theta = np.zeros((num_features, 1))
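The `prepare_for_training` helper comes from the author's `utils.features` module, which is not shown in this post. Below is a minimal sketch of what it is assumed to do when `polynomial_degree=0` and `sinusoid_degree=0`: standardize each column and prepend a column of ones so that `theta[0]` acts as the intercept. The name `prepare_minimal` is hypothetical, and the real helper may differ in details.

```python
import numpy as np

def prepare_minimal(data, normalize_data=True):
    """Hypothetical stand-in for prepare_for_training (no polynomial/sinusoid terms).

    Returns (data_processed, features_mean, features_deviation).
    """
    data = np.asarray(data, dtype=float)
    features_mean = data.mean(axis=0)
    features_deviation = data.std(axis=0)
    if normalize_data:
        # Avoid division by zero for constant columns
        safe_dev = np.where(features_deviation == 0, 1.0, features_deviation)
        data = (data - features_mean) / safe_dev
    # Prepend a column of ones so theta[0] is the bias (intercept) term
    ones = np.ones((data.shape[0], 1))
    data_processed = np.hstack((ones, data))
    return data_processed, features_mean, features_deviation

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xp, mean, dev = prepare_minimal(X)
theta = np.zeros((Xp.shape[1], 1))  # one parameter per column, bias included
print(Xp.shape, theta.shape)  # (3, 3) (3, 1)
```

This is why `num_features` in `__init__` counts one more column than the raw input has: the extra column is the bias.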
2. Model training function
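    def train(self, alpha, num_iterations=500):
        """
        Training entry point: run gradient descent.
        :param alpha: learning rate
        :param num_iterations: number of iterations
        :return: the learned theta and the cost history
        """
        cost_history = self.gradient_descent(alpha, num_iterations)  # cost after each iteration
        return self.theta, cost_history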
    def gradient_descent(self, alpha, num_iterations=500):
        """
        Iteration loop: perform num_iterations gradient steps.
        :param alpha: learning rate
        :param num_iterations: number of iterations
        :return: list with the cost after each step
        """
        cost_history = []
        for _ in range(num_iterations):
            self.gradient_step(alpha)
            cost_history.append(self.cost_function(self.data, self.lables))
        return cost_history
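    def gradient_step(self, alpha):
        """
        One gradient-descent parameter update, done with matrix operations.
        :param alpha: learning rate
        """
        num_examples = self.data.shape[0]
        prediction = LinearRegression.hypothesis(self.data, self.theta)
        delta = prediction - self.lables  # residuals, shape (m, 1)
        theta = self.theta
        # np.dot(delta.T, self.data) has shape (1, n); transpose it to match theta's (n, 1)
        theta = theta - alpha * (1 / num_examples) * (np.dot(delta.T, self.data)).T
        self.theta = theta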
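    @staticmethod
    def hypothesis(data, theta):
        """
        Prediction function: data · theta
        :param data: processed feature matrix, shape (m, n)
        :param theta: parameter vector, shape (n, 1)
        :return: predictions, shape (m, 1)
        """
        predictions = np.dot(data, theta)  # each row: sum of element-wise products with theta
        return predictions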
    def get_cost(self, data, lables):
        """
        Compute the cost of the current parameters on a given dataset (e.g. the test set).
        :param data: raw feature matrix
        :param lables: true labels
        :return: cost value
        """
        data_processed = prepare_for_training(data,
                                              self.polynomial_degree,
                                              self.sinusoid_degree,
                                              self.normalize_data,
                                              )[0]
        return self.cost_function(data_processed, lables)
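    def cost_function(self, data, lables):
        """
        Compute the cost (loss).
        :param data: processed feature matrix
        :param lables: true labels
        :return: cost value
        """
        num_examples = data.shape[0]
        # Use the data passed in, not self.data, so get_cost() works on other datasets
        delta = LinearRegression.hypothesis(data, self.theta) - lables
        # Squared-error cost from the previous article: (1/2m) * sum(delta^2)
        cost = (1 / 2) * np.dot(delta.T, delta) / num_examples
        return cost[0][0]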
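The two core formulas above are easy to get wrong on array shapes. With m examples and n columns (bias included), the cost is J(θ) = (1/(2m)) Σ (h_θ(x_i) − y_i)² and its gradient is (1/m)·Xᵀ(Xθ − y). The sketch below uses small random arrays (made-up values) to check that the vectorized gradient, including the `.T`, matches a per-parameter loop, that the `np.dot(delta.T, delta)` cost equals the explicit sum of squares, and that dropping the `.T` silently broadcasts to an (n, n) array instead of raising an error.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
# Made-up processed data: a bias column of ones plus two feature columns
data = np.hstack((np.ones((m, 1)), rng.normal(size=(m, n - 1))))
labels = rng.normal(size=(m, 1))
theta = rng.normal(size=(n, 1))
alpha = 0.1

delta = np.dot(data, theta) - labels             # residuals, shape (m, 1)

# Cost: matrix form (as in cost_function) vs explicit sum of squares
cost_matrix = ((1 / 2) * np.dot(delta.T, delta) / m)[0][0]
cost_sum = np.sum(delta ** 2) / (2 * m)

# Gradient: vectorized form (as in gradient_step) vs one parameter at a time
grad_vec = (1 / m) * np.dot(delta.T, data).T     # (1, n) transposed to (n, 1)
grad_loop = np.array([[np.sum(delta[:, 0] * data[:, j]) / m] for j in range(n)])

new_theta = theta - alpha * grad_vec             # shape stays (n, 1)
wrong = theta - alpha * (1 / m) * np.dot(delta.T, data)  # no .T: (n, 1) - (1, n) -> (n, n)

print(np.isclose(cost_matrix, cost_sum), np.allclose(grad_vec, grad_loop))  # True True
print(new_theta.shape, wrong.shape)  # (3, 1) (3, 3)
```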
3. Prediction function
    def predict(self, data):
        """
        Use the trained parameters to compute regression predictions.
        """
        data_processed = prepare_for_training(data,
                                              self.polynomial_degree,
                                              self.sinusoid_degree,
                                              self.normalize_data
                                              )[0]
        predictions = LinearRegression.hypothesis(data_processed, self.theta)
        return predictions
2. Single-Feature Model
Goal: predict the happiness score from GDP
1. Load the dataset
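The data for this model is the World Happiness Report 2017 dataset, which gives each country a happiness score along with several features.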
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams["font.sans-serif"]=["SimHei"] # font that can render Chinese characters
plt.rcParams["axes.unicode_minus"]=False # fixes garbled minus signs "-" in plots
from line import LinearRegression
data=pd.read_csv('./data/world-happiness-report-2017.csv')
2. Get training and test data
# Split into training and test data
train_data=data.sample(frac=0.8)
text_data=data.drop(train_data.index)
input_param_name='Economy..GDP.per.Capita.'
out_param_name='Happiness.Score'
x_train=train_data[[input_param_name]].values
y_train=train_data[[out_param_name]].values
x_test=text_data[input_param_name].values
y_test=text_data[out_param_name].values
Training and test data are typically split 7:3 or 8:2.
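`data.sample(frac=0.8)` draws a different random 80% on every run, so the results change from run to run. A sketch of the same split with a fixed `random_state` (the seed 42 is arbitrary), on a small made-up DataFrame standing in for the CSV; `drop(train.index)` keeps the remaining 20% as the test set, with no overlap:

```python
import pandas as pd

# Hypothetical stand-in for the happiness CSV: 10 rows, two columns
df = pd.DataFrame({
    "Economy..GDP.per.Capita.": [0.1 * i for i in range(10)],
    "Happiness.Score": [3.0 + 0.4 * i for i in range(10)],
})

train_df = df.sample(frac=0.8, random_state=42)  # reproducible 80% sample
test_df = df.drop(train_df.index)                # the remaining 20%

print(len(train_df), len(test_df))  # 8 2
```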
3. Draw a scatter plot to observe the distribution of the data set
plt.scatter(x_train,y_train,label='Train data')
plt.scatter(x_test,y_test,label='Test data')
plt.xlabel(input_param_name)
plt.ylabel(out_param_name)
plt.title("Country happiness")
plt.legend()
plt.show()
The scatter plot shows a clear, roughly linear relationship, so a linear model is a reasonable choice. Next we train it.
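The linear trend seen in the scatter plot can also be expressed as a number, the Pearson correlation coefficient r, where values close to 1 indicate a strong positive linear relationship. A sketch with `np.corrcoef` on made-up values; on the real data one would pass `x_train.flatten()` and `y_train.flatten()`:

```python
import numpy as np

gdp = np.array([0.2, 0.5, 0.8, 1.1, 1.4])     # made-up GDP values
score = np.array([3.9, 4.6, 5.2, 6.1, 6.8])   # made-up happiness scores

r = np.corrcoef(gdp, score)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print(round(r, 3))
```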
4. Training
We set the number of iterations to 500 and the learning rate to 0.01
num_iterations=500
learning_rate=0.01  # learning rate
linean_regress=LinearRegression(x_train,y_train)
(theta,const_history)=linean_regress.train(learning_rate,num_iterations)
Running this returns the training loss history. Printing its first and last entries shows that the final loss is much smaller than the initial one, which is what we want: the smaller the loss, the better the fit.
Plotting the loss history shows that it decreases steadily and then levels off.
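Because the class depends on the author's `utils.features` helper, here is a self-contained sketch of the same loop on made-up, roughly linear data (the coefficients 3.2 and 2.2, the noise level, and the seed are arbitrary assumptions). Like `gradient_descent`, it records the cost after every step, then prints the first and last values; `plt.plot(range(1, num_iterations + 1), cost_history)` would draw the curve.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100
x = rng.uniform(0.0, 1.6, size=(m, 1))                        # made-up "GDP" values
y = 3.2 + 2.2 * x + rng.normal(scale=0.3, size=(m, 1))        # made-up linear target + noise

X = np.hstack((np.ones((m, 1)), (x - x.mean()) / x.std()))    # bias column + standardized feature
theta = np.zeros((2, 1))
alpha, num_iterations = 0.01, 500

cost_history = []
for _ in range(num_iterations):
    delta = X @ theta - y
    theta = theta - alpha * (1 / m) * (delta.T @ X).T          # gradient step
    delta = X @ theta - y
    cost_history.append((delta.T @ delta)[0, 0] / (2 * m))     # cost after the step

print("initial cost:", cost_history[0])
print("final cost:", cost_history[-1])
```

With a small learning rate on standardized data, each step lowers the cost, so the curve falls quickly at first and then flattens out.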
5. Prediction results
Generate 100 evenly spaced GDP values across the training range and predict on them
predictions_num=100
x_predictions=np.linspace(x_train.min(),x_train.max(),num=predictions_num).reshape(predictions_num,1)  # evenly spaced values, as a column
# print(x_predictions)
y_predictions=linean_regress.predict(x_predictions)
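`np.linspace` returns evenly spaced values (not random ones), and `reshape(num, 1)` turns the 1-D result into a single column, the `(n_samples, n_features)` layout the model was trained on. A tiny illustration with made-up bounds:

```python
import numpy as np

num = 4
x_grid = np.linspace(0.0, 1.5, num=num).reshape(num, 1)  # evenly spaced, one column
print(x_grid.ravel())  # values 0.0, 0.5, 1.0, 1.5
print(x_grid.shape)    # (4, 1)
```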
Plot the predictions as a red line over the scatter plot of the data
plt.scatter(x_train,y_train,label='Train data')
plt.scatter(x_test,y_test,label='Test data')
plt.plot(x_predictions,y_predictions,'r',label="Prediction")
plt.xlabel(input_param_name)
plt.ylabel(out_param_name)
plt.title("Happiness prediction")
plt.legend()
plt.show()
The prediction line runs through the middle of the data points, and there is a clear positive linear relationship between happiness and GDP: the higher the GDP, the higher the happiness score. However, this model uses only one feature; below we train with two features.
3. Model with Multiple Feature Variables
Goal: predict happiness from GDP and Freedom
Here we use plotly for plotting.
pip install plotly
It produces attractive, interactive plots; if you are interested, it is worth exploring.
1. Load data
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('TkAgg')
import plotly.offline
import matplotlib.pyplot as plt
import plotly.graph_objs as go
plt.rcParams["font.sans-serif"]=["SimHei"] # font that can render Chinese characters
plt.rcParams["axes.unicode_minus"]=False # fixes garbled minus signs "-" in plots
from line import LinearRegression
data=pd.read_csv('./data/world-happiness-report-2017.csv')
2. Get training data and test data
- Same procedure as above, now with two input features
train_data=data.sample(frac=0.8)
text_data=data.drop(train_data.index)
input_param_name1='Economy..GDP.per.Capita.'
input_param_name2="Freedom"
input_param_name3='Health..Life.Expectancy.'
out_param_name='Happiness.Score'
x_train=train_data[[input_param_name1,input_param_name2]].values
y_train=train_data[[out_param_name]].values
x_test=text_data[[input_param_name1,input_param_name2]].values
y_test=text_data[out_param_name].values
3. Draw an interactive 3D scatter plot
polt_traning_trace=go.Scatter3d(
x=x_train[:,0].flatten(),  # first column (GDP) of every row, as a 1-D array
y=x_train[:,1].flatten(),
z=y_train.flatten(),
name='traning set',
mode='markers',
marker={
'size':9,
'opacity':0.9,
'line':{
'color':'rgb(255,255,255)',
'width':1
}
}
)
polt_test_trace = go.Scatter3d(
x=x_test[:, 0].flatten(),
y=x_test[:, 1].flatten(),
z=y_test.flatten(),
name='test set',
mode='markers',
marker={
'size': 9,
'opacity': 1,
'line': {
'color': 'rgb(255,255,255)',
'width': 1
}
}
)
# Layout
plot_layout=go.Layout(
title='data set',
scene={
'xaxis':{
'title':input_param_name1},
'yaxis':{
'title':input_param_name2},
'zaxis':{
'title':out_param_name}
},margin={
'l':0,'r':0,'b':0,'t':0
}
)
plot_data=[polt_traning_trace,polt_test_trace]
plot_figure=go.Figure(data=plot_data,layout=plot_layout)
plotly.offline.iplot(plot_figure)  # show the figure inline (plotly.offline.plot opens it in a browser instead)
- The blue points are the training data and the red points are the test data
- Rotate the plot to look at the data from different angles
- In general, the higher the GDP and Freedom scores, the higher the happiness score
4. Data training
num_iterations=500
learnin_rate=0.01
liner_regress=LinearRegression(x_train,y_train)
(theta,const_history)=liner_regress.train(alpha=learnin_rate,num_iterations=num_iterations)
print('Initial loss:',const_history[0])
print('Loss after training:',const_history[-1])
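With two features, the final training loss is lower than with a single feature, which suggests the second feature helps explain the happiness score. Note that this is the loss on the training data; the loss on the test set (for example via `get_cost`) is the better indicator of how reliable the predictions are.
Plot the loss curve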
plt.plot(range(1,num_iterations+1),const_history)
plt.xlabel('Iteration')
plt.ylabel('cost')
plt.title('Loss during training')
plt.show()
5. Prediction results
For multi-dimensional input, np.hstack() stacks two or more arrays side by side (column-wise) into a new array.
We generate 100 evenly spaced values for each feature, build every (GDP, Freedom) pair on the resulting 100 × 100 grid as two (10000, 1) columns, and then use np.hstack() to combine them into a (10000, 2) matrix.
predictions_num=100
x_min = x_train[:, 0].min()
x_max = x_train[:, 0].max()
y_min = x_train[:, 1].min()
y_max = x_train[:, 1].max()
x_axis=np.linspace(x_min,x_max,predictions_num)
y_axis=np.linspace(y_min,y_max,predictions_num)
x_predictions = np.zeros((predictions_num * predictions_num, 1))
y_predictions = np.zeros((predictions_num * predictions_num, 1))
x_y_inex=0
for x_index,a_value in enumerate(x_axis):
    for y_index,y_value in enumerate(y_axis):
        x_predictions[x_y_inex]=a_value
        y_predictions[x_y_inex]=y_value
        x_y_inex+=1
# np.hstack() joins two or more arrays horizontally (column-wise) into a new array
z_predictions=liner_regress.predict(np.hstack((x_predictions,y_predictions)))
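The nested loop above fills the grid one point at a time. A vectorized sketch of the same construction uses `np.meshgrid` with `indexing="ij"`, so the row order matches the loop (x in the outer loop, y in the inner loop), followed by `np.hstack`. It is shown here on a small 3 × 3 grid with made-up ranges:

```python
import numpy as np

x_axis = np.linspace(0.0, 1.0, 3)
y_axis = np.linspace(10.0, 20.0, 3)

# Nested loop, as in the post: x varies in the outer loop, y in the inner loop
pairs_loop = np.array([[a, b] for a in x_axis for b in y_axis])

# Vectorized alternative: meshgrid with indexing="ij" gives the same ordering
xx, yy = np.meshgrid(x_axis, y_axis, indexing="ij")
pairs_grid = np.hstack((xx.reshape(-1, 1), yy.reshape(-1, 1)))  # (9, 2) matrix

print(pairs_grid.shape)  # (9, 2)
```

With the default `indexing="xy"`, the two columns would come out in a different row order, which still covers the same grid but no longer matches the loop.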
With the predictions computed, we add them to the interactive 3D plot as the fitted prediction plane
plot_predictions_trace = go.Scatter3d(
x=x_predictions.flatten(),
y=y_predictions.flatten(),
z=z_predictions.flatten(),
name='Prediction Plane',
mode='markers',
marker={
'size': 1,
},
opacity=0.8,
surfaceaxis=2,
)
plot_data = [polt_traning_trace,polt_test_trace, plot_predictions_trace]
plot_figure = go.Figure(data=plot_data, layout=plot_layout)
plotly.offline.iplot(plot_figure)
Looking at the fitted plane, the predicted values lie close to the actual data points.
Summary
That's about it. Comparing the single-feature and two-feature models, the two-feature model reaches a lower training loss and fits the data more closely. Keep in mind that this is the loss on the training data; the test set is where to check whether the extra feature really makes the predictions more accurate.
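On the training data, adding a feature to a least-squares model can never increase the optimal loss, because the one-feature model is a special case of the two-feature model (weight 0 on the new feature). A lower training loss therefore does not by itself prove better predictions. A sketch on made-up data, using `np.linalg.lstsq` instead of gradient descent to get the exact least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 50
gdp = rng.uniform(0, 1.6, size=m)          # made-up feature 1
freedom = rng.uniform(0, 0.7, size=m)      # made-up feature 2
score = 3.0 + 2.0 * gdp + 1.5 * freedom + rng.normal(scale=0.3, size=m)

def train_cost(*features):
    """Least-squares fit with a bias column; returns (1/2m) * sum of squared residuals."""
    X = np.column_stack((np.ones(m),) + features)
    theta, *_ = np.linalg.lstsq(X, score, rcond=None)
    residuals = X @ theta - score
    return residuals @ residuals / (2 * m)

cost_one = train_cost(gdp)
cost_two = train_cost(gdp, freedom)
print(cost_one, cost_two)  # the two-feature cost is never larger on the training data
```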
If you spot any mistakes above, please let me know.
Thanks for reading; let's keep learning together, and I'll share more interesting things in the future.