Multiple linear regression
Multiple linear regression extends simple linear regression to model the relationship between several independent variables (features) and one dependent variable. The mathematical expression of the multiple linear regression model is as follows:
Explanation of each parameter:
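For reference, the standard textbook form of the model is sketched here. The notation (the symbols β and ε and the feature count n) is an assumption and may differ from the notation originally used in this post:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon
```

In this form, y is the dependent variable, x_1 to x_n are the independent variables (features), β_0 is the intercept, β_1 to β_n are the regression coefficients of the corresponding features, and ε is the error term.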
Case
The advertising.csv file contains the advertising promotion costs (unit: yuan) and sales data (unit: thousand yuan) of a certain product. Each row records one week's advertising spend on Taobao, Douyin, and Xiaohongshu, together with that week's sales.
If the advertising budget is allocated as follows over the next two weeks, predict the corresponding product sales:
(1) Taobao: 200, Douyin: 100, Xiaohongshu: 150
(2) Taobao: 300, Douyin: 150, Xiaohongshu: 200
advertising.csv overview:
Model building
$$y = a x_1 + b x_2 + c x_3 + A$$
where y is the expected sales, x_1 to x_3 are the promotion expenses on Taobao, Douyin, and Xiaohongshu, a, b, and c are the corresponding coefficients, and A is the intercept (constant term).
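To make the model concrete, here is a minimal sketch that evaluates a linear model with one term per advertising channel for the two planned budgets. The function name `predict_sales` and the coefficient values are placeholders chosen for illustration; they are not fitted from advertising.csv.

```python
def predict_sales(taobao, douyin, xiaohongshu, a, b, c, A):
    """Evaluate y = a*x1 + b*x2 + c*x3 + A for one week's ad spend."""
    return a * taobao + b * douyin + c * xiaohongshu + A


# Hypothetical coefficients for illustration only (NOT fitted from advertising.csv).
coeffs = {"a": 0.05, "b": 0.1, "c": 0.02, "A": 3.0}

# The two budget plans from the case description.
plans = [(200, 100, 150), (300, 150, 200)]
for taobao, douyin, xiaohongshu in plans:
    y = predict_sales(taobao, douyin, xiaohongshu, **coeffs)
    print(f"Taobao={taobao}, Douyin={douyin}, Xiaohongshu={xiaohongshu} -> {y:.2f}")
```

Once a model has been fitted, the real coefficients replace the placeholders and the prediction is the same weighted sum.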
Code
We can use scikit-learn's LinearRegression class to create the model,
and the mean_squared_error function to compute the mean squared error, a regression loss used to evaluate the model.
Finally, we plug in the advertising amounts we want to predict for to estimate sales.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Read the CSV file containing all four columns (three ad channels plus sales)
all_data = pd.read_csv('advertising.csv')
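# Select the features and the target, then split the dataset
# ('tiktok' and 'little red book' hold the Douyin and Xiaohongshu spend)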
X = all_data[['taobao','tiktok','little red book']]
print(X)
y = all_data['sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Choose the linear regression model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Evaluate the model (mean squared error on the training data)
y_pred_train = model.predict(X_train)
mse_train = mean_squared_error(y_train, y_pred_train)
print(f'Mean Squared Error on Training Data: {mse_train}')
# Make predictions
# Scenario (1): Taobao 200, Douyin 100, Xiaohongshu 150
X_new = pd.DataFrame({'taobao': [200], 'tiktok': [100], 'little red book': [150]})
y_pred_new = model.predict(X_new)
print(f'Predicted Output for New Input: {y_pred_new[0]}')
# Scenario (2): Taobao 300, Douyin 150, Xiaohongshu 200
X_new = pd.DataFrame({'taobao': [300], 'tiktok': [150], 'little red book': [200]})
y_pred_new = model.predict(X_new)
print(f'Predicted Output for New Input: {y_pred_new[0]}')
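The script above holds out a test set but only reports the error on the training data. As a hedged sketch, the following shows how the held-out split could be evaluated as well, and how the fitted coefficients and intercept can be read from the model. So that it runs without advertising.csv, it uses synthetic data with the same column names; the data, its "true" coefficients, and the added r2_score check are assumptions for illustration, not part of the original project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for advertising.csv (assumption: same column names as the script).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "taobao": rng.uniform(0, 300, n),
    "tiktok": rng.uniform(0, 200, n),
    "little red book": rng.uniform(0, 250, n),
})
# Known "true" coefficients plus noise, so the fit can be sanity-checked.
data["sales"] = (0.05 * data["taobao"] + 0.1 * data["tiktok"]
                 + 0.02 * data["little red book"] + 3.0
                 + rng.normal(0, 0.5, n))

X = data[["taobao", "tiktok", "little red book"]]
y = data["sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Error on data the model did not see during training.
y_pred_test = model.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
r2_test = r2_score(y_test, y_pred_test)
print(f"Test MSE: {mse_test:.3f}, test R^2: {r2_test:.3f}")

# Fitted coefficients (one per feature column) and the intercept.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:.4f}")
print(f"intercept: {model.intercept_:.4f}")
```

A training error much lower than the test error would suggest overfitting; with a plain linear model and three features the two are usually close.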
Result
Project source address:
https://gitee.com/yishangyishang/homeword.git