Objective
Analyze the relationship between the concentrations of the main air pollutants and the air quality index (AQI).
Data
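The dataset contains air pollutant concentrations scraped from the 天气后报 (Tianqihoubao) website, covering Beijing from October 28, 2013 to January 31, 2016. It also includes the air quality level, the AQI, and the daily ranking.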
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.api as sm
Linear Regression
1. Data Preprocessing
data = pd.read_csv("beijing.csv", index_col=0)
data.head()
X = data.iloc[:, 2:8]    # predictor columns
X = sm.add_constant(X)   # add the intercept column
y = data.iloc[:, 0]      # response column
print(X.head())
2. The Model
model1 = sm.OLS(y, X)    # specify the ordinary least squares model
result = model1.fit()    # fit the model
print(result.summary())
result.f_pvalue    # p-value of the F-test for overall significance of the regression
result.params # regression coefficients
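For reference, the model fitted above can be written out explicitly. The symbols below are notation introduced here, not taken from the notebook: $y_i$ is the response for day $i$, $x_{i1},\dots,x_{i6}$ are the six predictor columns selected by `data.iloc[:, 2:8]`, and $\varepsilon_i$ is the error term.

```latex
y_i = \beta_0 + \sum_{j=1}^{6} \beta_j x_{ij} + \varepsilon_i ,
\qquad
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y
```

Here $X$ is the design matrix including the constant column, and `result.params` holds the estimates $\hat{\beta}_0, \dots, \hat{\beta}_6$. The F-test behind `result.f_pvalue` tests the null hypothesis $\beta_1 = \dots = \beta_6 = 0$.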
Improved Model
Since the p-values for SO2 and CO are greater than 0.05, these two variables (columns 4 and 6) are excluded and the model is refit.
data = pd.read_csv("beijing.csv", index_col=0)
data.head()
X = data.iloc[:, [2, 3, 5, 7]]    # predictors without columns 4 and 6
X = sm.add_constant(X)            # add the intercept column
y = data.iloc[:, 0]               # response column
print(X.head())
model2 = sm.OLS(y, X)    # specify the reduced OLS model
result = model2.fit()    # fit the model
print(result.summary())