An R Analysis Report on 2017 Weather and Air Quality Data for Qinhuangdao, Hebei

1. Background

The air quality index (AQI) is the maximum of the individual air quality sub-indices (IAQI) of the monitored pollutants. When the AQI is greater than 50, the pollutant with the largest IAQI is the primary pollutant; if two or more pollutants share the largest IAQI, all of them are listed as primary pollutants. The AQI is a dimensionless relative value that summarizes the overall degree of air pollution, or air quality level.

This report analyzes the AQI and related indicators for Qinhuangdao, Hebei Province, in 2017. The results show that the AQI was relatively high in winter and low in summer. Based on the p-values of the relevant tests, the AQI in 2017 was related to the maximum temperature, minimum temperature, weather, wind direction and wind force. A multivariate linear model predicting the AQI from the maximum temperature, weather, wind direction and wind force was fitted; its R² is 0.9968, indicating a very close fit to the data.
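The AQI rule described above (take the largest sub-index, and name the primary pollutant or pollutants only when the AQI exceeds 50) can be expressed as a small R function. This is a sketch that assumes the IAQI values have already been computed and are supplied as a named numeric vector; the function name aqi_from_iaqi and the example values are hypothetical and do not come from the Qinhuangdao data.

```r
# Hypothetical helper: AQI and primary pollutant(s) from a named vector of IAQI values.
aqi_from_iaqi <- function(iaqi) {
  aqi <- max(iaqi)                          # AQI = largest sub-index
  primary <- if (aqi > 50) {
    names(iaqi)[iaqi == aqi]                # ties: every pollutant at the maximum
  } else {
    character(0)                            # no primary pollutant when AQI <= 50
  }
  list(aqi = aqi, primary = primary)
}

# Illustrative values only: PM2.5 and PM10 tie for the largest IAQI
polluted_day <- aqi_from_iaqi(c(PM2.5 = 85, PM10 = 85, SO2 = 20, NO2 = 40, CO = 15, O3 = 60))
clean_day    <- aqi_from_iaqi(c(PM2.5 = 30, O3 = 45))
polluted_day
clean_day
```

On the polluted example both PM2.5 and PM10 are reported, which matches the tie rule; on the clean example no primary pollutant is listed because the AQI does not exceed 50.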

2. Data source description

The data consist of the air quality index and related indicators for Qinhuangdao, Hebei Province, in 2017.

The data set df has a total of 365 rows and 10 columns.

Import the data set into R and inspect the type of each column.

All columns are read in as text, so some preprocessing is required, mainly converting the maximum and minimum temperatures from text to numbers.

After preprocessing, checking the column types again confirms that they are as expected.

[Figure: column types of df after preprocessing]
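The temperature conversion can be sketched in base R. This assumes, hypothetically, that temperatures were stored as strings with a degree-Celsius suffix such as "3℃" or "-2℃"; the helper name parse_temperature and the column names max_temp and min_temp are illustrative, not the names used in the original data set.

```r
# Hypothetical helper: strip everything except digits, "." and "-", then convert.
# Assumes values look like "3\u2103" (3 followed by the degree-Celsius sign).
parse_temperature <- function(x) {
  as.numeric(gsub("[^0-9.-]", "", x))
}

# Illustrative stand-in for df with text-typed temperature columns
df <- data.frame(
  max_temp = c("3\u2103", "-2\u2103", "28\u2103"),
  min_temp = c("-5\u2103", "-9\u2103", "20\u2103"),
  stringsAsFactors = FALSE
)

df$max_temp <- parse_temperature(df$max_temp)
df$min_temp <- parse_temperature(df$min_temp)
str(df)   # both columns should now be numeric
```

If the real columns use a different suffix or include extra text, the regular expression would need adjusting accordingly.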

3. Data description

1. Descriptive Statistics

Is the air quality generally good or poor? To answer this, compute summary statistics for all air quality indicators over the year.
[Figure: summary statistics of the air quality indicators]

The results show that the air pollution level in Qinhuangdao, Hebei, in 2017 was mostly "excellent" or "good".
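A minimal sketch of these summary statistics in base R follows. The toy data frame and its column names aqi and level are hypothetical stand-ins for df; the point is only to show how summary() and a frequency table give the distribution of the AQI and the share of days at each pollution level.

```r
# Hypothetical toy data standing in for df; values are illustrative only.
df <- data.frame(
  aqi   = c(45, 62, 88, 130, 51, 40, 75, 210),
  level = c("Excellent", "Good", "Good", "Light pollution",
            "Good", "Excellent", "Good", "Heavy pollution")
)

summary(df$aqi)                           # min, quartiles, mean and max of the AQI
level_counts <- table(df$level)           # number of days at each pollution level
level_share  <- prop.table(level_counts)  # share of days at each level
round(level_share, 3)
```

On the real data, the level shares are what support a statement such as "mostly excellent or good".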

2. Univariate analysis

Using the data set, I plot the trends of the air quality index, maximum temperature and minimum temperature over the year.

[Figure: trends of the AQI, maximum temperature and minimum temperature]
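A trend plot, together with monthly means, is one way to read off the seasonal pattern. The sketch below uses simulated values (a cosine curve peaking in January), not the real Qinhuangdao data, and the variable names dates and aqi are assumptions; with the real data you would use the date and AQI columns of df instead.

```r
# Simulated daily AQI for 2017 (illustrative only): high in winter, low in summer.
dates <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
aqi <- 80 + 40 * cos(2 * pi * (day_of_year - 1) / 365)

# Daily trend line
plot(dates, aqi, type = "l", xlab = "Date", ylab = "AQI", main = "Daily AQI, 2017")

# Monthly means summarize the winter-high / summer-low pattern
monthly_mean <- tapply(aqi, format(dates, "%m"), mean)
round(monthly_mean, 1)
```

The maximum and minimum temperatures could be added to a similar plot with lines(), or drawn in separate panels.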

Next, I draw pie charts of the weather, wind direction, wind force and air pollution level.

[Figures: pie charts of weather, wind direction, wind force and air pollution level]
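Pie charts like these can be drawn in a loop over the categorical columns with base R's pie(). The toy data frame below is hypothetical, including its column names and category labels; it only illustrates the pattern of tabulating each column and plotting the counts.

```r
# Hypothetical toy data; column names and category labels are illustrative.
df <- data.frame(
  weather    = c("Sunny", "Cloudy", "Sunny", "Overcast", "Sunny", "Light rain"),
  wind_dir   = c("N", "S", "N", "SW", "S", "N"),
  wind_force = c("Level 1-2", "Level 3-4", "Level 1-2", "Level 1-2", "Level 3-4", "Level 1-2"),
  level      = c("Excellent", "Good", "Good", "Excellent", "Light pollution", "Good")
)

op <- par(mfrow = c(2, 2))   # 2 x 2 grid, one pie per variable
for (col in names(df)) {
  counts <- table(df[[col]])
  # Label each slice with its category and the number of days
  pie(counts, labels = paste0(names(counts), " (", counts, ")"), main = col)
}
par(op)                      # restore the previous plotting layout
```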

3. Bivariate analysis

What factors mainly affect air quality?
I performed correlation tests between the AQI and the maximum and minimum temperatures. A p-value below 0.05 indicates a statistically significant correlation.

[Figure: correlation test results for the AQI versus maximum and minimum temperature]

The AQI is negatively correlated with both the maximum and minimum temperatures: the higher the temperature, the lower the AQI, that is, the better the air quality. In other words, the air tends to be clean when it is hot and worse when it turns cold, which is when fog and haze often occur.
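Tests of this kind can be run with cor.test() from R's stats package. The sketch assumes Pearson's correlation (the cor.test default), since the method used in the original analysis is not stated; the toy values and column names are hypothetical.

```r
# Hypothetical toy data; values are illustrative, not the 2017 measurements.
df <- data.frame(
  max_temp = c(0, 5, 10, 15, 20, 25, 30, 35),
  aqi      = c(150, 130, 118, 100, 85, 70, 60, 45)
)
df$min_temp <- df$max_temp - 8

ct_max <- cor.test(df$aqi, df$max_temp)   # Pearson correlation test by default
ct_min <- cor.test(df$aqi, df$min_temp)

round(c(r = unname(ct_max$estimate), p = ct_max$p.value), 4)
ct_max$p.value < 0.05   # TRUE means significant at the 5% level
```

A negative estimate together with p < 0.05 corresponds to the "higher temperature, lower AQI" reading above. For skewed AQI values, method = "spearman" is a common alternative.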

Chi-square test. For each pair of categorical variables (air pollution level versus weather, wind direction and wind force), I perform a chi-square test of independence. A p-value below 0.05 indicates a significant association.

[Figure: chi-square test results]

The p-values of the chi-square tests are all below 0.05, indicating that the air pollution level is significantly associated with the weather, wind direction and wind force.
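A chi-square test of independence on two categorical variables can be run with chisq.test() on a contingency table built with table(). The counts below are invented for illustration (100 hypothetical days cross-classified by pollution level and wind force) and are not the author's data.

```r
# Hypothetical counts: 100 illustrative days by pollution level and wind force.
level <- rep(c("Excellent/Good", "Polluted"), times = c(50, 50))
wind_force <- c(rep("Level 3-4", 40), rep("Level 1-2", 10),
                rep("Level 3-4", 15), rep("Level 1-2", 35))

tab <- table(level, wind_force)   # 2 x 2 contingency table
res <- chisq.test(tab)            # Pearson chi-square test of independence

res$p.value
res$expected                      # expected counts under independence
```

For a 2 x 2 table, chisq.test() applies Yates's continuity correction by default. If some expected counts are small (a common rule of thumb is below 5), R warns that the approximation may be inaccurate, and fisher.test() is an alternative.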

Bivariate plotting

[Figure: bivariate plots]
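The exact plots used above are not recoverable, but one common way to show bivariate relationships is a scatter plot for two numeric variables and a box plot for a numeric variable split by a categorical one. The sketch below assumes hypothetical column names and toy values.

```r
# Hypothetical toy data; column names and values are illustrative.
df <- data.frame(
  aqi        = c(150, 120, 95, 80, 60, 55, 140, 110, 70, 45),
  max_temp   = c(0, 4, 12, 18, 26, 30, 2, 8, 22, 31),
  wind_force = c("Level 1-2", "Level 1-2", "Level 3-4", "Level 1-2", "Level 3-4",
                 "Level 3-4", "Level 1-2", "Level 1-2", "Level 3-4", "Level 3-4")
)

op <- par(mfrow = c(1, 2))
# Numeric vs numeric: scatter plot with a least-squares line
plot(aqi ~ max_temp, data = df, xlab = "Maximum temperature", ylab = "AQI")
abline(lm(aqi ~ max_temp, data = df), col = "red")
# Numeric vs categorical: AQI distribution for each wind-force level
b <- boxplot(aqi ~ wind_force, data = df, xlab = "Wind force", ylab = "AQI")
par(op)
```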

4. Statistical modeling

How can a city's air quality be predicted? Using the data set above, I fit a multivariate linear model of the AQI on the maximum temperature, weather, wind direction and wind force.

[Figures: output of the multivariate linear model]

The R² of the multivariate linear model is 0.9968.
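A multivariate linear model with mixed numeric and categorical predictors can be fitted with lm(); character columns are expanded into dummy variables automatically. The sketch below builds a small synthetic data set (12 hypothetical days, with an AQI constructed to fall with temperature) and is not a reproduction of the author's model or data; the column names are assumptions.

```r
# Hypothetical toy data (12 illustrative days); column names are assumptions.
df <- data.frame(
  max_temp   = c(2, 5, 9, 15, 21, 26, 30, 29, 24, 17, 9, 3),
  weather    = rep(c("Sunny", "Cloudy", "Overcast"), times = 4),
  wind_dir   = rep(c("N", "N", "S", "S"), times = 3),
  wind_force = rep(c("Level 1-2", "Level 3-4"), times = 6)
)
# Synthetic AQI: falls with temperature, higher on overcast days,
# lower with stronger wind, plus a small fixed perturbation.
noise <- c(1.5, -2, 0.5, 1, -1.5, 2, -0.5, -1, 1.5, -2, 1, -0.5)
df$aqi <- 130 - 2.5 * df$max_temp +
  ifelse(df$weather == "Overcast", 15, 0) -
  ifelse(df$wind_force == "Level 3-4", 10, 0) + noise

# Categorical predictors become dummy variables
fit <- lm(aqi ~ max_temp + weather + wind_dir + wind_force, data = df)
summary(fit)$r.squared   # in-sample R^2

# Predict the AQI for a new (hypothetical) day
new_day <- data.frame(max_temp = 10, weather = "Sunny",
                      wind_dir = "N", wind_force = "Level 1-2")
predict(fit, newdata = new_day)
```

Because R² here is computed on the same data used to fit the model, it measures in-sample fit; evaluating predictions on days held out from fitting gives a more direct measure of predictive accuracy.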

4. Conclusion

Overall, the AQI in Qinhuangdao, Hebei, in 2017 was relatively high in winter and low in summer. Based on the p-values of the relevant tests, the AQI is related to the maximum temperature, minimum temperature, weather, wind direction and wind force. A multivariate linear model predicting the AQI from the maximum temperature, weather, wind direction and wind force achieves an R² of 0.9968, indicating a very close fit to the data.


Origin blog.csdn.net/weixin_54707168/article/details/132552971