About household electricity consumption data analysis

Data parameter explanation

This article focuses on active power, so reactive power can be ignored for now.

Importing libraries

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pyecharts.charts import Pie
from pyecharts.charts import *
import pyecharts.options as opts
from statsmodels.tsa.seasonal import seasonal_decompose
plt.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei font so Chinese labels render
plt.rcParams['axes.unicode_minus'] = False  # display the minus sign correctly
pd.set_option('display.float_format', lambda x: '%.2f' % x)  # disable scientific notation in pandas output

pyecharts is a Python library for generating ECharts charts. ECharts is a data visualization library open-sourced by Baidu; its good interactivity and polished chart design have earned it recognition from many developers. Python, in turn, is an expressive language well suited to data manipulation.
In addition, this article also uses the statsmodels library, which is a Python library for fitting a variety of statistical models, performing statistical tests, and data exploration and visualization. statsmodels contains more "classical" frequentist statistical methods, while Bayesian methods and machine learning models are found in other libraries.
I followed this article when installing the statsmodels library. If you need it, you can read it here:
https://blog.csdn.net/m0_48313550/article/details/124731922
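
The sections below work on a DataFrame named data, but the loading step is not shown in this article. A minimal sketch of how it might be read with pandas follows. The sample rows, the comma separator, and the in-memory buffer standing in for a real file are all assumptions made for illustration, so adjust them to the actual file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file; the column names match
# the ones renamed later in this article, the values are made up.
sample_csv = io.StringIO(
    "Date,Time,Global_active_power,Global_reactive_power,Voltage,"
    "Global_intensity,Sub_metering_1,Sub_metering_2,Sub_metering_3\n"
    "1/1/07,0:00:00,2.58,0.136,241.97,10.6,0,0,0.0\n"
    "1/1/07,0:01:00,?,?,?,?,?,?,\n"
)

# low_memory=False reads the input in one pass, so each column gets a single
# inferred dtype even when '?' placeholders are mixed in with numbers.
data = pd.read_csv(sample_csv, low_memory=False)
print(data.dtypes)
```

Columns that contain '?' come back as text rather than numbers, which is why the outlier-handling step later converts them explicitly.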

Data processing

Renaming columns

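data.rename(columns={
    'Date':'日期',                                  # date
    'Time':'时间',                                  # time
    'Global_active_power':'有功功率',               # active power
    'Global_reactive_power':'无功功率',             # reactive power
    'Voltage':'电压',                               # voltage
    'Global_intensity':'电流',                      # current
    'Sub_metering_1': '厨房的有功功率',             # kitchen active power
    'Sub_metering_2': '洗衣房的有功功率',           # laundry room active power
    'Sub_metering_3': '电热水器和空调的有功功率',   # electric water heater and air conditioner active power
},inplace=True)
data.head()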

English column names are inconvenient for me to work with, since my English is not good, so I renamed the columns to Chinese.

Time format conversion

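# Cast both columns to str before using the .str accessor
data['日期']=data['日期'].astype(str)
data['时间']=data['时间'].astype(str)
# Expand the two-digit year; anchoring at the end leaves a zero-padded month '07' untouched
data['日期']=data['日期'].str.replace(r'/07$','/2007',regex=True)
# Merge date and time into one datetime column; rows that fail to parse become NaT
data['index'] = pd.to_datetime(data['日期'] +' '+ data['时间'],format='%d/%m/%Y %H:%M:%S',errors='coerce')
data=data.drop(['日期','时间'],axis=1)
data.head()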

Expand the 07 in the date column to 2007, cast the date and time columns to str, merge them into a single datetime column named index, and then drop the original date and time columns. After the conversion, the info() function shows that the last column, the active power of the electric water heater and air conditioner, has missing values. Because the data set is large and the number of missing rows is comparatively small, we can simply delete them.
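
The deletion step mentioned above is not shown in the code. A minimal sketch, assuming dropna() is the intended method and using a small made-up frame in place of the real data:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; the third row is missing the last column.
data = pd.DataFrame({
    'index': pd.to_datetime(['2007-01-01 00:00:00',
                             '2007-01-01 00:01:00',
                             '2007-01-01 00:02:00']),
    '电热水器和空调的有功功率': [0.0, 17.0, np.nan],
})

data.info()                # the non-null counts reveal the gap
print(data.isna().sum())   # missing values per column

# Drop every row that contains at least one missing value.
data = data.dropna().reset_index(drop=True)
print(len(data))
```

Dropping rows is reasonable here only because the missing share is small; on a sparser data set, interpolation might be a better fit.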

Outlier handling

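# Outlier handling: treat the '?' placeholder as a missing value
data = data.replace('?',np.nan)
data['厨房的有功功率'] = data['厨房的有功功率'].astype('float64')
data['洗衣房的有功功率'] = data['洗衣房的有功功率'].astype('float64')
# Total power is the sum of the three sub-metering columns
data['总功率']=data['厨房的有功功率']+data['洗衣房的有功功率']+data['电热水器和空调的有功功率']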

On close inspection, some entries in the data are '?'. We treat this placeholder as an outlier and convert it to a null value, then compute the total power and add it to the original data set as a new column.
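
An alternative way to express the same idea is pd.to_numeric with errors='coerce', which turns any non-numeric entry such as '?' into NaN and converts the column to float in one step. This is a sketch on made-up values, not the code used in this article:

```python
import pandas as pd

# Made-up readings; '?' marks a missing measurement as in the raw data.
data = pd.DataFrame({
    '厨房的有功功率': ['0', '1', '?'],
    '洗衣房的有功功率': ['2', '?', '1'],
    '电热水器和空调的有功功率': [17.0, 18.0, None],
})

cols = ['厨房的有功功率', '洗衣房的有功功率', '电热水器和空调的有功功率']
# errors='coerce' replaces anything that cannot be parsed as a number with NaN.
data[cols] = data[cols].apply(pd.to_numeric, errors='coerce')

# skipna=False keeps the total NaN when any reading is missing,
# which matches what adding the columns with '+' does.
data['总功率'] = data[cols].sum(axis=1, skipna=False)
print(data)
```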

Power consumption visualization

sum_data = data[['厨房的有功功率','洗衣房的有功功率','电热水器和空调的有功功率']].sum()
plt.pie(sum_data,labels=['厨房的有功功率','洗衣房的有功功率','电热水器和空调的有功功率'],autopct='%3.1f%%',explode=[0.2,0.2,0],radius=2)
# plt.title('不同家电的有功功率')
# plt.legend(loc="upper right")
plt.show()

The pie chart shows how each type of household electricity consumption relates to the total.
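
The percentages drawn on the pie chart can also be computed directly, which is handy for checking the plot. A small sketch with made-up totals standing in for sum_data:

```python
import pandas as pd

# Made-up column totals standing in for sum_data from the code above.
sum_data = pd.Series({
    '厨房的有功功率': 100.0,
    '洗衣房的有功功率': 150.0,
    '电热水器和空调的有功功率': 750.0,
})

# Share of each category in the combined total, as a percentage.
share = sum_data / sum_data.sum() * 100
print(share.round(1))
```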
Next, household electricity consumption trends are analyzed by visualizing the time series.

Household Electricity Trend Chart

plt.figure(figsize=(12,8))
plt.subplot(321)  # kitchen
plt.plot(mon_sum.index,mon_sum['厨房的有功功率'])
plt.title('厨房的有功功率')
plt.subplot(322)  # laundry room
plt.plot(mon_sum.index,mon_sum['洗衣房的有功功率'])
plt.title('洗衣房的有功功率')
plt.subplot(323)  # electric water heater and air conditioner
plt.plot(mon_sum.index,mon_sum['电热水器和空调的有功功率'])
plt.title('电热水器和空调的有功功率')
plt.subplot(324)  # every column of mon_sum, one line per column
plt.plot(mon_sum.index,mon_sum.values)
plt.title('总有用功功率')
plt.tight_layout()  # keep titles and axes from overlapping
plt.show()
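
The plotting code above uses a monthly aggregate named mon_sum that is never defined in this article. One way it might be built, assuming it holds the per-month sum of each column keyed by the datetime column index, is sketched here on made-up rows:

```python
import pandas as pd

# Made-up readings spread over two months.
data = pd.DataFrame({
    'index': pd.to_datetime(['2007-01-01 00:00:00', '2007-01-15 12:00:00',
                             '2007-02-01 00:00:00']),
    '厨房的有功功率': [1.0, 2.0, 3.0],
    '洗衣房的有功功率': [0.0, 1.0, 1.0],
    '电热水器和空调的有功功率': [17.0, 18.0, 0.0],
})
data['总功率'] = data.drop(columns='index').sum(axis=1)

# Group rows by calendar month and add up every remaining column.
month = data['index'].dt.to_period('M')
mon_sum = data.drop(columns='index').groupby(month).sum()
# Convert the PeriodIndex to timestamps so matplotlib can plot it directly.
mon_sum.index = mon_sum.index.to_timestamp()
print(mon_sum)
```

Grouping by dt.to_period('M') is used here instead of resample so the sketch does not depend on a particular frequency alias.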

As a next step, power consumption can be analyzed by day of the week, from Monday to Sunday.
The weekday() function returns a value from 0 to 6, representing Monday through Sunday.
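
A minimal sketch of that weekday analysis, assuming the datetime column is still named index and using made-up readings. Series.dt.weekday is the pandas counterpart of datetime.weekday():

```python
import pandas as pd

# Made-up readings: 2007-01-01 and 2007-01-08 were Mondays, 2007-01-07 a Sunday.
data = pd.DataFrame({
    'index': pd.to_datetime(['2007-01-01 08:00:00', '2007-01-01 20:00:00',
                             '2007-01-07 08:00:00', '2007-01-08 08:00:00']),
    '总功率': [10.0, 20.0, 5.0, 30.0],
})

# 0 = Monday ... 6 = Sunday
data['weekday'] = data['index'].dt.weekday

# Average total power for each day of the week.
by_weekday = data.groupby('weekday')['总功率'].mean()
print(by_weekday)
```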


Origin blog.csdn.net/weixin_44052130/article/details/130455089