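Project Background
With advances in battery technology and its industrialization, China's new energy vehicle (NEV) industry has entered a period of rapid growth. Governments at all levels have issued policies in continued support of NEV technology and industry, and automakers worldwide are actively exploring and experimenting with NEV development and applications. Compared with conventional vehicles, NEVs are more electrified, intelligent, connected, and shared, and they yield richer data that can support broad and in-depth analysis.
At the same time, amid the new wave of information technology change, connected-vehicle and big-data technologies are providing new drivers for NEV data collection, operational analysis, battery management, and related fields.
This project analyzes NEV operating data provided by the Shanghai New Energy Vehicle Public Data Collection and Monitoring Research Center (上海市新能源汽车公共数据采集与监测研究中心). The goals are to identify the main factors that affect battery state and energy consumption, and to assess usage risk from the driver's behavior.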
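Data Description
The dataset consists of two CSV files:
- SHEVDC_OV6N7709.csv: operating data of a battery electric vehicle
- SHEVDC_0C023H25.csv: operating data of a hybrid vehicle
The fields are defined as follows:
Data is sampled once every 10 s.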
I. Data Import and Preprocessing
1 Data import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Electric vehicle data
data_electric = pd.read_csv('SHEVDC_OV6N7709.csv')
data_electric.head()
time | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-10 01:12:00 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
1 | 2019-01-10 01:12:10 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
2 | 2019-01-10 01:12:20 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
3 | 2019-01-10 01:12:30 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
4 | 2019-01-10 01:12:40 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
5 rows × 24 columns
# Hybrid vehicle data
data_hybrid = pd.read_csv('SHEVDC_0C023H25.csv')
data_hybrid.head()
time | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-06 15:36:27 | 1 | 3 | 1 | 79.7 | 69788.0 | 361.2 | 10.4 | 73 | 1 | ... | 3.769 | 1 | 96 | 3.761 | 1 | 3 | 25 | 1 | 6 | 23 |
1 | 2019-01-06 15:36:37 | 1 | 3 | 1 | 78.6 | 69789.0 | 360.0 | 13.1 | 72 | 1 | ... | 3.753 | 1 | 96 | 3.743 | 1 | 3 | 25 | 1 | 6 | 23 |
2 | 2019-01-06 15:36:47 | 1 | 3 | 1 | 74.2 | 69789.0 | 361.2 | 9.5 | 72 | 1 | ... | 3.765 | 1 | 96 | 3.757 | 1 | 3 | 25 | 1 | 6 | 23 |
3 | 2019-01-06 15:36:57 | 1 | 3 | 1 | 81.8 | 69789.0 | 350.5 | 63.9 | 72 | 1 | ... | 3.663 | 1 | 96 | 3.645 | 1 | 3 | 25 | 1 | 6 | 23 |
4 | 2019-01-06 15:37:07 | 1 | 3 | 1 | 74.1 | 69789.0 | 361.2 | 3.4 | 71 | 1 | ... | 3.789 | 1 | 96 | 3.782 | 1 | 3 | 25 | 1 | 6 | 23 |
5 rows × 27 columns
2 Data checks
2.1 Missing values
# Electric vehicle
data_electric.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6231 entries, 0 to 6230
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 6231 non-null object
1 vehiclestatus 6231 non-null int64
2 chargestatus 6231 non-null int64
3 runmodel 6231 non-null int64
4 speed 6231 non-null float64
5 summileage 6231 non-null object
6 sumvoltage 6231 non-null float64
7 sumcurrent 6231 non-null float64
8 soc 6231 non-null int64
9 dcdcstatus 6231 non-null int64
10 gearnum 6231 non-null int64
11 insulationresistance 6231 non-null int64
12 max_volt_num 6231 non-null int64
13 max_volt_cell_id 6231 non-null int64
14 max_cell_volt 6231 non-null float64
15 min_volt_num 6231 non-null int64
16 min_volt_cell_id 6231 non-null int64
17 min_cell_volt 6231 non-null float64
18 max_temp_num 6231 non-null int64
19 max_temp_probe_id 6231 non-null int64
20 max_temp 6231 non-null int64
21 min_temp_num 6231 non-null int64
22 min_temp_probe_id 6231 non-null int64
23 min_temp 6231 non-null int64
dtypes: float64(5), int64(17), object(2)
memory usage: 1.1+ MB
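The electric vehicle data has 6,231 records and no null values, but the summileage column has dtype object. We convert it to float64 to simplify the analysis that follows.
# Convert summileage to float64; entries that cannot be parsed become NaN
data_electric['summileage'] = pd.to_numeric(data_electric['summileage'],errors='coerce')
# Forward-fill the NaN entries with the last valid reading
# (.ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas)
data_electric['summileage']=data_electric['summileage'].ffill()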
data_electric['summileage']
0 39938.0
1 39938.0
2 39938.0
3 39938.0
4 39938.0
...
6226 40152.0
6227 40152.0
6228 40152.0
6229 40152.0
6230 40152.0
Name: summileage, Length: 6231, dtype: float64
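To make the conversion step concrete, here is a minimal sketch on made-up values (the sample strings, including 'NULL', are hypothetical and not taken from the dataset). It also counts how many entries were coerced to NaN, which is worth checking before forward-filling them.

```python
import pandas as pd

# Hypothetical odometer readings; 'NULL' stands in for any unparseable entry
raw = pd.Series(['39938.0', '39938.0', 'NULL', '39939.0'])

numeric = pd.to_numeric(raw, errors='coerce')   # unparseable strings become NaN
n_coerced = int(numeric.isna().sum())           # how many values will be filled
filled = numeric.ffill()                        # carry the last valid reading forward

print(n_coerced)
print(filled.tolist())
```

If the count of coerced values were large, forward-filling an odometer reading would hide real gaps, so it may be worth inspecting those rows first.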
# Hybrid vehicle
data_hybrid.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3121 entries, 0 to 3120
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 3121 non-null object
1 vehiclestatus 3121 non-null int64
2 chargestatus 3121 non-null int64
3 runmodel 3121 non-null int64
4 speed 3121 non-null float64
5 summileage 3121 non-null float64
6 sumvoltage 3121 non-null float64
7 sumcurrent 3121 non-null float64
8 soc 3121 non-null int64
9 dcdcstatus 3121 non-null int64
10 gearnum 3121 non-null int64
11 insulationresistance 3121 non-null int64
12 enginestatus 1689 non-null float64
13 grankshaftspeed 1689 non-null float64
14 enginefuelconsumptionrate 1689 non-null float64
15 max_volt_num 3121 non-null int64
16 max_volt_cell_id 3121 non-null int64
17 max_cell_volt 3121 non-null float64
18 min_volt_num 3121 non-null int64
19 min_volt_cell_id 3121 non-null int64
20 min_cell_volt 3121 non-null float64
21 max_temp_num 3121 non-null int64
22 max_temp_probe_id 3121 non-null int64
23 max_temp 3121 non-null int64
24 min_temp_num 3121 non-null int64
25 min_temp_probe_id 3121 non-null int64
26 min_temp 3121 non-null int64
dtypes: float64(9), int64(17), object(1)
memory usage: 658.5+ KB
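The hybrid vehicle data has 3,121 records. Three columns, enginestatus, grankshaftspeed, and enginefuelconsumptionrate, contain null values. These columns describe the engine state, so it is reasonable for them to be empty when no engine data is reported (the 1,432 null rows match the number of records in which the vehicle is switched off, which is checked later). No special handling is needed here.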
2.2 Data collection period
# Electric vehicle
print("Earliest time:",data_electric['time'].min())
print("Latest time:",data_electric['time'].max())
Earliest time: 2019-01-10 01:12:00
Latest time: 2019-01-11 12:16:18
# Hybrid vehicle
print("Earliest time:",data_hybrid['time'].min())
print("Latest time:",data_hybrid['time'].max())
Earliest time: 2019-01-06 15:36:27
Latest time: 2019-01-07 00:31:28
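The min/max comparison above works on strings only because the timestamp format sorts lexicographically. Parsing the column makes the covered period and any sampling gaps explicit. This matters here: 6,231 electric-vehicle records at 10 s each cover only about 17.3 h, while the period spans roughly 35 h, so the recording cannot be continuous. The sketch below uses hypothetical timestamps in the same format.

```python
import pandas as pd

# Hypothetical timestamps in the same format as the `time` column
times = pd.Series(['2019-01-10 01:12:00', '2019-01-10 01:12:10',
                   '2019-01-10 01:12:20', '2019-01-10 01:13:00'])

ts = pd.to_datetime(times)
span = ts.max() - ts.min()                 # total covered period as a Timedelta
gaps = ts.diff().dt.total_seconds()        # seconds between consecutive records
n_gaps = int((gaps > 10).sum())            # intervals longer than the nominal 10 s

print(span, n_gaps)
```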
2.3 Descriptive statistics
# Electric vehicle
data_electric.describe()
vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 6231.000000 | 6231.000000 | 6231.0 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | ... | 6231.00000 | 6231.0 | 6231.000000 | 6231.000000 | 6231.0 | 6231.000000 | 6231.000000 | 6231.0 | 6231.000000 | 6231.000000 |
mean | 1.700690 | 1.563633 | 1.0 | 10.126705 | 40083.661692 | 363.703418 | 0.943861 | 58.357407 | 1.686888 | 4.420960 | ... | 3.79234 | 1.0 | 70.811266 | 3.782679 | 1.0 | 19.564275 | 10.004815 | 1.0 | 21.043332 | 8.783983 |
std | 0.457993 | 0.889976 | 0.0 | 21.666992 | 49.314093 | 17.014157 | 26.379983 | 26.393123 | 0.463797 | 6.556641 | ... | 0.17684 | 0.0 | 15.432097 | 0.176314 | 0.0 | 3.622739 | 1.322351 | 0.0 | 6.613793 | 1.089954 |
min | 1.000000 | 1.000000 | 1.0 | 0.000000 | 39938.000000 | 322.200000 | -113.100000 | 7.000000 | 1.000000 | 0.000000 | ... | 3.38200 | 1.0 | 6.000000 | 3.346000 | 1.0 | 7.000000 | 6.000000 | 1.0 | 2.000000 | 5.000000 |
25% | 1.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 348.500000 | -9.200000 | 35.000000 | 1.000000 | 0.000000 | ... | 3.63200 | 1.0 | 75.000000 | 3.627000 | 1.0 | 17.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
50% | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 363.000000 | -8.800000 | 64.000000 | 2.000000 | 0.000000 | ... | 3.78600 | 1.0 | 75.000000 | 3.773000 | 1.0 | 19.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
75% | 2.000000 | 3.000000 | 1.0 | 0.000000 | 40091.000000 | 377.700000 | 0.800000 | 80.000000 | 2.000000 | 14.000000 | ... | 3.93800 | 1.0 | 75.000000 | 3.929000 | 1.0 | 23.000000 | 11.000000 | 1.0 | 24.000000 | 9.000000 |
max | 2.000000 | 4.000000 | 1.0 | 100.700000 | 40152.000000 | 397.500000 | 240.300000 | 100.000000 | 2.000000 | 15.000000 | ... | 4.14700 | 1.0 | 96.000000 | 4.137000 | 1.0 | 24.000000 | 13.000000 | 1.0 | 24.000000 | 11.000000 |
8 rows × 23 columns
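From the above:
·The electric vehicle's top speed is 100.7 km/h, and its odometer rises from 39,938 km to 40,152 km (214 km driven)
·Total voltage varies between 322.2 V and 397.5 V, and total current between -113.1 A and 240.3 A
·SOC (state of charge) ranges from 7% to 100%, with a mean of 58%
·Cell voltage varies between 3.35 V and 4.15 V, and battery temperature between 5 and 13 °C
# Hybrid vehicle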
data_hybrid.describe()
vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | ... | 3121.000000 | 3121.0 | 3121.000000 | 3121.000000 | 3121.0 | 3121.000000 | 3121.000000 | 3121.0 | 3121.000000 | 3121.000000 |
mean | 1.458827 | 1.947453 | 1.736302 | 13.635854 | 69853.980455 | 361.539250 | -0.641974 | 57.543095 | 1.458827 | 7.719321 | ... | 3.767752 | 1.0 | 61.212752 | 3.760058 | 1.0 | 3.423903 | 27.523871 | 1.0 | 6.462352 | 25.536367 |
std | 0.498382 | 0.928775 | 0.454316 | 23.723204 | 42.582988 | 15.284665 | 17.985884 | 26.051087 | 0.498382 | 7.084430 | ... | 0.158410 | 0.0 | 26.075302 | 0.159284 | 0.0 | 14.340370 | 1.563247 | 0.0 | 14.126442 | 1.505558 |
min | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 69788.000000 | 330.000000 | -108.000000 | 18.000000 | 1.000000 | 0.000000 | ... | 3.450000 | 1.0 | 6.000000 | 3.417000 | 1.0 | 2.000000 | 24.000000 | 1.0 | 5.000000 | 22.000000 |
25% | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 69822.000000 | 348.200000 | -8.300000 | 32.000000 | 1.000000 | 0.000000 | ... | 3.630000 | 1.0 | 39.000000 | 3.621000 | 1.0 | 2.000000 | 27.000000 | 1.0 | 5.000000 | 25.000000 |
50% | 1.000000 | 2.000000 | 2.000000 | 0.000000 | 69833.000000 | 359.500000 | -5.700000 | 60.000000 | 1.000000 | 14.000000 | ... | 3.744000 | 1.0 | 68.000000 | 3.740000 | 1.0 | 2.000000 | 28.000000 | 1.0 | 5.000000 | 26.000000 |
75% | 2.000000 | 3.000000 | 2.000000 | 22.600000 | 69909.000000 | 374.700000 | 1.600000 | 81.000000 | 2.000000 | 14.000000 | ... | 3.903000 | 1.0 | 71.000000 | 3.898000 | 1.0 | 2.000000 | 28.000000 | 1.0 | 7.000000 | 26.000000 |
max | 2.000000 | 3.000000 | 3.000000 | 103.600000 | 69909.000000 | 390.200000 | 105.100000 | 100.000000 | 2.000000 | 15.000000 | ... | 4.065000 | 1.0 | 255.000000 | 4.056000 | 1.0 | 255.000000 | 33.000000 | 1.0 | 255.000000 | 31.000000 |
8 rows × 26 columns
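From the above:
·The hybrid vehicle's top speed is 103.6 km/h, and its odometer rises from 69,788 km to 69,909 km (121 km driven)
·Total voltage varies between 330.0 V and 390.2 V, and total current between -108.0 A and 105.1 A (the maximum current is clearly lower than the electric vehicle's)
·SOC (state of charge) ranges from 18% to 100%, with a mean of 57.5%
·Cell voltage varies between 3.42 V and 4.07 V (a narrower range than the electric vehicle's), and battery temperature between 22 and 33 °C (clearly higher than the electric vehicle's)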
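3 Data preprocessing
Because data is sampled every 10 s, the interval is too fine for the analysis that follows. We therefore truncate the time field to the hour and average the data within each hour.
# Electric vehicle
def hour(time):
    # Keep 'MM-DD HH' from a 'YYYY-MM-DD HH:MM:SS' string
    return time[5:13]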
data_electric['time'] = data_electric['time'].apply(hour)
electric_group = data_electric.groupby('time').mean()
electric_group.head()
vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
01-10 01 | 1.000000 | 3.093190 | 1.0 | 14.955914 | 39940.243728 | 392.696416 | 12.277419 | 97.379928 | 1.00000 | 14.301075 | ... | 4.097606 | 1.0 | 61.620072 | 4.083305 | 1.0 | 18.673835 | 6.000000 | 1.0 | 23.261649 | 5.103943 |
01-10 02 | 1.000000 | 2.864865 | 1.0 | 42.981081 | 39974.094595 | 371.701351 | 27.295495 | 78.216216 | 1.00000 | 14.067568 | ... | 3.878932 | 1.0 | 59.450450 | 3.864725 | 1.0 | 19.554054 | 8.027027 | 1.0 | 22.225225 | 7.274775 |
01-10 03 | 1.000000 | 2.808989 | 1.0 | 43.995787 | 40014.907303 | 352.339607 | 29.968258 | 55.339888 | 1.00000 | 14.095506 | ... | 3.677239 | 1.0 | 58.817416 | 3.663022 | 1.0 | 19.561798 | 9.949438 | 1.0 | 19.668539 | 8.814607 |
01-10 04 | 1.000000 | 2.816667 | 1.0 | 46.776944 | 40057.422222 | 340.896667 | 35.086111 | 29.002778 | 1.00000 | 14.005556 | ... | 3.558089 | 1.0 | 55.441667 | 3.544222 | 1.0 | 19.438889 | 10.772222 | 1.0 | 19.538889 | 9.355556 |
01-10 05 | 1.707246 | 1.539130 | 1.0 | 10.651014 | 40089.739130 | 337.776522 | 2.398261 | 9.944928 | 1.66087 | 4.794203 | ... | 3.525594 | 1.0 | 71.373913 | 3.511803 | 1.0 | 18.646377 | 12.452174 | 1.0 | 20.939130 | 10.608696 |
5 rows × 23 columns
electric_group
vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
01-10 01 | 1.000000 | 3.093190 | 1.0 | 14.955914 | 39940.243728 | 392.696416 | 12.277419 | 97.379928 | 1.000000 | 14.301075 | ... | 4.097606 | 1.0 | 61.620072 | 4.083305 | 1.0 | 18.673835 | 6.000000 | 1.0 | 23.261649 | 5.103943 |
01-10 02 | 1.000000 | 2.864865 | 1.0 | 42.981081 | 39974.094595 | 371.701351 | 27.295495 | 78.216216 | 1.000000 | 14.067568 | ... | 3.878932 | 1.0 | 59.450450 | 3.864725 | 1.0 | 19.554054 | 8.027027 | 1.0 | 22.225225 | 7.274775 |
01-10 03 | 1.000000 | 2.808989 | 1.0 | 43.995787 | 40014.907303 | 352.339607 | 29.968258 | 55.339888 | 1.000000 | 14.095506 | ... | 3.677239 | 1.0 | 58.817416 | 3.663022 | 1.0 | 19.561798 | 9.949438 | 1.0 | 19.668539 | 8.814607 |
01-10 04 | 1.000000 | 2.816667 | 1.0 | 46.776944 | 40057.422222 | 340.896667 | 35.086111 | 29.002778 | 1.000000 | 14.005556 | ... | 3.558089 | 1.0 | 55.441667 | 3.544222 | 1.0 | 19.438889 | 10.772222 | 1.0 | 19.538889 | 9.355556 |
01-10 05 | 1.707246 | 1.539130 | 1.0 | 10.651014 | 40089.739130 | 337.776522 | 2.398261 | 9.944928 | 1.660870 | 4.794203 | ... | 3.525594 | 1.0 | 71.373913 | 3.511803 | 1.0 | 18.646377 | 12.452174 | 1.0 | 20.939130 | 10.608696 |
01-10 06 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 344.259824 | -9.364223 | 16.454545 | 1.988270 | 0.175953 | ... | 3.589308 | 1.0 | 75.000000 | 3.581625 | 1.0 | 16.777126 | 11.137830 | 1.0 | 24.000000 | 9.366569 |
01-10 07 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 347.098291 | -9.257835 | 24.185185 | 1.991453 | 0.128205 | ... | 3.616860 | 1.0 | 75.376068 | 3.613821 | 1.0 | 19.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
01-10 08 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 348.884644 | -9.294757 | 31.902622 | 1.985019 | 0.224719 | ... | 3.635509 | 1.0 | 75.498127 | 3.632603 | 1.0 | 19.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
01-10 09 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 350.999251 | -9.327715 | 39.928839 | 1.992509 | 0.112360 | ... | 3.657670 | 1.0 | 75.000000 | 3.654041 | 1.0 | 19.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
01-10 10 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 353.709705 | -9.232911 | 47.611814 | 1.987342 | 0.189873 | ... | 3.686333 | 1.0 | 75.000000 | 3.681321 | 1.0 | 21.481013 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
01-10 11 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 358.276364 | -9.192727 | 55.724242 | 1.951515 | 0.727273 | ... | 3.734936 | 1.0 | 75.000000 | 3.726336 | 1.0 | 17.715152 | 10.657576 | 1.0 | 18.933333 | 9.000000 |
01-10 12 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 364.793910 | -9.020833 | 63.089744 | 1.987179 | 0.192308 | ... | 3.802901 | 1.0 | 75.000000 | 3.791715 | 1.0 | 15.000000 | 11.000000 | 1.0 | 15.467949 | 9.605769 |
01-10 13 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 371.663333 | -8.827000 | 71.036667 | 1.986667 | 0.200000 | ... | 3.873410 | 1.0 | 75.000000 | 3.865173 | 1.0 | 18.270000 | 10.646667 | 1.0 | 8.960000 | 9.000000 |
01-10 14 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 377.689337 | -8.700865 | 78.190202 | 1.988473 | 0.172911 | ... | 3.936225 | 1.0 | 75.000000 | 3.927720 | 1.0 | 23.181556 | 10.000000 | 1.0 | 23.636888 | 9.000000 |
01-10 15 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 384.202194 | -8.329154 | 85.435737 | 1.987461 | 0.188088 | ... | 4.004536 | 1.0 | 75.000000 | 3.995395 | 1.0 | 23.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
01-10 16 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 390.633010 | -8.200324 | 92.022654 | 1.987055 | 0.194175 | ... | 4.071887 | 1.0 | 75.000000 | 4.062068 | 1.0 | 22.145631 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
01-10 17 | 1.092105 | 2.714912 | 1.0 | 27.128509 | 40100.767544 | 386.642544 | 16.198684 | 91.315789 | 1.083333 | 12.899123 | ... | 4.033289 | 1.0 | 62.469298 | 4.019842 | 1.0 | 21.412281 | 10.000000 | 1.0 | 23.877193 | 9.000000 |
01-10 18 | 1.000000 | 2.838889 | 1.0 | 25.690000 | 40130.005556 | 373.081667 | 14.096667 | 77.366667 | 1.000000 | 14.011111 | ... | 3.890811 | 1.0 | 65.633333 | 3.878950 | 1.0 | 16.072222 | 10.888889 | 1.0 | 21.366667 | 9.066667 |
01-11 08 | 1.000000 | 2.921053 | 1.0 | 14.306140 | 40142.903509 | 364.174561 | 17.146491 | 69.570175 | 1.000000 | 14.043860 | ... | 3.799772 | 1.0 | 72.271930 | 3.787474 | 1.0 | 22.333333 | 7.526316 | 1.0 | 22.245614 | 7.026316 |
01-11 09 | 1.795556 | 1.395556 | 1.0 | 3.412444 | 40151.773333 | 362.735111 | -3.976889 | 64.471111 | 1.764444 | 3.293333 | ... | 3.782591 | 1.0 | 73.786667 | 3.770284 | 1.0 | 22.137778 | 8.906667 | 1.0 | 22.888889 | 8.000000 |
01-11 10 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40152.000000 | 370.540553 | -8.768664 | 71.603687 | 1.981567 | 0.276498 | ... | 3.861834 | 1.0 | 75.000000 | 3.853581 | 1.0 | 24.000000 | 9.000000 | 1.0 | 14.576037 | 8.000000 |
01-11 11 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40152.000000 | 376.910744 | -8.672314 | 78.942149 | 1.995868 | 0.061983 | ... | 3.928339 | 1.0 | 75.000000 | 3.919938 | 1.0 | 17.272727 | 9.801653 | 1.0 | 17.181818 | 8.371901 |
01-11 12 | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40152.000000 | 380.900000 | -8.561446 | 83.445783 | 1.951807 | 0.722892 | ... | 3.969976 | 1.0 | 75.000000 | 3.961241 | 1.0 | 17.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
23 rows × 23 columns
# Check the shape of the aggregated data
electric_group.shape
(23, 23)
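The hybrid dataset is smaller, so we aggregate it in 15-minute bins instead.
# Hybrid vehicle
def quarter(time):
    # Quarter of the hour (1 to 4) derived from the minutes
    m = int(time[14:16])//15+1
    return time[5:13]+' '+str(m)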
data_hybrid['time'] = data_hybrid['time'].apply(quarter)
hybrid_group = data_hybrid.groupby('time').mean()
hybrid_group.head()
vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
01-06 15 3 | 1.0 | 2.711538 | 1.269231 | 35.171154 | 69791.730769 | 361.069231 | 7.448077 | 69.769231 | 1.0 | 14.000000 | ... | 3.765519 | 1.0 | 72.461538 | 3.757135 | 1.0 | 5.538462 | 25.000000 | 1.0 | 7.269231 | 23.634615 |
01-06 15 4 | 1.0 | 2.844444 | 1.755556 | 9.181111 | 69795.433333 | 360.187778 | 2.195556 | 66.111111 | 1.0 | 14.511111 | ... | 3.753611 | 1.0 | 78.033333 | 3.747700 | 1.0 | 4.555556 | 25.055556 | 1.0 | 6.688889 | 23.322222 |
01-06 16 1 | 1.0 | 3.000000 | 2.000000 | 0.000000 | 69796.000000 | 359.935165 | 0.803297 | 64.560440 | 1.0 | 15.000000 | ... | 3.749692 | 1.0 | 79.967033 | 3.745231 | 1.0 | 2.703297 | 24.813187 | 1.0 | 6.318681 | 23.000000 |
01-06 16 2 | 1.0 | 2.858696 | 1.793478 | 6.386957 | 69796.108696 | 359.080435 | 1.629348 | 63.402174 | 1.0 | 14.597826 | ... | 3.739848 | 1.0 | 75.565217 | 3.735196 | 1.0 | 3.608696 | 24.000000 | 1.0 | 5.391304 | 22.141304 |
01-06 16 3 | 1.0 | 2.766667 | 1.533333 | 25.726667 | 69799.922222 | 353.848889 | 10.896667 | 58.222222 | 1.0 | 14.000000 | ... | 3.688000 | 1.0 | 67.688889 | 3.680811 | 1.0 | 2.933333 | 24.355556 | 1.0 | 5.911111 | 22.555556 |
5 rows × 26 columns
# Check the shape after aggregation
hybrid_group.shape
(37, 26)
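The string slicing above can also be expressed with real timestamps. The following is a sketch of an alternative, not the method used in this notebook: it parses the time column and floors each timestamp to a bin width. The helper name `bin_mean` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical records in the same layout as the raw data
df = pd.DataFrame({
    'time': ['2019-01-06 15:36:27', '2019-01-06 15:44:57',
             '2019-01-06 15:45:07', '2019-01-06 16:02:00'],
    'soc': [73, 71, 70, 66],
})

def bin_mean(frame, freq):
    """Average the numeric columns within time bins of width `freq` (e.g. '15min')."""
    stamps = pd.to_datetime(frame['time']).dt.floor(freq)
    return frame.drop(columns='time').groupby(stamps).mean()

quarterly = bin_mean(df, '15min')
print(quarterly)
```

Keeping the index as timestamps means the bins sort and plot on a true time axis, so gaps in the recording stay visible instead of being squeezed together as they are with string labels.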
II. Data Analysis
1 Exploratory analysis
def distribution1(column):
    # Bar chart of how often each status code occurs in data1[column]
    x = data1[column].value_counts().index
    y = data1[column].value_counts().values
    plt.bar(x,y,width=data1[column].nunique()*0.2)
    plt.xlabel(column)
plt.figure(figsize=(15,3.5))
data1 = data_electric[['vehiclestatus','chargestatus','runmodel','dcdcstatus']]
for i in range(0,4):
    plt.subplot(1,4,i+1)
    distribution1(data1.columns[i])
plt.show()
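From the above:
·The electric vehicle has two states, started (1) and switched off (2), and it is switched off most of the time
·The charging status is mainly parked charging (1) or not charging (3); charging while driving (2) and charging complete (4) occur much less often
·The electric vehicle has a single run mode, pure electric
·The DC-DC converter status is disconnected (2) for more than half of the time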
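The bar charts give a visual impression; the shares can also be computed directly. As a cross-check, the describe() output earlier reports a mean vehiclestatus of about 1.70, and since the column only takes the values 1 and 2, that implies roughly 70% of records are in state 2. The sketch below shows the calculation on hypothetical status codes.

```python
import pandas as pd

# Hypothetical status codes: 1 = started, 2 = switched off
status = pd.Series([2, 2, 1, 2, 1, 2, 2, 2, 1, 2], name='vehiclestatus')

# Fraction of records per status code, ordered by code
share = status.value_counts(normalize=True).sort_index()
print(share)
```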
# Hybrid vehicle analysis
def distribution1(column):
    # Bar chart of how often each status code occurs in data2[column]
    x = data2[column].value_counts().index
    y = data2[column].value_counts().values
    plt.bar(x,y,width=data2[column].nunique()*0.2)
    plt.xlabel(column)
plt.figure(figsize=(17,3))
data2 = data_hybrid[['vehiclestatus','chargestatus','runmodel','dcdcstatus','enginestatus']]
for x in range(0,5):
    plt.subplot(1,5,x+1)
    distribution1(data2.columns[x])
plt.show()
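From the above:
·The hybrid vehicle has two states, started (1) and switched off (2), and it is started slightly more than half of the time (about 54% of records)
·The charging status is mainly parked charging (1) or not charging (3); charging while driving (2) occurs less often, and charging complete (4) never occurs
·The hybrid drives in hybrid mode (2) more than 60% of the time and in pure electric mode (1) less often; fuel-only mode (3) is almost negligible
·The DC-DC converter is working (1) for more than half of the time
·The engine is off (2) in about three quarters of the records that report an engine status
Next, let's look at the engine status when the hybrid is started versus switched off: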
print("Vehicle started:",data_hybrid[data_hybrid['vehiclestatus']==1].shape[0])
print("Vehicle off:",data_hybrid[data_hybrid['vehiclestatus']==2].shape[0])
Vehicle started: 1689
Vehicle off: 1432
# Do all 1,689 records with an engine status come from when the vehicle is started?
data_hybrid[data_hybrid['vehiclestatus']==2]['enginestatus'].unique()
array([nan])
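From this we learn:
Of the 3,121 hybrid records, engine status is reported only when the vehicle is started;
and among those 1,689 records, the engine is off in about three quarters of them.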
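The two checks above can be combined into a single cross-tabulation that keeps the missing values visible. This is a sketch on hypothetical rows; the labels 'on', 'off', and 'missing' are introduced here for readability, with codes 1 and 2 interpreted as described in the text.

```python
import pandas as pd
import numpy as np

# Hypothetical records: engine status is missing whenever the vehicle is off (2)
df = pd.DataFrame({
    'vehiclestatus': [1, 1, 1, 1, 2, 2],
    'enginestatus':  [1.0, 2.0, 2.0, 2.0, np.nan, np.nan],
})

# Label missing values explicitly so they appear as their own column
engine = df['enginestatus'].map({1.0: 'on', 2.0: 'off'}).fillna('missing')
table = pd.crosstab(df['vehiclestatus'], engine)
print(table)
```

A table like this shows at a glance whether any started records lack an engine status, which the unique() call alone does not reveal.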
2 Battery state of health
2.1 SOC curves
# SOC curve of the electric vehicle
plt.figure(figsize=(8,5))
electric_xticks = []
# Label every second hourly bin on the x-axis
for i in range(12):
    electric_xticks.append(electric_group.index[2*i])
plt.plot(electric_group.index,electric_group.soc)
plt.xticks(electric_xticks,rotation=45)
plt.ylabel('SOC %')
plt.show()
# SOC of the electric vehicle while driving and while switched off
from matplotlib import ticker
electric_group1=data_electric[data_electric.vehiclestatus==1].groupby('time').mean()
electric_group2=data_electric[data_electric.vehiclestatus==2].groupby('time').mean()
electric_group1=pd.merge(pd.DataFrame(data_electric.time.unique(),columns=['time']),electric_group1,how='left',left_on='time',
right_on=electric_group1.index)
electric_group1.fillna(0)
electric_group2=pd.merge(pd.DataFrame(data_electric.time.unique(),columns=['time']),electric_group2,how='left',left_on='time',
right_on=electric_group2.index)
electric_group2.fillna(0)
fig,axes=plt.subplots(figsize=(10,5))
axes.scatter(electric_group1.time,electric_group1.soc,label='driving')
plt.xticks(electric_xticks,rotation=45)
plt.ylabel('SOC%')
formatter=ticker.FormatStrFormatter('%d%%')
axes.yaxis.set_major_formatter(formatter)
axes.scatter(electric_group2.time,electric_group2.soc,label='flameout',c='g')
plt.legend()
plt.show()
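From the above:
·During the recorded period the electric vehicle went through several discharge-charge cycles, and it was generally charging whenever it was switched off;
·During both discharging and charging, SOC changes approximately linearly with time;
·The first (nearly) complete discharge took about 4 h, while the first full charge took about 12 h.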
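The linearity claim can be checked numerically. The sketch below fits a straight line to the hourly mean SOC values for 01-10 06 through 01-10 16, copied from the electric_group table above (a stretch in which the vehicle is switched off and charging). Treating consecutive hourly bins as exactly one hour apart is an assumption.

```python
import numpy as np

# Hourly mean SOC (%) for 01-10 06 through 01-10 16, copied from electric_group above
soc = np.array([16.454545, 24.185185, 31.902622, 39.928839, 47.611814, 55.724242,
                63.089744, 71.036667, 78.190202, 85.435737, 92.022654])
hours = np.arange(len(soc))

slope, intercept = np.polyfit(hours, soc, 1)      # least-squares straight line
fitted = slope * hours + intercept
r2 = 1 - np.sum((soc - fitted) ** 2) / np.sum((soc - soc.mean()) ** 2)

print(f"charging rate: {slope:.2f} %/h, R^2 = {r2:.4f}")
```

The slope is the average charging rate in SOC points per hour; dividing 100 by it gives a rough estimate of the time for a full charge, which can be compared with the 12 h read off the plot.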
plt.figure(figsize=(10,5))
hybrid_xticks=[]
# Label every third 15-minute bin on the x-axis
for x in range(13):
    hybrid_xticks.append(hybrid_group.index[3*x])
plt.plot(hybrid_group.index,hybrid_group.soc)
plt.xticks(hybrid_xticks,rotation=45)
plt.ylabel('SOC%')
plt.show()
# SOC of the hybrid vehicle while driving and while switched off
hybrid_group1=data_hybrid[data_hybrid.vehiclestatus==1].groupby('time').mean()
hybrid_group2=data_hybrid[data_hybrid.vehiclestatus==2].groupby('time').mean()
hybrid_group1=pd.merge(pd.DataFrame(data_hybrid.time.unique(),columns=['time']),hybrid_group1,how='left',
left_on='time',right_on=hybrid_group1.index)
hybrid_group1.fillna(0)
hybrid_group2=pd.merge(pd.DataFrame(data_hybrid.time.unique(),columns=['time']),hybrid_group2,how='left',
left_on='time',right_on=hybrid_group2.index)
hybrid_group2.fillna(0)
fig,axes=plt.subplots(figsize=(10,5))
axes.scatter(hybrid_group1.time,hybrid_group1.soc,label='driving')
plt.xticks(hybrid_xticks,rotation=45)
plt.ylabel('SOC%')
formatter=ticker.FormatStrFormatter('%d%%')
axes.yaxis.set_major_formatter(formatter)
axes.scatter(hybrid_group2.time,hybrid_group2.soc,label='flameout',c='g')
plt.legend()
plt.show()
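From the above:
·Most of the time the hybrid consumes charge while started and charges while switched off, but in a few periods the charge increases while the vehicle is started; we infer that it was charging while driving in those periods;
·The hybrid's SOC rose from 30% to 100% in about 2 h, and its first drop from 100% to 20% took about 2.5 h. We therefore estimate that a full charge and a full discharge each take about 3 h, which also suggests that the hybrid's battery capacity is considerably smaller than the electric vehicle's;
·The hybrid's SOC is at or below 50% for roughly half of the time, which suggests that the hybrid depends less on its battery than the electric vehicle does (consistent with what we would expect).
2.2 Temperature curves
# Battery temperature curve of the electric vehicle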
plt.figure(figsize=(10,5))
plt.plot(electric_group.index,electric_group.soc,label='SOC')
plt.plot(electric_group.index,electric_group.max_temp,label='Temp')
plt.xticks(electric_xticks,rotation=45)
plt.legend()
plt.show()
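As shown above, the electric vehicle's battery temperature is fairly stable overall, but it can rise within a short time during fast discharge.
# Battery temperature curve of the hybrid vehicle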
plt.figure(figsize=(10,5))
plt.plot(hybrid_group.index,hybrid_group.soc,label='SOC')
plt.plot(hybrid_group.index,hybrid_group.max_temp,label='Temp')
plt.xticks(hybrid_xticks,rotation=45)
plt.legend()
plt.show()
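From the above:
·The hybrid's battery temperature follows a trend similar to the electric vehicle's; the temperature rises mainly when SOC is low;
·In addition, the hybrid's mean battery temperature is clearly higher than the electric vehicle's.
2.3 Power curves
For the electric vehicle, total power is P = UI (total voltage × total current). Let's see how its power changes over time:
# Power curve of the electric vehicle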
electric_group['power']=electric_group['sumvoltage']*electric_group['sumcurrent']
fig,ax1=plt.subplots(figsize=(10,5))
ax1.plot(electric_group.index,electric_group.soc,label='SOC')
plt.xticks(electric_xticks,rotation=45)
plt.legend(loc='upper left')
ax2=ax1.twinx()
ax2.plot(electric_group.index,electric_group.power,label='Power',c='orange')
plt.xticks(electric_xticks,rotation=45)
plt.legend()
plt.show()
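From the above:
·For the electric vehicle, total power is roughly constant while charging, at about -3500 W;
·Each time charging starts, total power drops quickly from its current value to about -3500 W;
·Each time discharging starts, total power rises quickly, and it rises fastest in the early (acceleration) phase of the discharge.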
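One caveat about the power column above: it multiplies the hourly mean voltage by the hourly mean current, which is generally not equal to the hourly mean of the instantaneous product U·I when both vary within the hour. The sketch below illustrates the difference on hypothetical 10 s samples and also integrates power into energy. The sign convention (positive current means discharging) follows the data, where charging shows negative current.

```python
import pandas as pd

# Hypothetical 10 s samples that fall into a single hourly bin
df = pd.DataFrame({
    'time': ['01-10 02'] * 4,
    'sumvoltage': [370.0, 360.0, 380.0, 350.0],   # V
    'sumcurrent': [10.0, 60.0, -20.0, 90.0],      # A, positive = discharging
})
df['power'] = df['sumvoltage'] * df['sumcurrent']  # instantaneous power in W

g = df.groupby('time')[['sumvoltage', 'sumcurrent', 'power']].mean()
power_of_means = (g['sumvoltage'] * g['sumcurrent']).iloc[0]   # mean(U) * mean(I)
mean_of_power = g['power'].iloc[0]                             # mean(U * I)

# Energy over the bin: each sample covers 10 s; 3.6e6 J per kWh
energy_kwh = df['power'].sum() * 10 / 3.6e6
print(power_of_means, mean_of_power, energy_kwh)
```

Computing power per record before aggregating keeps the hourly figure consistent with the energy actually drawn; whether the difference is material for this dataset would need to be checked on the real data.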
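For the hybrid vehicle, total power is P = UI = FV; when it drives in hybrid mode we can take P = a(UI) + b(FV), where a and b are constants with a + b = 1.
# Power and speed curves of the hybrid vehicle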
hybrid_group['power']=hybrid_group['sumvoltage']*hybrid_group['sumcurrent']
fig,ax1=plt.subplots(figsize=(10,5))
ax1.plot(hybrid_group.index,hybrid_group.soc,label='SOC')
ax1.plot(hybrid_group.index,hybrid_group.speed,label='Speed')
plt.xticks(hybrid_xticks,rotation=45)
plt.legend()
ax2=ax1.twinx()
ax2.plot(hybrid_group.index,hybrid_group.power,label='Power',c='g')
plt.xticks(hybrid_xticks,rotation=45)
plt.legend()
plt.show()
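From the above:
·Early in charging, the hybrid's total power stays at about -3000 W, but near the end of charging it rises quickly to 0; we speculate that charging-while-driving mode may have been switched on;
·The hybrid's speed and total power follow very similar trends, but they are negatively correlated for a small share of the time; we speculate that the vehicle was driving in hybrid mode during those periods.
3 Energy consumption prediction
3.1 Electric vehicle
# Electric vehicle records while driving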
data_electric['power']=data_electric['sumvoltage']*data_electric['sumcurrent']
data_electric['distince']=data_electric['summileage']-min(data_electric['summileage'])
data_electric=data_electric[(data_electric['vehiclestatus']==1)&(data_electric['chargestatus']!=1)]
data_electric_corr=data_electric[['distince','speed','power','gearnum','max_temp','soc']].corr()
data_electric_corr
distince | speed | power | gearnum | max_temp | soc | |
---|---|---|---|---|---|---|
distince | 1.000000 | 0.001669 | 0.000802 | -0.149200 | 0.605537 | -0.300539 |
speed | 0.001669 | 1.000000 | 0.365273 | -0.171837 | 0.310909 | -0.343134 |
power | 0.000802 | 0.365273 | 1.000000 | -0.069520 | 0.122373 | -0.168553 |
gearnum | -0.149200 | -0.171837 | -0.069520 | 1.000000 | -0.146253 | 0.119784 |
max_temp | 0.605537 | 0.310909 | 0.122373 | -0.146253 | 1.000000 | -0.737968 |
soc | -0.300539 | -0.343134 | -0.168553 | 0.119784 | -0.737968 | 1.000000 |
import seaborn as sns
sns.pairplot(data_electric_corr)
<seaborn.axisgrid.PairGrid at 0x1ecb93a4490>
# Linear regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
y=data_electric.soc
x=data_electric[['max_temp','speed','power']]
train_x, test_x, train_y, test_y=train_test_split(x,y,test_size=0.3)
# Instantiate LinearRegression and fit it on the training set
model=LinearRegression().fit(train_x,train_y)
text_y_pred=model.predict(test_x)
model
LinearRegression()
# Inspect the regression coefficients and intercept
a=model.coef_
b=model.intercept_
print("Coefficients:",a,"Intercept:",b)
Coefficients: [-9.68475897e+00 -1.17614358e-01 -8.28857862e-05] Intercept: 159.1528907150104
from sklearn.metrics import r2_score
r2_score(test_y,text_y_pred) #model.score(test_x,test_y)
0.5564915327088534
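The value returned by r2_score (and by model.score for a regressor) is the coefficient of determination R², the share of variance in the target explained by the model, rather than a classification accuracy. A minimal sketch of the definition, using hypothetical SOC values and a hypothetical helper name `r2_manual`:

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    return 1 - ss_res / ss_tot

# Hypothetical SOC values and predictions
y_true = [90, 80, 70, 60]
y_pred = [88, 82, 69, 61]
score = r2_manual(y_true, y_pred)
print(score)
```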
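Possible reasons for the low R² (about 0.56):
1. The variables' effect on SOC is not linear
2. Temperature, speed, and total power may not be the main factors affecting SOC
3. The data may span several discharge-charge-discharge cycles, each starting from a different initial charge; it may be better to fit a single complete discharge.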
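Point 3 suggests separating the individual discharge cycles. One way to do that, sketched here under the assumption that a cycle corresponds to a consecutive run of records with the same vehiclestatus, is to number the runs with a shift-and-cumsum pattern. The column `run_id` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical status sequence: 1 = started (driving), 2 = switched off (charging)
df = pd.DataFrame({
    'vehiclestatus': [1, 1, 1, 2, 2, 1, 1],
    'soc':           [100, 95, 90, 92, 96, 96, 91],
})

# A new run starts whenever the status differs from the previous row
df['run_id'] = (df['vehiclestatus'] != df['vehiclestatus'].shift()).cumsum()

# Keep only the driving runs and summarize each one
driving = df[df['vehiclestatus'] == 1]
summary = driving.groupby('run_id')['soc'].agg(['first', 'last', 'count'])
print(summary)
```

Each driving run can then be fitted separately, or the distance can be measured from the start of its own run so that the initial charge no longer varies between cycles.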
# Data for one complete discharge of the electric vehicle
data_electric2=data_electric[(data_electric['time']>='01-10 01')&(data_electric['time']<='01-10 05')]
data_electric2[['distince','speed','power','gearnum','max_temp','soc']].corr()
distince | speed | power | gearnum | max_temp | soc | |
---|---|---|---|---|---|---|
distince | 1.000000 | 0.349117 | 0.156051 | -0.191131 | 0.968572 | -0.999296 |
speed | 0.349117 | 1.000000 | 0.361904 | -0.257999 | 0.366727 | -0.347465 |
power | 0.156051 | 0.361904 | 1.000000 | -0.103565 | 0.157798 | -0.155686 |
gearnum | -0.191131 | -0.257999 | -0.103565 | 1.000000 | -0.194100 | 0.190021 |
max_temp | 0.968572 | 0.366727 | 0.157798 | -0.194100 | 1.000000 | -0.964628 |
soc | -0.999296 | -0.347465 | -0.155686 | 0.190021 | -0.964628 | 1.000000 |
As shown above, SOC is very strongly correlated with driving distance (distince, -0.999) and battery temperature (max_temp, -0.965).
y=data_electric2.soc
X=data_electric2[['distince','max_temp','gearnum']]
train_X, test_X, train_y, test_y=train_test_split(X,y,test_size=0.3)
model=LinearRegression().fit(train_X,train_y)
train_y_pred=model.predict(test_X)
model
LinearRegression()
# Inspect the regression coefficients and intercept
a=model.coef_
b=model.intercept_
print("Coefficients:",a,"Intercept:",b)
Coefficients: [-0.61861013 0.70541269 -0.03986971] Intercept: 95.61569826869962
model.score(test_X,test_y)
0.9987001712803837
r2_score(test_y,train_y_pred)
0.9987001712803837
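R² is close to 1, so the model describes SOC very well. Using the coefficients printed above, the fitted model is: SOC = -0.6186 distince + 0.7054 max_temp - 0.0399 gearnum + 95.6157 (the exact values change slightly between runs because train_test_split shuffles the data randomly).
This indicates that while the electric vehicle is driving and discharging, SOC falls as driving distance and gear number increase, with driving distance the dominant factor;
The positive temperature coefficient suggests that a moderate rise in battery temperature goes with a higher SOC. We speculate that the vehicle was driven in cold outdoor conditions, where low temperature limits battery performance. Note, however, that max_temp and distince are themselves strongly correlated (0.97), so the individual coefficients should be interpreted with caution.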
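As a worked example of reading the model, the sketch below evaluates it at two hypothetical operating points that differ only in distance. It uses the coefficients and intercept exactly as printed above; the function name `predict_soc` and the input values are illustrative, and the model should only be trusted inside the range of the single discharge it was fitted on.

```python
import numpy as np

# Coefficients and intercept as printed above (order: distince, max_temp, gearnum)
coef = np.array([-0.61861013, 0.70541269, -0.03986971])
intercept = 95.61569826869962

def predict_soc(distince_km, max_temp_c, gearnum):
    """Evaluate the fitted linear model for one operating point."""
    return float(np.dot(coef, [distince_km, max_temp_c, gearnum]) + intercept)

# Hypothetical operating points: same temperature and gear, 10 km apart
soc_50 = predict_soc(50, 10, 14)
soc_60 = predict_soc(60, 10, 14)
print(soc_50, soc_60, soc_50 - soc_60)
```

With everything else held fixed, each additional kilometre lowers the predicted SOC by about 0.62 percentage points, which is the distance coefficient itself.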
3.2 Hybrid vehicle
# Hybrid vehicle driving data
data_hybrid['power']=data_hybrid['sumvoltage']*data_hybrid['sumcurrent']
data_hybrid['distince']=data_hybrid['summileage']-min(data_hybrid['summileage'])
data_hybrid1=data_hybrid[data_hybrid['enginestatus']==1]
data_hybrid1[['chargestatus','distince','speed','power','gearnum','max_temp','grankshaftspeed','soc']].corr()
chargestatus | distince | speed | power | gearnum | max_temp | grankshaftspeed | soc | |
---|---|---|---|---|---|---|---|---|
chargestatus | 1.000000 | -0.118385 | 0.004872 | 0.687276 | -0.058846 | -0.071826 | 0.043988 | 0.164638 |
distince | -0.118385 | 1.000000 | 0.373480 | -0.066912 | 0.035244 | 0.672353 | 0.072657 | -0.818908 |
speed | 0.004872 | 0.373480 | 1.000000 | 0.001805 | 0.084560 | 0.604542 | 0.579285 | -0.375031 |
power | 0.687276 | -0.066912 | 0.001805 | 1.000000 | -0.008415 | -0.065653 | -0.050287 | 0.099259 |
gearnum | -0.058846 | 0.035244 | 0.084560 | -0.008415 | 1.000000 | -0.007834 | -0.023145 | -0.105821 |
max_temp | -0.071826 | 0.672353 | 0.604542 | -0.065653 | -0.007834 | 1.000000 | 0.313536 | -0.429131 |
grankshaftspeed | 0.043988 | 0.072657 | 0.579285 | -0.050287 | -0.023145 | 0.313536 | 1.000000 | -0.075443 |
soc | 0.164638 | -0.818908 | -0.375031 | 0.099259 | -0.105821 | -0.429131 | -0.075443 | 1.000000 |
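As shown above, the two factors most strongly related to the hybrid's SOC are driving distance (correlation -0.82) and battery temperature (-0.43).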
y=data_hybrid1.soc
x=data_hybrid1[['distince','max_temp']]
train_x,test_x,train_y,test_y=train_test_split(x,y,test_size=0.3)
model=LinearRegression().fit(train_x,train_y)
text_y_pred=model.predict(test_x)
model
LinearRegression()
# Inspect the regression coefficients and intercept
a=model.coef_
b=model.intercept_
print("Coefficients:",a,"Intercept:",b)
Coefficients: [-0.5170021 2.34935887] Intercept: 7.8441657954872355
model.score(test_x,test_y)
0.7098012480597483
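Using the coefficients printed above, the model is SOC = -0.5170 × driving distance + 2.3494 × battery temperature + 7.8442 (the exact values vary between runs because the train/test split is random);
However, the fitted effect of temperature on SOC is positive, which runs against our usual expectation, so the model still needs refinement.
# Relationship between the hybrid's driving distance and SOC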
data_hybrid1[['distince','soc']].plot()
<AxesSubplot:>
data_hybrid1.describe()
vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | power | distince | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 430.0 | 430.000000 | 430.000000 | 430.000000 | 430.000000 | 430.000000 | 430.000000 | 430.000000 | 430.0 | 430.000000 | ... | 430.000000 | 430.000000 | 430.0 | 430.000000 | 430.000000 | 430.0 | 430.000000 | 430.000000 | 430.000000 | 430.000000 |
mean | 1.0 | 2.402326 | 1.669767 | 42.691860 | 69856.039535 | 351.138605 | -4.168140 | 38.295349 | 1.0 | 13.995349 | ... | 66.697674 | 3.650709 | 1.0 | 2.420930 | 27.688372 | 1.0 | 5.876744 | 25.660465 | -1531.102744 | 68.039535 |
std | 0.0 | 0.490938 | 0.557021 | 23.689193 | 38.456173 | 9.569201 | 23.681272 | 20.876900 | 0.0 | 0.096449 | ... | 24.787427 | 0.101638 | 0.0 | 1.331711 | 1.922897 | 0.0 | 1.073594 | 1.877624 | 8256.938343 | 38.456173 |
min | 1.0 | 2.000000 | 1.000000 | 0.000000 | 69792.000000 | 333.000000 | -108.000000 | 18.000000 | 1.0 | 13.000000 | ... | 6.000000 | 3.446000 | 1.0 | 2.000000 | 24.000000 | 1.0 | 5.000000 | 22.000000 | -38847.600000 | 4.000000 |
25% | 1.0 | 2.000000 | 1.000000 | 31.425000 | 69819.000000 | 345.000000 | -16.300000 | 23.000000 | 1.0 | 14.000000 | ... | 48.000000 | 3.587000 | 1.0 | 2.000000 | 27.000000 | 1.0 | 5.000000 | 25.000000 | -5761.837500 | 31.000000 |
50% | 1.0 | 2.000000 | 2.000000 | 43.150000 | 69872.000000 | 348.000000 | -3.600000 | 26.000000 | 1.0 | 14.000000 | ... | 68.000000 | 3.621000 | 1.0 | 2.000000 | 28.000000 | 1.0 | 5.000000 | 26.000000 | -1279.715000 | 84.000000 |
75% | 1.0 | 3.000000 | 2.000000 | 57.075000 | 69887.750000 | 355.500000 | 5.050000 | 58.000000 | 1.0 | 14.000000 | ... | 95.000000 | 3.699000 | 1.0 | 2.000000 | 29.000000 | 1.0 | 7.000000 | 27.000000 | 1782.212500 | 99.750000 |
max | 1.0 | 3.000000 | 3.000000 | 103.600000 | 69909.000000 | 380.700000 | 94.200000 | 87.000000 | 1.0 | 15.000000 | ... | 96.000000 | 3.974000 | 1.0 | 7.000000 | 32.000000 | 1.0 | 8.000000 | 31.000000 | 31839.600000 | 121.000000 |
8 rows × 28 columns
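From the plot above, the hybrid's SOC falls almost steadily as distance traveled increases, except for one stretch in the middle where it rises, possibly because the battery was being charged while driving.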
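A stretch like this can be located programmatically by looking for samples where SOC increases from one row to the next. The following is a minimal sketch on made-up numbers (not the real measurements); only the column names `distince` and `soc` come from the notebook.

```python
import pandas as pd

# Made-up trip: SOC mostly falls with distance but rises once in the middle.
trip = pd.DataFrame({
    "distince": [4, 10, 20, 30, 37, 50, 80, 121],
    "soc":      [69, 60, 48, 70, 87, 70, 40, 21],
})

# Change in SOC between consecutive samples; positive means the battery gained charge.
trip["soc_change"] = trip["soc"].diff()
charging = trip[trip["soc_change"] > 0]
print(charging[["distince", "soc", "soc_change"]])
```

Applied to `data_hybrid1`, the same `diff()` filter would mark the rows where charge was gained, though noisy 10 s samples may need smoothing first.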
# Time at which SOC peaks
print(max(data_hybrid1['soc']))
data_hybrid1[data_hybrid1.soc==max(data_hybrid1['soc'])].time.unique()
87
array(['01-06 19 3', '01-06 19 4'], dtype=object)
# Time at which distance traveled is largest
print(data_hybrid1['distince'].max())
data_hybrid1[data_hybrid1['distince']==data_hybrid1['distince'].max()].time.max()
121.0
'01-06 22 1'
# Distance traveled at the time SOC peaks
data_hybrid1[data_hybrid1.time=='01-06 19 3'].distince.unique()
array([37., 38.])
# SOC at the maximum distance
data_hybrid1[data_hybrid1['distince']==data_hybrid1['distince'].max()].soc.max()
21
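Between [01-06 19 3] and [01-06 22 1], roughly 2.5 h, this hybrid traveled 121 - 37 = 84 km while its SOC fell from 87% to 21%, an average of (87 - 21) / 2.5 = 26.4 percentage points of charge per hour.
Over the same period it also burned 3/100 × 84 = 2.52 L of fuel, which works out to 3 L of fuel plus (87 - 21) / 84 × 100 ≈ 78.6% of battery charge per 100 km.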
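The arithmetic in the paragraph above can be reproduced in a few lines. All inputs are the figures quoted in the text; the variable names are illustrative, and the 2.5 h duration is the text's own approximation.

```python
# Figures quoted in the text above.
distance_km = 121 - 37          # km driven between the two time points
soc_drop = 87 - 21              # percentage points of charge used
hours = 2.5                     # approximate duration
fuel_per_100km = 3.0            # L/100 km, the fuel rate used in the text

soc_per_hour = soc_drop / hours                  # charge used per hour
soc_per_100km = soc_drop / distance_km * 100     # charge used per 100 km
fuel_used = fuel_per_100km / 100 * distance_km   # litres burned over the trip

print(distance_km, soc_per_hour, round(soc_per_100km, 1), round(fuel_used, 2))
```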
data_hybrid2 = data_hybrid[(data_hybrid['enginestatus']==1)&(data_hybrid['runmodel']==2)]
data_hybrid2
time | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | ... | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | power | distince | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
23 | 01-06 15 3 | 1 | 3 | 2 | 0.0 | 69792.0 | 362.2 | 0.3 | 69 | 1 | ... | 96 | 3.769 | 1 | 7 | 25 | 1 | 8 | 24 | 108.66 | 4.0 |
24 | 01-06 15 3 | 1 | 3 | 2 | 0.0 | 69792.0 | 362.2 | 0.1 | 69 | 1 | ... | 96 | 3.769 | 1 | 7 | 25 | 1 | 8 | 24 | 36.22 | 4.0 |
25 | 01-06 15 3 | 1 | 3 | 2 | 7.4 | 69792.0 | 361.7 | 3.8 | 69 | 1 | ... | 96 | 3.763 | 1 | 7 | 25 | 1 | 8 | 24 | 1374.46 | 4.0 |
26 | 01-06 15 3 | 1 | 2 | 2 | 6.4 | 69792.0 | 363.0 | -1.8 | 69 | 1 | ... | 96 | 3.776 | 1 | 7 | 25 | 1 | 8 | 24 | -653.40 | 4.0 |
31 | 01-06 15 3 | 1 | 3 | 2 | 31.9 | 69792.0 | 362.0 | 9.1 | 70 | 1 | ... | 46 | 3.769 | 1 | 7 | 25 | 1 | 8 | 24 | 3294.20 | 4.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2281 | 01-06 22 1 | 1 | 3 | 2 | 37.1 | 69908.0 | 344.5 | 3.9 | 23 | 1 | ... | 48 | 3.582 | 1 | 2 | 29 | 1 | 5 | 27 | 1343.55 | 120.0 |
2290 | 01-06 22 1 | 1 | 2 | 2 | 36.5 | 69908.0 | 343.5 | -0.4 | 22 | 1 | ... | 95 | 3.569 | 1 | 2 | 29 | 1 | 5 | 27 | -137.40 | 120.0 |
2291 | 01-06 22 1 | 1 | 3 | 2 | 30.6 | 69908.0 | 342.0 | 10.5 | 22 | 1 | ... | 95 | 3.557 | 1 | 2 | 29 | 1 | 5 | 27 | 3591.00 | 120.0 |
2293 | 01-06 22 1 | 1 | 2 | 2 | 29.8 | 69908.0 | 345.0 | -8.6 | 22 | 1 | ... | 68 | 3.587 | 1 | 2 | 29 | 1 | 5 | 27 | -2967.00 | 120.0 |
2299 | 01-06 22 1 | 1 | 2 | 2 | 32.5 | 69909.0 | 343.2 | -1.5 | 21 | 1 | ... | 95 | 3.567 | 1 | 2 | 29 | 1 | 5 | 27 | -514.80 | 121.0 |
250 rows × 29 columns
# Linear regression
y=data_hybrid2.soc
X=data_hybrid2[['distince','max_temp']]
train_x,test_x,train_y,test_y=train_test_split(X,y,test_size=0.3)
model=LinearRegression().fit(train_x,train_y)
test_y_pred=model.predict(test_x)
model
LinearRegression()
# Inspect the regression coefficients and intercept
a=model.coef_
b=model.intercept_
print("Coefficients:",a,"Intercept:",b)
print('R2_score',model.score(test_x,test_y))
Coefficients: [-0.5272625 2.48394717] Intercept: 5.920065339559301
R2_score 0.6835670699977157
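Because the train/test split is random, the coefficients and R² printed above will differ from run to run. In scikit-learn this is normally pinned by passing `random_state` to `train_test_split`. The sketch below illustrates the same idea with NumPy only, as a stand-in for the notebook's `train_test_split` plus `LinearRegression`. The data are made up, and `fit_soc_model` is a hypothetical helper, not part of the original analysis.

```python
import numpy as np

def fit_soc_model(distance, temp, soc, seed, test_size=0.3):
    """Seeded 70/30 split followed by an ordinary least-squares fit."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(soc))
    n_test = int(round(len(soc) * test_size))
    train = order[n_test:]
    # Design matrix: distance, temperature, and a column of ones for the intercept.
    X = np.column_stack([distance[train], temp[train], np.ones(len(train))])
    coef, *_ = np.linalg.lstsq(X, soc[train], rcond=None)
    return coef  # [distance coefficient, temperature coefficient, intercept]

# Made-up data standing in for data_hybrid2 (not the real measurements).
data_rng = np.random.default_rng(0)
distance = np.linspace(4, 121, 250)
temp = 24 + 0.05 * distance + data_rng.normal(0, 1, 250)
soc = 80 - 0.5 * distance + data_rng.normal(0, 5, 250)

first = fit_soc_model(distance, temp, soc, seed=42)
second = fit_soc_model(distance, temp, soc, seed=42)  # same seed, same split
other = fit_soc_model(distance, temp, soc, seed=7)    # different split
print(first, second, other)
```

With a fixed seed the reported equation and the printed coefficients stay in step, which makes the write-up easier to check.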
# Data for the hybrid when driving in fuel mode
data_hybrid3=data_hybrid[data_hybrid['runmodel']==3]
data_hybrid3.describe()
vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp | power | distince | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 19.0 | 19.000000 | 19.0 | 19.000000 | 19.000000 | 19.000000 | 19.000000 | 19.000000 | 19.0 | 19.0 | ... | 19.000000 | 19.000000 | 19.0 | 19.0 | 19.000000 | 19.0 | 19.000000 | 19.000000 | 19.000000 | 19.000000 |
mean | 1.0 | 2.736842 | 3.0 | 76.863158 | 69868.684211 | 346.821053 | 2.089474 | 31.578947 | 1.0 | 14.0 | ... | 66.105263 | 3.606789 | 1.0 | 2.0 | 29.105263 | 1.0 | 6.157895 | 27.052632 | 721.411053 | 80.684211 |
std | 0.0 | 0.452414 | 0.0 | 8.513859 | 32.273230 | 1.280442 | 4.608795 | 8.738903 | 0.0 | 0.0 | ... | 23.860462 | 0.014831 | 0.0 | 0.0 | 1.882514 | 0.0 | 1.014515 | 1.899523 | 1592.191248 | 32.273230 |
min | 1.0 | 2.000000 | 3.0 | 62.700000 | 69806.000000 | 343.500000 | -3.700000 | 23.000000 | 1.0 | 14.0 | ... | 33.000000 | 3.567000 | 1.0 | 2.0 | 26.000000 | 1.0 | 5.000000 | 24.000000 | -1289.450000 | 18.000000 |
25% | 1.0 | 2.500000 | 3.0 | 70.600000 | 69857.500000 | 346.700000 | -1.050000 | 25.000000 | 1.0 | 14.0 | ... | 48.000000 | 3.604500 | 1.0 | 2.0 | 28.500000 | 1.0 | 5.000000 | 26.000000 | -363.885000 | 69.500000 |
50% | 1.0 | 3.000000 | 3.0 | 78.000000 | 69888.000000 | 347.000000 | 2.400000 | 25.000000 | 1.0 | 14.0 | ... | 48.000000 | 3.609000 | 1.0 | 2.0 | 29.000000 | 1.0 | 7.000000 | 27.000000 | 824.400000 | 100.000000 |
75% | 1.0 | 3.000000 | 3.0 | 80.600000 | 69895.000000 | 347.600000 | 3.250000 | 35.000000 | 1.0 | 14.0 | ... | 95.000000 | 3.615500 | 1.0 | 2.0 | 30.000000 | 1.0 | 7.000000 | 28.000000 | 1129.925000 | 107.000000 |
max | 1.0 | 3.000000 | 3.0 | 99.100000 | 69896.000000 | 348.500000 | 16.300000 | 46.000000 | 1.0 | 14.0 | ... | 95.000000 | 3.628000 | 1.0 | 2.0 | 32.000000 | 1.0 | 7.000000 | 30.000000 | 5615.350000 | 108.000000 |
8 rows × 28 columns
# Distance traveled vs. SOC for the hybrid in fuel mode
data_hybrid3[['distince','soc']].plot()
<AxesSubplot:>
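At first, while the hybrid cruises at a roughly constant speed, distance traveled grows steadily and SOC stays flat; once the vehicle speeds up, SOC gradually declines. We can try to find the speed at which this happens:
# Time at which SOC starts to fall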
data_hybrid3[data_hybrid3['soc']==max(data_hybrid3['soc'])].time.max()
'01-06 20 1'
# Speed at the time SOC starts to fall
data_hybrid3[data_hybrid3.time=='2019-01-06 20:14:47'].speed
Series([], Name: speed, dtype: float64)
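From this we can reasonably infer that about 100 km/h is a threshold for the hybrid in fuel mode: above 100 km/h the engine alone no longer provides enough power, and battery charge is consumed as well. Note that the lookup above returns an empty Series, because `time` here holds truncated labels such as '01-06 20 1' rather than full timestamps, so the figure is not read from that cell; it is close to the maximum fuel-mode speed of 99.1 km/h in the `describe()` output.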
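The speed lookup only matches rows when the comparison value has the same format as the `time` column. A minimal sketch on made-up rows, where only the column names and the label format follow the notebook:

```python
import pandas as pd

# Made-up rows in the same shape as data_hybrid3; `time` holds truncated labels.
df = pd.DataFrame({
    "time":  ["01-06 20 0", "01-06 20 1", "01-06 20 1", "01-06 20 2"],
    "soc":   [46, 46, 45, 44],
    "speed": [70.0, 75.5, 82.0, 90.0],
})

# Last time label at which SOC is still at its maximum.
label = df.loc[df["soc"] == df["soc"].max(), "time"].max()

# Compare against the same label format; a full timestamp matches nothing.
speeds_at_label = df.loc[df["time"] == label, "speed"]
empty = df.loc[df["time"] == "2019-01-06 20:14:47", "speed"]
print(label, speeds_at_label.tolist(), len(empty))
```

Because each label covers several 10 s samples, the lookup returns a handful of speeds rather than a single value, so a summary such as the maximum or mean over the label may be more useful.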
# Time span of fuel-mode driving for the hybrid
print(data_hybrid3.time.min(),data_hybrid3.time.max())
01-06 16 4 01-06 21 4