Analysis of the ETT oil temperature time series data used in Informer

Oil temperature data introduction

The ETT (Electricity Transformer Temperature) dataset contains power transformer oil temperature data provided by State Grid. It covers transformers in two different counties of the same province in China over a span of 2 years. The original data is recorded every minute (marked with m), so each dataset contains 2 years * 365 days * 24 hours * 60 minutes = 1,051,200 data points. Because of this volume, the data is downsampled to one data point every 15 minutes, giving ETTm1 and ETTm2, and to hour-level granularity (marked with h), giving ETTh1 and ETTh2.
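The point counts at each granularity follow from the same arithmetic. Here is a minimal sketch, assuming 365-day years as in the formula above; the helper name `points` is only for illustration:

```python
# Points per series over the two-year span (365-day years assumed).
MINUTES_PER_DAY = 24 * 60
DAYS = 2 * 365


def points(interval_minutes: int) -> int:
    """Number of samples when one point is kept every `interval_minutes`."""
    return DAYS * MINUTES_PER_DAY // interval_minutes


counts = {
    "1 min (original)": points(1),
    "15 min (ETTm1, ETTm2)": points(15),
    "1 hour (ETTh1, ETTh2)": points(60),
}
for name, n in counts.items():
    print(f"{name}: {n:,}")
```

Under these assumptions the 15-minute series have 70,080 points each and the hourly series 17,520.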

Each data point has 8 fields: the record date, the prediction target oil temperature (OT), and 6 different types of power load features.
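As a rough illustration of this layout, the sketch below parses two rows into typed records. The six load column names are the ones used later in this article; the `date` header, the timestamp format, and every numeric value are assumptions made up for the example, not real ETT records.

```python
import csv
import io
from datetime import datetime

# Synthetic two-row sample in the assumed ETT column layout; the numbers are made up.
SAMPLE = """date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2016-07-01 00:00:00,5.0,2.0,1.5,0.5,4.0,1.0,30.0
2016-07-01 01:00:00,5.5,2.1,1.6,0.4,3.9,1.2,27.5
"""

LOAD_COLUMNS = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL"]
TARGET = "OT"


def parse_rows(text):
    """Turn CSV text into dicts with a datetime and seven float fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # Timestamp format is an assumption for this sketch.
        row = {"date": datetime.strptime(raw["date"], "%Y-%m-%d %H:%M:%S")}
        for name in LOAD_COLUMNS + [TARGET]:
            row[name] = float(raw[name])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
print(len(rows[0]), "fields per data point")
```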

Introduction to the forecasting problem

The power distribution problem concerns how the grid allocates power to different user areas as demand changes over time. Predicting the future demand of a specific area is difficult because it varies with factors such as weekdays, holidays, seasons, weather, and temperature. Existing forecasting methods do not deliver high-precision long-term forecasts on long real-world series, and a wrong forecast can have serious consequences. Lacking an effective way to predict future electricity consumption, managers are forced to make decisions based on empirical values, whose thresholds are often much higher than actual demand. Such conservative strategies waste electricity and accelerate equipment depreciation. The oil temperature of a transformer effectively reflects its working condition, so predicting the oil temperature can help avoid this unnecessary waste.

This dataset can be used to predict the oil temperature of power transformers and study the ultimate load capacity of power transformers.

Data analysis

The generated exploratory analysis report shows the following information about the data:

  • **Summary:** data type, number of unique values, missing values, zero values
  • **Quantile statistics:** minimum, maximum, median, and other quantile values
  • **Descriptive statistics:** mean, mode, standard deviation, median absolute deviation, coefficient of variation, kurtosis, skewness
  • **Number of occurrences of each value:** displayed as a histogram
  • **Correlation analysis visualization:** interactive plots showing the relationships between variables, plus Spearman, Pearson, and other correlation matrices drawn as color-scale heatmaps to highlight related variables
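The report itself comes from a profiling tool, but most of the per-column statistics listed above can be recomputed by hand. The following is a minimal sketch using only the Python standard library. The function name `describe`, the use of `None` for missing values, and the population (not sample) formulas are assumptions of this example, and the input values are synthetic rather than taken from ETT.

```python
import statistics as st


def describe(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = st.fmean(present)
    median = st.median(present)
    std = st.pstdev(present)  # population standard deviation
    q1, _, q3 = st.quantiles(present, n=4)
    # Third and fourth central moments, used for skewness and kurtosis.
    m3 = sum((v - mean) ** 3 for v in present) / n
    m4 = sum((v - mean) ** 4 for v in present) / n
    return {
        "missing": len(values) - n,
        "distinct": len(set(present)),
        "zeros": sum(1 for v in present if v == 0),
        "min": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(present),
        "mean": mean,
        "mode": st.mode(present),
        "std": std,
        "mad": st.median([abs(v - median) for v in present]),
        "cv": std / mean,  # undefined when the mean is 0
        "skewness": m3 / std ** 3,
        "excess_kurtosis": m4 / std ** 4 - 3,
    }


# Synthetic values for illustration only, not taken from ETT.
sample = [0.0, 1.0, 2.0, 2.0, None, 3.0, 4.0, 9.0]
report = describe(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```

A profiling tool may use slightly different conventions, for example sample instead of population formulas, so its numbers can differ a little from these.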

Correlation coefficients in the diagrams are read on the following scale:

0.8-1.0 very strong correlation
0.6-0.8 strong correlation
0.4-0.6 moderate correlation
0.2-0.4 weak correlation
0.0-0.2 very weak correlation or no correlation
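The scale can be applied mechanically once a coefficient is available. Here is a minimal sketch that computes a Pearson coefficient and maps it to one of the labels above. Two choices are assumptions of this example rather than part of the scale: negative coefficients are judged by their absolute value, and a value on a shared boundary (such as 0.8) goes to the stronger band. The input series are synthetic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def strength(r):
    """Map a coefficient to the qualitative scale, using |r|."""
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak or none"


# Synthetic series for illustration only.
rising = [1.0, 2.0, 3.0, 4.0]
falling = [8.0, 6.0, 4.0, 2.0]
r = pearson(rising, falling)
print(round(r, 3), strength(r))
```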

The correlation diagrams show how the variables relate to one another, and each relationship can be classed as very strong, strong, moderate, weak, or no correlation. Some specific correlations in this data are:

HUFL is strongly correlated with MUFL, weakly correlated with LUFL, and not correlated with the prediction target OT.
HULL is strongly correlated with MULL, moderately correlated with LULL, and weakly correlated with LUFL and the prediction target OT.
MUFL is strongly correlated with HUFL, weakly correlated with MULL and LUFL, and not correlated with the prediction target OT.
MULL is strongly correlated with HULL and weakly correlated with the prediction target OT.
LUFL is moderately correlated with LULL and not correlated with the prediction target OT.
LULL is moderately correlated with LUFL and weakly correlated with the prediction target OT.

| Variable | HUFL | HULL | MUFL | MULL | LUFL | LULL |
|---|---|---|---|---|---|---|
| Correlation with OT | no correlation | weak correlation | no correlation | weak correlation | no correlation | weak correlation |

The input variables correlate with the output target to different degrees, and a machine learning model will "learn" these relationships during subsequent training.

Origin blog.csdn.net/m0_56075892/article/details/127824104