Table of contents
Dataset 1: GEFCom2014 load data
Dataset 2: iQiyi User Retention Prediction Challenge Dataset
Dataset 3: Power Transformer Dataset (ETDataset)
Dataset 4: 2016 Electrician Mathematical Modeling Competition Load Forecasting Dataset
Dataset 5: Wind turbine operation dataset
Dataset 6: Australian electricity load and price forecast data
Dataset 7: Changzhou Bridgestone Photovoltaic Dataset
Dataset 8: Xinjiang photovoltaic wind power dataset
Dataset 1: GEFCom2014 load data
Data set download:
- Baidu Netdisk: Link: https://pan.baidu.com/s/1PgCWHx8vYUfGB9UGtCmaVA?pwd=ktn0 Extraction code: ktn0
- Official download: https://ars.els-cdn.com/content/image/1-s2.0-S0169207016000133-mmc1.zip
Dataset introduction:
The GEFCom2014 load forecasting data is the public dataset of the competition. The load forecasting track of GEFCom2014 is probabilistic load forecasting.
[Figure: visualization of the dataset]
Dataset 2: iQiyi User Retention Prediction Challenge Dataset
Data set download:
- Baidu Netdisk: Link: https://pan.baidu.com/s/1UQWmIN7P6vcBmYxfZxu_1g?pwd=5ywi Extraction code: 5ywi
- Official download: http://challenge.ai.iqiyi.com/detail?raceId=61600f6cef1b65639cd5eaa6
Competition description:
iQiyi is a leading high-quality video entertainment streaming platform in China and worldwide. More than 500 million users use iQiyi's entertainment services every month. Under the brand slogan "Enjoy Quality", iQiyi has built a professional, licensed video content library covering movies, TV series, variety shows, and animation, along with a large volume of user-generated content such as "Sui Ke", to give users a rich, professional video experience.
The iQiyi mobile app uses AI technologies such as deep learning to personalize the product experience and offer users customized entertainment services. The key indicator "N-day retention score" is used to measure user satisfaction. For example, if a user's "7-day retention score" on October 1st equals 3, the user visits the iQiyi app on 3 of the following 7 days (October 2nd to 8th). Predicting a user's retention score is challenging: different users have very different preferences and activity levels. In addition, other factors, such as the leisure time at the user's disposal and the popularity of trending content, show strong cyclical patterns.
This competition uses desensitized, sampled data from the iQiyi app to predict each user's 7-day retention score. Participating teams need to design algorithms for data analysis and prediction.
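To make the score definition concrete, here is a minimal sketch of how a 7-day retention score could be computed from launch records. It assumes the launch log is available as (user_id, date) pairs with integer day indices; the function name `retention_score` and the sample data are illustrative and not part of the competition materials.

```python
from collections import defaultdict


def retention_score(launches, user_id, day, window=7):
    """Count distinct days in day+1 .. day+window on which the user launched the app."""
    active_days = defaultdict(set)
    for uid, date in launches:
        active_days[uid].add(date)
    return sum(1 for d in range(day + 1, day + window + 1) if d in active_days[user_id])


# Hypothetical launch log: (user_id, day index) pairs.
# Repeated launches on the same day count only once.
launches = [
    ("u1", 10), ("u1", 11), ("u1", 11), ("u1", 14), ("u1", 17), ("u1", 18),
    ("u2", 10),
]

print(retention_score(launches, "u1", 10))  # days 11, 14, 17 fall in 11..17 -> 3
print(retention_score(launches, "u2", 10))  # no launches in 11..17 -> 0
```

Day 18 is excluded for user u1 because the window for day 10 ends at day 17.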
Data description:
This competition provides a rich dataset, including video data, user portrait data, user launch logs, and user playback and interaction logs. For each user in the test set, the task is to predict the user's "7-day retention score" on a given day. The 7-day retention score ranges from 0 to 7, and predictions are rounded to 2 decimal places.
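A small sketch of formatting a model output to match the stated range and precision. Clipping out-of-range values to [0, 7] is an assumption made here for illustration; the competition text only states the range and the 2-decimal precision. The helper name `format_prediction` is hypothetical.

```python
def format_prediction(score):
    """Clip a raw model output to [0, 7] and format it with 2 decimal places."""
    clipped = min(7.0, max(0.0, float(score)))
    return f"{clipped:.2f}"


print(format_prediction(3.14159))  # 3.14
print(format_prediction(-0.3))     # 0.00
print(format_prediction(9))        # 7.00
```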
User portrait data

| Field name | Description |
| --- | --- |
| user_id | |
| device_type | iOS, Android |
| device_rom | ROM of the device |
| device_ram | RAM of the device |
| sex | |
| age | |
| education | |
| occupation_status | |
| territory_code | |

App launch logs

| Field name | Description |
| --- | --- |
| user_id | |
| date | desensitized, starting from 0 |
| launch_type | spontaneous, or launched by other apps and deep links |

Video related data

| Field name | Description |
| --- | --- |
| item_id | id of the video |
| father_id | album id, if the video is an episode of an album collection |
| cast | a list of actors/actresses |
| duration | video length |
| tag_list | a list of tags |

User playback data

| Field name | Description |
| --- | --- |
| user_id | |
| item_id | |
| playtime | video playback time |
| date | timestamp of the behavior |

User interaction data

| Field name | Description |
| --- | --- |
| user_id | |
| item_id | |
| interact_type | interaction types such as posting comments, etc. |
| date | timestamp of the behavior |

Dataset 3: Power Transformer Dataset (ETDataset)
Data set download:
Data description:
The dataset covers two years of data. In the minute-level variants (marked with m), a data point is recorded every 15 minutes. The data come from two different regions of the same province in China, named ETT-small-m1 and ETT-small-m2. Each dataset contains 2 years * 365 days * 24 hours * 4 = 70,080 data points. Hourly variants (marked with h), namely ETT-small-h1 and ETT-small-h2, are also provided. Each data point has 8 features: the recording date, the prediction target "oil temperature", and 6 different types of external load values.
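The point count above follows directly from the sampling interval. The sketch below reproduces the arithmetic, assuming 4 samples per hour (a 15-minute interval), which is what the factor of 4 implies, and non-leap years. These are nominal counts; the row counts of the published files may differ, so check them after loading. The start date is an assumption used only to generate example timestamps.

```python
from datetime import datetime, timedelta


def expected_points(years, minutes_per_step):
    """Nominal number of samples in `years` non-leap years at a fixed interval."""
    steps_per_hour = 60 // minutes_per_step
    return years * 365 * 24 * steps_per_hour


print(expected_points(2, 15))  # 70080, the m variants
print(expected_points(2, 60))  # 17520, the h variants

# One day of 15-minute timestamps has 24 * 4 = 96 rows.
start = datetime(2016, 7, 1)  # assumed start date, for illustration only
day = [start + timedelta(minutes=15 * i) for i in range(96)]
print(day[0], day[-1])
```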
Dataset 4: 2016 Electrician Mathematical Modeling Competition Load Forecasting Dataset
Data set download:
- Baidu Netdisk: Link: https://pan.baidu.com/s/1h3sGqebtLp1XGK9kmN53Zg?pwd=jx5p Extraction code: jx5p
Data introduction:
Dataset 5: Wind turbine operation dataset
Data set download:
- Baidu Netdisk: Link: https://pan.baidu.com/s/13-YLcvuOP6-ev6XdGtQbGQ?pwd=levp Extraction code: levp
Data introduction:
The dataset contains more than 300,000 records covering wind speed, wind direction, temperature, humidity, air pressure, and actual power. The fields are:
- WINDSPEED: forecast wind speed
- WINDDIRECTION: wind direction
- TEMPERATURE: temperature
- HUMIDITY: humidity
- PRESSURE: air pressure
- PREPOWER: predicted power
- ROUND(A.WS,1): actual wind speed
- ROUND(A.POWER,0): actual power
- YD15: actual power (prediction target)
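Because the fields listed above include both a predicted power (PREPOWER) and an actual power (ROUND(A.POWER,0)), a natural first check is how far the provided forecast is from the measurement. The following is a minimal sketch using made-up rows; the values and the helper name `forecast_errors` are illustrative only.

```python
import math

# Made-up rows using the column names listed above; values are illustrative only.
rows = [
    {"PREPOWER": 1200.0, "ROUND(A.POWER,0)": 1000.0},
    {"PREPOWER": 800.0, "ROUND(A.POWER,0)": 900.0},
    {"PREPOWER": 500.0, "ROUND(A.POWER,0)": 500.0},
]


def forecast_errors(rows, pred_col="PREPOWER", actual_col="ROUND(A.POWER,0)"):
    """Return (MAE, RMSE) between a forecast column and an actual column."""
    diffs = [r[pred_col] - r[actual_col] for r in rows]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


mae, rmse = forecast_errors(rows)
print(f"MAE={mae:.1f} RMSE={rmse:.1f}")  # MAE=100.0 RMSE=129.1
```

The same function can be pointed at YD15 by passing `actual_col="YD15"` once that column is present in the rows.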
Dataset 6: Australian electricity load and price forecast data
Data set download:
- Baidu Netdisk: Link: https://pan.baidu.com/s/1ehm9aJQqzbGOITnz3LwyLw?pwd=k4s1 Extraction code: k4s1
Data introduction:
The dataset includes the following features: date, hour, dry bulb temperature, dew point temperature, wet bulb temperature, humidity, electricity price, and power load. The time interval is 30 minutes.
Dataset 7: Changzhou Bridgestone Photovoltaic Dataset
Data set download:
- Baidu Netdisk: Link: https://pan.baidu.com/s/1vXRjtf2gen_-2b1jDwYLXA?pwd=loam Extraction code: loam
Data introduction:
The dataset includes five features: time, station name, irradiation intensity (Wh/㎡), ambient temperature (℃), and full-field power (kW), with a time interval of 5 minutes. (Note: the feature names for irradiation intensity (Wh/㎡), ambient temperature (℃), and full-field power (kW) each begin with a leading space.)
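The leading space in those feature names is easy to trip over when looking up columns by name. Below is a minimal sketch of normalizing the header with Python's `csv` module. The inline sample stands in for the real file, and its English header names and values are assumptions for illustration; the real file's names may differ.

```python
import csv
import io

# Stand-in for the real file: three header names start with a space.
raw = (
    "time,station name, irradiation intensity (Wh/㎡),"
    " ambient temperature (℃), full-field power (kW)\n"
    "2020-01-01 08:00,A,120.5,3.2,45.0\n"
)

reader = csv.DictReader(io.StringIO(raw))
# Reading fieldnames consumes the header row; reassigning it strips the spaces.
reader.fieldnames = [name.strip() for name in reader.fieldnames]
rows = list(reader)

print(reader.fieldnames)
print(rows[0]["full-field power (kW)"])  # 45.0
```

With pandas, `df.columns = df.columns.str.strip()` has the same effect after loading.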
Dataset 8: Xinjiang photovoltaic wind power dataset
Data set download:
- Baidu Netdisk: Link: https://pan.baidu.com/s/1e3NkiNC_dg3CaZWe9TA1TA?pwd=loam Extraction code: loam
Introduction to photovoltaic data:
The photovoltaic dataset includes the following features: component temperature (℃), temperature (°), air pressure (hPa), humidity (%), total radiation (W/m2), direct radiation (W/m2), scattered radiation (W/m2), and actual power generation (mw). The time interval is 15 minutes.