Task 11 comprehensive exercises (2021.1)
Pandas Learning Manual
Foreword:
This past month of pandas study followed on from numpy in November and LeetCode in January. I assumed it would be ordinary pandas practice, so at first I only reviewed it briefly. To my surprise, what I had learned before amounted to little more than knowing the terms. With the help of the DW teaching assistants, the experts in the group, and our team leader Nan Nan, this little rookie has grown rapidly. It is no exaggeration to say that what I learned this month accounts for more than 60% of what I learned in all of 2020. You can learn a lot just by staying quiet in the group and reading how others approach problems. I also thank Mr. Geng for the open source learning materials, the blue link above (Pandas Learning Manual), although the site went down today.
But that is not a big problem, since I have finished the course (witness). Finally, thank you all for your help; I am very happy to have met everyone. Although my life has been rough during this period, my studies stayed on track, and the help and encouragement from the teaching assistants were indispensable. This is the end of this study period and also the beginning of a new year of learning. Everyone, come on together!
Exercise link: click here
import numpy as np
import pandas as pd
[Task 4] Graphics card log
The performance evaluation log results of the 3090 graphics card are given below. Each log has the following structure:
Benchmarking #2# #4# precision type #1# #1# model average #2# time : #3# ms
Here, #1# is the model name; #2# takes the value train(ing) or inference, indicating the training or inference state; #3# is the elapsed time; and #4# is the precision, which is one of three types: float, half, or double. A specific example:
Benchmarking Inference float precision type resnet50 resnet50 model average inference time : 13.426570892333984 ms
Please organize the log results into the following form: model_i is filled with the corresponding model name, the models are sorted in alphabetical order, and the values are kept to three decimal places:
[Data Download] Link: click here (extraction code: 4mui)
Reshaping
Sort in alphabetical order, with three decimal places:
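As a minimal sketch of the reshaping described above (not necessarily the reference solution), the log lines can be parsed with a regular expression and pivoted into one row per model. It assumes each record sits on a single line exactly as in the example; the sample lines other than the resnet50 inference one are made up for illustration, and the column naming `status_precision` is an assumption because the target table is not shown here.

```python
import pandas as pd

# Sample records in the documented structure. Only the resnet50 inference
# line comes from the text; the other three are made-up illustrations.
logs = pd.Series([
    "Benchmarking Inference float precision type resnet50 resnet50 model average inference time : 13.426570892333984 ms",
    "Benchmarking Training float precision type resnet50 resnet50 model average train time : 40.1234 ms",
    "Benchmarking Inference float precision type alexnet alexnet model average inference time : 1.23456 ms",
    "Benchmarking Training float precision type alexnet alexnet model average train time : 5.6789 ms",
])

# Named groups follow the placeholders: #2# status, #4# precision,
# #1# model name, #3# time in ms.
pattern = (
    r"Benchmarking (?P<status>\w+) (?P<precision>\w+) precision type "
    r"(?P<model>\S+) \S+ model average \w+ time : (?P<time>[\d.]+) ms"
)
parsed = logs.str.extract(pattern)
parsed["time"] = parsed["time"].astype(float)

# Assumed column naming: e.g. "Inference_float", "Training_float".
parsed["key"] = parsed["status"] + "_" + parsed["precision"]

result = (
    parsed.pivot(index="model", columns="key", values="time")
    .round(3)        # keep three decimal places
    .sort_index()    # models in alphabetical order
)
print(result)
```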
[Task 5] Feature engineering of hydraulic station
df1 and df2 give the data of each site in 2018 and 2019, respectively. Columns H0 to H23 represent 0:00 to 23:00 of the day; df3 records the daily weather conditions in the area for 2018 and 2019. Please complete the following tasks:
import pandas as pd
import numpy as np
df1 = pd.read_csv('yali18.csv')
df2 = pd.read_csv('yali19.csv')
df3 = pd.read_csv('qx1819.csv')
- Construct df through df1 and df2, set the time as the index, the first column is the site number, and the second column is the pressure at the corresponding time. The arrangement is as follows (please replace the pressure value with the correct value):
                     Site  Pressure
2018-01-01 00:00:00     1       1.0
2018-01-01 00:00:00     2       1.0
...                   ...       ...
2018-01-01 00:00:00    30       1.0
2018-01-01 01:00:00     1       1.0
2018-01-01 01:00:00     2       1.0
...                   ...       ...
2019-12-31 23:00:00    30       1.0
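One way to get from the wide H0 to H23 layout to this long layout is `melt`, sketched below on toy data. The real CSV column names are not shown in the text, so `Time`, `Site`, and the pressure values here are assumptions; only two sites and two hour columns are used to keep the example small.

```python
import pandas as pd

# Toy stand-ins for df1 (2018) and df2 (2019). Column names "Time" and
# "Site" and all pressure values are assumptions made for this sketch.
wide18 = pd.DataFrame({
    "Time": ["2018-01-01", "2018-01-01"],
    "Site": [1, 2],
    "H0": [0.28, 0.31],
    "H1": [0.29, 0.32],
})
wide19 = pd.DataFrame({
    "Time": ["2019-01-01", "2019-01-01"],
    "Site": [1, 2],
    "H0": [0.27, 0.30],
    "H1": [0.26, 0.33],
})

# Stack both years, then turn the hour columns into rows.
wide = pd.concat([wide18, wide19], ignore_index=True)
long_df = wide.melt(id_vars=["Time", "Site"], var_name="Hour", value_name="Pressure")

# "H17" -> 17 hours, added to the date to form the full timestamp.
hours = pd.to_timedelta(long_df["Hour"].str[1:].astype(int), unit="h")
long_df["Time"] = pd.to_datetime(long_df["Time"]) + hours

df = (
    long_df.drop(columns="Hour")
    .sort_values(["Time", "Site"])
    .set_index("Time")
)
print(df)
```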
- On the basis of the df constructed in the previous step, construct the following feature Series or DataFrame, and concatenate them one by one to the right of df:
- The highest and lowest temperature of the day and their temperature difference
- Whether there was a sandstorm, whether there was fog, whether there was rain, whether there was snow, whether it was sunny
- Choose an appropriate method to measure the amount of rainfall/snowfall (construct two series to represent the size of the two)
- Use only 4 columns to 0-1 encode the wind direction (only consider the wind direction, not the strength)
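The weather features above can be sketched with string methods, as below. The layout of df3 is not shown in the text, so the column names and string formats in this toy frame (English weather words, temperatures like `1C~-11C`, wind strings like `northeast 3`) are assumptions and would need adapting to the real file. The four-column wind encoding uses one 0-1 column per cardinal direction, so a compound direction such as northeast sets two columns at once.

```python
import pandas as pd

# Toy stand-in for df3; every name and value here is an assumption.
weather = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "weather": ["sunny", "light rain to fog", "heavy snow"],
    "temperature": ["1C~-11C", "5C~-2C", "0C~-8C"],
    "wind": ["northeast 3", "south breeze", "west 4"],
})

# Highest and lowest temperature of the day and their difference.
temps = weather["temperature"].str.extract(r"(-?\d+)C~(-?\d+)C").astype(int)
weather["temp_high"] = temps.max(axis=1)
weather["temp_low"] = temps.min(axis=1)
weather["temp_diff"] = weather["temp_high"] - weather["temp_low"]

# 0-1 flags for each weather type (keywords are placeholders for the
# words actually used in the data).
for kind in ["sandstorm", "fog", "rain", "snow", "sunny"]:
    weather["is_" + kind] = weather["weather"].str.contains(kind).astype(int)

# Four 0-1 columns for wind direction, ignoring the strength part.
for direction in ["north", "south", "east", "west"]:
    weather["wind_" + direction] = weather["wind"].str.contains(direction).astype(int)

print(weather.drop(columns=["weather", "temperature", "wind"]))
```

Since these features are daily while df is hourly, they would then be joined onto df by calendar date, for example via `df.index.normalize()`.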
I ran into a small problem here: why is there no data?
There is another problem here too, I cried...
- For the water pressure column of df, construct the following features:
- The difference between the water pressure at the site at the current time and the average water pressure at the site at the same hour of the current month. For example, if the current time is 2018-05-20 17:00:00, the value to subtract is the average of the water pressure at all 17:00:00 time points in the current month
- The difference between the site's average water pressure on the weekend and its average water pressure on the working days of the current week
- Within 7 days from the current moment, the mean, standard deviation, and 0.95 quantile of the water pressure at the site, and the total number of rainy days and snowy days
- Within 7 days from the current moment, the mean, standard deviation, and 0.95 quantile of the water pressure at the same hour at the site
- The time difference between the highest and lowest water pressure at the site on the current day
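Several of these features map onto `groupby` plus `transform` or time-based `rolling` windows. The sketch below uses synthetic hourly data for a single site, and the column names `Site` and `Pressure` are assumptions. It reads "within 7 days" as the trailing window (t - 7 days, t], which is one possible interpretation, and it leaves out the weekend/working-day and rainy/snowy-day features.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for one site in May 2018 (values are made up):
# pressure grows with the hour of day and the day of month.
idx = pd.date_range("2018-05-01", periods=31 * 24, freq=pd.Timedelta(hours=1))
pressure = np.asarray(idx.hour) * 0.01 + np.asarray(idx.day) * 0.1
df = pd.DataFrame({"Site": 1, "Pressure": pressure}, index=idx)

# Helper columns used as grouping keys.
keys = df.assign(
    month=df.index.to_period("M"),
    hour=df.index.hour,
    day=df.index.normalize(),
)

# Current pressure minus the site's mean at the same hour in the same month.
month_hour_mean = keys.groupby(["Site", "month", "hour"])["Pressure"].transform("mean")
df["diff_month_hour"] = df["Pressure"] - month_hour_mean.to_numpy()

# Trailing 7-day mean, standard deviation and 0.95 quantile per site.
by_site = keys.groupby("Site")["Pressure"]
df["roll7_mean"] = by_site.transform(lambda s: s.rolling("7D").mean()).to_numpy()
df["roll7_std"] = by_site.transform(lambda s: s.rolling("7D").std()).to_numpy()
df["roll7_q95"] = by_site.transform(lambda s: s.rolling("7D").quantile(0.95)).to_numpy()

# Same statistic restricted to the same hour of day: group by hour first.
by_site_hour = keys.groupby(["Site", "hour"])["Pressure"]
df["roll7_same_hour_mean"] = by_site_hour.transform(
    lambda s: s.rolling("7D").mean()
).to_numpy()

# Time between the daily maximum and the daily minimum, per site and day.
daily = keys.groupby(["Site", "day"])["Pressure"]
peak_gap = daily.idxmax() - daily.idxmin()

print(df.loc["2018-05-20 17:00:00"])
```

Grouping by hour before rolling keeps only one observation per day in each window, which is what restricts the 7-day statistics to the same hour.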
[Data Download] Link: click here (extraction code: ijbd)
Summary:
The pandas check-in study group is finished, but learning pandas is not over. This will be a good start, come on!