Task 11 comprehensive exercises (2021.1)


Pandas Learning Manual


Foreword:

This past month of pandas study ties in with NumPy in November and LeetCode in January. I thought it would be ordinary pandas study, so I planned only a brief review. To my surprise, what I had learned before amounted to little more than knowing the terms. With the help of the DW teaching assistants, the experts in the group, and team leader Nan Nan, this rookie has grown quickly. It is no exaggeration to say that what I learned this month accounts for more than 60% of what I learned in all of 2020. You can learn a lot even without speaking in the group, just by reading how others approach problems. I also thank Mr. Geng for the open-source learning materials: the blue link above (Pandas Learning Manual), although the site went down today.

But that is not a big problem, since I have finished the course (witness). Finally, thank you all for your help; I'm very happy to have met everyone. Although life has been rough during this period, my studies stayed on track, and your help and the teaching assistants' encouragement were indispensable. This is the end of this study period and also the beginning of a new year of learning. Let's keep going together!

Study outline: 


Table of Contents

Task 11 comprehensive exercises (2021.1)

Study outline

Exercise link: click here

[Task 4] Graphics card log

[Task 5] Feature engineering of hydraulic station


Exercise link: click here

import numpy as np
import pandas as pd

[Task 4] Graphics card log

The performance evaluation log results of the 3090 graphics card are given below. Each log has the following structure:

Benchmarking #2# #4# precision type #1#
#1#  model average #2# time :  #3# ms

 

Here, #1# is the model name; #2# is either train(ing) or inference, indicating the training or inference state; #3# is the elapsed time; and #4# is the precision, which is one of three types: float, half, or double. A specific example follows:

Benchmarking Inference float precision type resnet50
resnet50  model average inference time :  13.426570892333984 ms
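
The two-line structure above can be pulled apart with regular expressions. The following is a minimal sketch, not the author's solution: the helper `parse_log` and the output column names (`model`, `state`, `precision`, `time_ms`) are hypothetical, and the only input used is the example record shown above.

```python
import re
import pandas as pd

# The two-line example record from the text above.
log_text = """Benchmarking Inference float precision type resnet50
resnet50  model average inference time :  13.426570892333984 ms
"""

# Header line: state (#2#), precision (#4#), model name (#1#).
header_re = re.compile(r"Benchmarking (\w+) (\w+) precision type (\S+)")
# Result line: model name (#1#), state (#2#), elapsed time in ms (#3#).
result_re = re.compile(r"(\S+)\s+model average (\w+) time :\s+([\d.]+) ms")


def parse_log(text):
    """Hypothetical helper: pair each header line with the result line after it."""
    rows = []
    current = None
    for line in text.splitlines():
        header = header_re.match(line)
        if header:
            current = header.groups()
            continue
        result = result_re.match(line)
        if result and current is not None:
            state, precision, model = current
            rows.append({
                "model": model,
                "state": state.capitalize(),  # normalise "inference" / "Inference"
                "precision": precision,
                "time_ms": float(result.group(3)),
            })
            current = None
    return pd.DataFrame(rows, columns=["model", "state", "precision", "time_ms"])


parsed = parse_log(log_text)
print(parsed)
```

Lines that match neither pattern are skipped, so other output mixed into the log file would not break the parse under these assumptions.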

 

Please organize the log results into the following form: model_i is filled with the corresponding model name, rows are sorted alphabetically by model name, and values are kept to three decimal places:

(Image of the target table omitted)

[Data download] Link: click here; extraction code: 4mui

 Reshaping:
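
The original code for this reshaping step is not shown, so here is a hedged sketch of one way to do it. It assumes the log has already been parsed into a long table with the hypothetical columns `model`, `state`, `precision` and `time_ms`, and that the target layout has one row per model and one column per state/precision pair (the target image would be needed to confirm this). The timings are made up for illustration.

```python
import pandas as pd

# Hypothetical parsed records; the timings are invented for illustration.
long_df = pd.DataFrame({
    "model": ["resnet50", "resnet50", "alexnet", "alexnet"],
    "state": ["Train", "Inference", "Train", "Inference"],
    "precision": ["float", "float", "float", "float"],
    "time_ms": [40.0, 13.4, 10.0, 3.0],
})

# Combine state and precision into a single column label, e.g. "Train_float".
long_df["column"] = long_df["state"] + "_" + long_df["precision"]

# Long-to-wide: one row per model, one column per state/precision pair.
wide = long_df.pivot(index="model", columns="column", values="time_ms")
print(wide)
```

`pivot` requires each (model, column) pair to be unique; if the log repeated a benchmark, `pivot_table` with an aggregation would be the alternative.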

 Sort alphabetically and keep three decimal places:
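
A minimal sketch of this final step, assuming a wide table indexed by model name. The column names are hypothetical and, apart from the resnet50 inference time quoted in the example log, the values are invented.

```python
import pandas as pd

# Hypothetical wide table; only 13.426570892333984 comes from the example log.
wide = pd.DataFrame(
    {
        "Train_float": [40.123456, 10.98765],
        "Inference_float": [13.426570892333984, 3.14159],
    },
    index=pd.Index(["resnet50", "alexnet"], name="model"),
)

# Sort rows alphabetically by model name, then round to three decimals.
result = wide.sort_index().round(3)
print(result)
```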

[Task 5] Feature engineering of hydraulic station

df1 and df2 give the data for each site in 2018 and 2019, respectively. Columns H0 to H23 represent the hours 0:00 to 23:00 of the day; df3 records the daily weather conditions in the area for 2018 and 2019. Please complete the following tasks:

import pandas as pd
import numpy as np
df1 = pd.read_csv('yali18.csv')
df2 = pd.read_csv('yali19.csv')
df3 = pd.read_csv('qx1819.csv')

 

  • Construct df from df1 and df2, with the time as the index, the site number as the first column, and the pressure at the corresponding time as the second column. The layout is as follows (replace the pressure values with the correct ones; the column headers 站点 and 压力 mean site and pressure):
                       站点    压力
2018-01-01 00:00:00       1    1.0
2018-01-01 00:00:00       2    1.0
...                     ...    ...
2018-01-01 00:00:00      30    1.0
2018-01-01 01:00:00       1    1.0
2018-01-01 01:00:00       2    1.0
...                     ...    ...
2019-12-31 23:00:00      30    1.0
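
One way to get from the wide hourly columns to this layout is `melt` followed by a timestamp computation. This is a sketch under stated assumptions: the real CSV column names are not shown in the text, so `Time` and `MeasName` are hypothetical, the small frames below stand in for df1 and df2, and only two of the H0 to H23 columns are included.

```python
import pandas as pd

# Hypothetical stand-ins for df1 (2018) and df2 (2019). Only the H0..H23
# pattern comes from the text; "Time" and "MeasName" are assumed names.
df1 = pd.DataFrame({"Time": ["2018-01-01", "2018-01-01"], "MeasName": [1, 2],
                    "H0": [0.28, 0.29], "H1": [0.27, 0.28]})
df2 = pd.DataFrame({"Time": ["2019-01-01", "2019-01-01"], "MeasName": [1, 2],
                    "H0": [0.30, 0.31], "H1": [0.29, 0.30]})

# Stack both years, then turn the hour columns into rows.
long = pd.concat([df1, df2], ignore_index=True).melt(
    id_vars=["Time", "MeasName"], var_name="hour", value_name="压力")

# "H7" -> 7 hours, added to the date to get the full timestamp.
offset = pd.to_timedelta(long["hour"].str[1:].astype(int), unit="h")
long["timestamp"] = pd.to_datetime(long["Time"]) + offset

df = (long.sort_values(["timestamp", "MeasName"])
          .set_index("timestamp")
          .rename_axis(None)
          .rename(columns={"MeasName": "站点"})[["站点", "压力"]])
print(df)
```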

Based on the df constructed above, build the following feature Series or DataFrames and concatenate them one by one to the right of df:

  • The highest and lowest temperature of the day and their temperature difference
  • Whether there was a sandstorm, whether there was fog, whether there was rain, whether there was snow, whether it was sunny
  • Choose an appropriate method to measure the amount of rainfall/snowfall (construct two series to represent the size of the two)
  • Use only 4 columns to 0-1 encode the wind direction (consider only the direction, not the strength)
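
The weather features listed above could be sketched as follows. The format of df3 is not shown in the text, so everything about the input is an assumption: the column names `temperature`, `weather` and `wind`, the string formats such as `5C~-3C` and `东南风 3-4级`, and the keyword-based ordinal scale for rain and snow amount are all hypothetical. The four wind columns work because a compound direction such as 东南 (southeast) sets both the east and south flags.

```python
import pandas as pd

# Hypothetical weather records; column names and string formats are assumed.
weather = pd.DataFrame({
    "temperature": ["5C~-3C", "12C~4C", "0C~-8C"],
    "weather": ["晴转多云", "小雨", "大雪"],
    "wind": ["东南风 3-4级", "北风 微风", "西风 <3级"],
})

# Highest / lowest temperature and their difference.
temps = weather["temperature"].str.extract(r"(-?\d+)C~(-?\d+)C").astype(int)
temps.columns = ["temp_high", "temp_low"]
temps["temp_diff"] = temps["temp_high"] - temps["temp_low"]

# 0-1 flags: sandstorm, fog, rain, snow, sunny (keyword match on the text).
keywords = {"sand": "沙", "fog": "雾", "rain": "雨", "snow": "雪", "sunny": "晴"}
flags = pd.DataFrame({
    name: weather["weather"].str.contains(kw, na=False).astype(int)
    for name, kw in keywords.items()
})

# Rain / snow amount as an assumed ordinal scale: 小 < 中 < 大 < 暴, 0 if none.
level = {"小": 1, "中": 2, "大": 3, "暴": 4}
rain_size = (weather["weather"].str.extract(r"([小中大暴])雨", expand=False)
             .map(level).fillna(0).astype(int).rename("rain_size"))
snow_size = (weather["weather"].str.extract(r"([小中大暴])雪", expand=False)
             .map(level).fillna(0).astype(int).rename("snow_size"))

# Wind direction in 4 columns: compound directions set two flags at once.
direction = weather["wind"].str.extract(r"^([东南西北]+)风", expand=False)
compass = {"east": "东", "south": "南", "west": "西", "north": "北"}
wind = pd.DataFrame({
    name: direction.str.contains(ch, na=False).astype(int)
    for name, ch in compass.items()
})

features = pd.concat([temps, flags, rain_size, snow_size, wind], axis=1)
print(features)
```

To attach these daily features to the hourly df, one option is to join on the calendar date of each timestamp.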

 

I ran into a small problem here: why is there no data?

 There is another problem here too. I cried...


For the water pressure column of df, construct the following features:

  • The difference between the water pressure at the site at the current time and the average water pressure at the site at the same hour over the current month. For example, if the current time is 2018-05-20 17:00:00, the value to subtract is the average of the water pressure at all 17:00:00 time points in the current month
  • The difference between the site's average water pressure on the weekend of the current week and its average water pressure on the working days of that week
  • Over the 7 days before the current moment, the mean, standard deviation, and 0.95 quantile of the water pressure at the site, and the total number of rainy days and snowy days
  • Over the 7 days before the current moment, the mean, standard deviation, and 0.95 quantile of the water pressure at the same hour at the site
  • The time difference between the moments of highest and lowest water pressure at the site on the current day
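
A sketch of three of these features (same-hour monthly difference, 7-day rolling statistics, and the daily max/min time gap) on synthetic data. Nothing here uses the real files: the frame layout with a `time` column plus 站点 and 压力, the feature column names, and the pressure values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: two sites, hourly readings for 10 days.
times = pd.Timestamp("2018-05-01") + pd.to_timedelta(np.arange(24 * 10), unit="h")
frames = []
for site in (1, 2):
    pressure = site + 0.01 * times.hour.to_numpy() + 0.1 * (times.day.to_numpy() - 1)
    frames.append(pd.DataFrame({"time": times, "站点": site, "压力": pressure}))
df = pd.concat(frames).sort_values(["time", "站点"]).reset_index(drop=True)

# Feature: pressure minus the site's mean at the same hour within the month.
df["month"] = df["time"].dt.to_period("M")
df["hour"] = df["time"].dt.hour
same_hour_mean = df.groupby(["站点", "month", "hour"])["压力"].transform("mean")
df["diff_same_hour"] = df["压力"] - same_hour_mean

# Feature: rolling statistics over the trailing 7 days, computed per site.
# The "7D" window covers (t - 7 days, t], so it includes the current reading.
parts = []
for site, g in df.groupby("站点"):
    r = g.set_index("time")["压力"].rolling("7D")
    parts.append(pd.DataFrame({
        "roll7_mean": r.mean().to_numpy(),
        "roll7_std": r.std().to_numpy(),
        "roll7_q95": r.quantile(0.95).to_numpy(),
    }, index=g.index))
df = df.join(pd.concat(parts))

# Feature: time of the daily maximum minus time of the daily minimum.
df["date"] = df["time"].dt.normalize()
daily = df.set_index("time").groupby(["站点", "date"])["压力"]
gap = (daily.idxmax() - daily.idxmin()).rename("max_min_gap")
df = df.join(gap, on=["站点", "date"])

print(df[["time", "站点", "diff_same_hour", "roll7_mean", "max_min_gap"]].tail())
```

The same-hour variant of the rolling statistics could reuse the loop by grouping on both 站点 and `hour` before rolling, so that each window only sees readings from that hour.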

[Data download] Link: click here; extraction code: ijbd


Summary:

The pandas check-in study program is finished, but learning pandas is not over. This is a good start. Keep going!

 

 


Origin blog.csdn.net/adminkeys/article/details/112554428