Pandas Tutorial: Comprehensive Exercises (Part 1)

1. Shanghai Motor Vehicle License Auctions, 2002 to 2018

>>> import numpy as np
>>> import pandas as pd
>>> from IPython.core.interactiveshell import InteractiveShell
# Display every expression result directly, without calling print
>>> InteractiveShell.ast_node_interactivity = "all"
# Show all columns
>>> pd.set_option('display.max_columns', 600)
# MVL = Motor Vehicle License
>>> MVL = pd.read_csv('General Exercises/2002年-2018年上海机动车牌照拍卖.csv')
>>> MVL.head()

(1) Which auction was the first to have a winning rate below 5%?

>>> MVL["ratio"] = MVL["Total number of license issued"]/MVL["Total number of applicants"]
>>> MVL.head()
>>> MVL[MVL["ratio"]<0.05]["Date"].values[0]

'15-May'
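
The same lookup can be written with a boolean mask and `idxmax`. The sketch below is illustrative only: the `toy` table and all of its numbers are made up, and only the column names are taken from the exercise.

```python
import pandas as pd

# Hypothetical miniature of the auction table; every number here is made up.
toy = pd.DataFrame({
    "Date": ["10-Jan", "10-Feb", "10-Mar", "10-Apr"],
    "Total number of license issued": [7000, 7500, 7400, 7400],
    "Total number of applicants": [100000, 120000, 150000, 160000],
})
toy["ratio"] = toy["Total number of license issued"] / toy["Total number of applicants"]

below = toy["ratio"] < 0.05
# idxmax on a boolean Series returns the label of the first True value.
# Guard with any(): if no value is True, idxmax would still return the first label.
first_date = toy.loc[below.idxmax(), "Date"] if below.any() else None
print(first_date)
```

The `any()` guard matters when no auction falls below the threshold. The original `.values[0]` would raise an `IndexError` in that case.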

(3) Split the date column (the first column) into two columns, one holding the year (format 20××) and the other the month (English abbreviation). Add them to the table as the first and second columns, delete the original date column, and shift the remaining columns back accordingly. This step is done before (2) because (2) groups by the year column.

>>> MVL["year"]= MVL["Date"].apply(lambda x:x.split("-")[0])
>>> MVL["month"] = MVL["Date"].apply(lambda x:x.split("-")[1])
>>> MVL["year"] = MVL["year"].apply(lambda x:"200"+x if len(x)==1 else "20"+x)
>>> MVL_new =MVL.reindex(columns=["year","month","Date","Total number of license issued","lowest price ","avg price","Total number of applicants","ratio"])
>>> MVL_new = MVL_new.drop(columns="Date")
>>> MVL_new.head()
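
A vectorized alternative to the `apply` calls is sketched below. It assumes the dates look like `2-Jan` or `15-May` (a one- or two-digit year, a hyphen, then the month). The `toy` table is made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the "Date" format mirrors the exercise.
toy = pd.DataFrame({"Date": ["2-Jan", "9-Dec", "15-May"], "avg price": [1, 2, 3]})

# expand=True returns a DataFrame with one column per piece of the split.
parts = toy["Date"].str.split("-", expand=True)
# zfill pads one-digit years to two digits, so "2" becomes "02" and then "2002".
year = "20" + parts[0].str.zfill(2)
month = parts[1]

# Drop the original column, then insert the new ones at positions 0 and 1.
out = toy.drop(columns="Date")
out.insert(0, "year", year)
out.insert(1, "month", month)
print(out)
```

Using `insert` places the new columns at the front directly, so no `reindex` with a full column list is needed.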

(2) For each year, compute the following statistics of the lowest auction price: the maximum, the mean, and the 0.75 quantile. Display them in a single table.

>>> from collections import OrderedDict
>>> groupedyear = MVL_new.groupby('year')
>>> def f(df):
>>>     # df holds the rows of a single year, so compute the statistics on df rather than on the whole table
>>>     data = OrderedDict()
>>>     data['LP_max']  = df['lowest price '].max()
>>>     data['LP_mean'] = df['lowest price '].mean()
>>>     data['LP_075']  = df['lowest price '].quantile(q=0.75)
>>>     return pd.Series(data)
>>> groupedyear.apply(f)
# Result: one row per year (2002 to 2018) with the columns LP_max, LP_mean and LP_075
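
The per-year statistics can also be computed without a custom function, using named aggregation on the grouped column. This is a minimal sketch on made-up data: the `toy` table is hypothetical, and the output column names follow the ones used above.

```python
import pandas as pd

# Hypothetical data: two years with two auctions each.
toy = pd.DataFrame({
    "year": ["2002", "2002", "2003", "2003"],
    "lowest price ": [10.0, 20.0, 30.0, 50.0],
})

# Each keyword becomes an output column; the value is the aggregation to apply per group.
stats = toy.groupby("year")["lowest price "].agg(
    LP_max="max",
    LP_mean="mean",
    LP_075=lambda s: s.quantile(0.75),
)
print(stats)
```

Because every aggregation receives only the rows of its own group, the values differ from year to year whenever the underlying prices do.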

(4) Now give the table a two-level row index: the outer level is the year and the inner level is the variable names of the second to fifth columns of the original table. The column index is the month.

>>> Month = MVL_new.iloc[0:12,1].to_list()
>>> result = MVL_new.melt(id_vars=['year','month'],value_vars=['Total number of license issued','lowest price ','avg price','Total number of applicants'],value_name='info')
>>> result.pivot_table(index = ['year','variable'],columns='month',values='info',fill_value='-').reindex(columns = Month)
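
To see what `melt` followed by `pivot_table` does, it may help to run the same two steps on a tiny table. The data below is made up and only two of the four value columns are used.

```python
import pandas as pd

# Hypothetical data: two years, two months, two value columns.
toy = pd.DataFrame({
    "year": ["2002", "2002", "2003", "2003"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "avg price": [1.0, 2.0, 3.0, 4.0],
    "lowest price ": [0.5, 1.5, 2.5, 3.5],
})

# melt stacks the value columns into (variable, info) pairs, one row per original cell.
long_form = toy.melt(
    id_vars=["year", "month"],
    value_vars=["avg price", "lowest price "],
    value_name="info",
)

# pivot_table spreads the months back out as columns under a (year, variable) row index.
wide = long_form.pivot_table(index=["year", "variable"], columns="month", values="info")
# pivot_table sorts the column labels alphabetically (Feb before Jan), so restore calendar order.
wide = wide.reindex(columns=["Jan", "Feb"])
print(wide)
```

The final `reindex` plays the same role as `reindex(columns = Month)` above: without it the months appear in alphabetical rather than calendar order.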

(5) Generally, the month-over-month change in the lowest price has the same sign as the month-over-month change in the average price. Which auctions do not follow this pattern?

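>>> print('Auctions where the changes in [lowest price, average price] from the previous month have opposite signs:')
>>> for index in MVL_new.index[:-1]:
>>>     # Product of the two month-over-month changes; a negative product means opposite signs
>>>     signal = (MVL_new.loc[index,'lowest price '] - MVL_new.loc[index+1,'lowest price '])*\
>>>              (MVL_new.loc[index,'avg price'] - MVL_new.loc[index+1,'avg price'])
>>>     if signal<0:
>>>         print(MVL_new.loc[index+1,['year','month']])
>>>         print('\n')

Auctions where the changes in [lowest price, average price] from the previous month have opposite signs: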
year     2003
month     Oct
Name: 21, dtype: object
year     2003
month     Nov
Name: 22, dtype: object
year     2004
month     Jun
Name: 29, dtype: object
year     2005
month     Jan
Name: 36, dtype: object
year     2005
month     Feb
Name: 37, dtype: object
year     2005
month     Sep
Name: 44, dtype: object


year     2006
month     May
Name: 52, dtype: object
year     2006
month     Sep
Name: 56, dtype: object
year     2007
month     Jan
Name: 60, dtype: object
year     2007
month     Feb
Name: 61, dtype: object
year     2007
month     Dec
Name: 71, dtype: object
year     2012
month     Oct
Name: 128, dtype: object
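
The row-by-row loop can be replaced with `diff`, which computes the change from the previous row for a whole column at once. The sketch below uses a made-up four-month table to show the idea.

```python
import pandas as pd

# Hypothetical data: in March the lowest price falls while the average price rises.
toy = pd.DataFrame({
    "year": ["2002"] * 4,
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "lowest price ": [100, 110, 105, 120],
    "avg price": [102, 112, 113, 118],
})

# diff() gives the change from the previous row; the first row is NaN,
# and NaN < 0 is False, so the first month is never selected.
product = toy["lowest price "].diff() * toy["avg price"].diff()
odd = toy.loc[product < 0, ["year", "month"]]
print(odd)
```

This returns all mismatching auctions as a single DataFrame instead of printing one Series per match.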

(6) Define the issuance gain as the difference between a month's issuance volume and the mean issuance volume of the previous two months, filling the first two months with 0. Find when the issuance gain reaches its extreme values.

>>> MVL2 = MVL_new.copy()
# Initialize with 0.0 so the column is float and can hold the two-month averages
>>> MVL2['issuance gain'] = 0.0
>>> for index in MVL2.index:
>>>     if index<2:continue
>>>     MVL2.loc[index,'issuance gain'] = MVL2.loc[index,'Total number of license issued'] - (MVL2.loc[index-1,'Total number of license issued'] +
>>>                                       MVL2.loc[index-2,'Total number of license issued'])/2
>>> print("min",MVL2.loc[MVL2["issuance gain"] == MVL2["issuance gain"].min()][['year','month']].head())
>>> print("max",MVL2.loc[MVL2["issuance gain"] == MVL2["issuance gain"].max()][['year','month']].head())

min     year month
74  2008   Apr
max     year month
72  2008   Jan
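
The issuance gain can also be computed without a loop by shifting the column. This is a sketch on made-up volumes, with `gain` as an illustrative variable name.

```python
import pandas as pd

# Hypothetical issuance volumes for five consecutive months.
toy = pd.DataFrame({"Total number of license issued": [100, 200, 400, 250, 900]})
issued = toy["Total number of license issued"]

# shift(1) and shift(2) align each month with the one and two months before it.
prev_two_mean = (issued.shift(1) + issued.shift(2)) / 2
# The first two months have no complete history, so their NaN gain is filled with 0.
gain = (issued - prev_two_mean).fillna(0.0)

# idxmin / idxmax return the row labels of the extreme values.
print(toy.loc[[gain.idxmin(), gain.idxmax()]])
```

Note that `idxmin` and `idxmax` return only the first occurrence of each extreme, whereas the equality filter used above would list every tied row.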

Reference: https://github.com/datawhalechina/joyful-pandas

About Datawhale

Datawhale is an open source organization focused on data science and AI. It brings together learners from many universities and well-known companies across different fields, along with team members who share an open source and exploratory spirit. With the vision of "for the learner, grow with learners", Datawhale encourages sincere self-expression, openness and tolerance, mutual trust and support, the courage to try and make mistakes, and the courage to take responsibility. Datawhale also applies the open source approach to content, learning, and solutions, supporting talent development and building connections between people and people, people and knowledge, people and enterprises, and people and the future.

Origin blog.csdn.net/OuDiShenmiss/article/details/105883771