Pandas Tutorial (Part 1): Comprehensive Exercises

I. Shanghai Motor Vehicle License Plate Auctions, 2002-2018

>>> import numpy as np
>>> import pandas as pd
>>> from IPython.core.interactiveshell import InteractiveShell
# Show the result of every expression without calling print
>>> InteractiveShell.ast_node_interactivity = "all"
# Display all columns
>>> pd.set_option('display.max_columns', 600)

# MVL = Motor Vehicle License
>>> MVL = pd.read_csv('General Exercises/2002年-2018年上海机动车牌照拍卖.csv')
>>> MVL.head()

(1) Which auction was the first to have a winning rate below 5%?

>>> MVL["ratio"] = MVL["Total number of license issued"]/MVL["Total number of applicants"]
>>> MVL.head()
>>> MVL[MVL["ratio"]<0.05]["Date"].values[0]

'15-May'

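(3) Split the first column (the date) into two columns: the year (formatted as 20xx) and the month (English abbreviation). Add them to the table as the first and second columns, delete the original first column, and shift the remaining columns to the right. This step is done before (2) because (2) groups by the year column created here.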

>>> MVL["year"]= MVL["Date"].apply(lambda x:x.split("-")[0])
>>> MVL["month"] = MVL["Date"].apply(lambda x:x.split("-")[1])
>>> MVL["year"] = MVL["year"].apply(lambda x:"200"+x if len(x)==1 else "20"+x)
>>> MVL_new =MVL.reindex(columns=["year","month","Date","Total number of license issued","lowest price ","avg price","Total number of applicants","ratio"])
>>> MVL_new = MVL_new.drop(columns="Date")
>>> MVL_new.head()
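
The same split can probably be written without `apply`, using the vectorized string methods. This is a minimal sketch on toy data, assuming the date strings look like "2-Jan" or "15-May" (a one- or two-digit year, a hyphen, then the month abbreviation), which is what the code above implies.

```python
import pandas as pd

# Toy data for illustration only; the Date format is an assumption based on '15-May'.
toy = pd.DataFrame({"Date": ["2-Jan", "9-Dec", "15-May"], "avg price": [1, 2, 3]})

# Split "yy-Mon" into two columns in one pass.
parts = toy["Date"].str.split("-", expand=True)
year = "20" + parts[0].str.zfill(2)   # "2" -> "2002", "15" -> "2015"
month = parts[1]

# Put the new columns first and drop the original Date column.
out = toy.drop(columns="Date")
out.insert(0, "year", year)
out.insert(1, "month", month)
print(out)
```

Using `insert` at positions 0 and 1 avoids listing every remaining column name by hand, so the trailing space in "lowest price " cannot be mistyped.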

(2) For each year, compute the following statistics of the lowest auction price: maximum, mean, and 0.75 quantile. Show them in a single table.

>>> from collections import OrderedDict
>>> groupedyear = MVL_new.groupby('year')
>>> def f(df):
>>>     # Aggregate over the group `df`. Using the full table `MVL` here would give
>>>     # every year the same overall values (max 93500.0, mean 53197.044335,
>>>     # 0.75 quantile 77050.0).
>>>     data = OrderedDict()
>>>     data['LP_max']  = df['lowest price '].max()
>>>     data['LP_mean'] = df['lowest price '].mean()
>>>     data['LP_075']  = df['lowest price '].quantile(q=0.75)
>>>     return pd.Series(data)
>>> groupedyear.apply(f)

The result has one row per year (2002 to 2018) and the columns LP_max, LP_mean and LP_075.
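
A per-year summary like this can also be expressed with named aggregation, which avoids writing a helper function. The sketch below uses made-up prices (not the auction data) and reuses the output column names LP_max, LP_mean and LP_075 from above.

```python
import pandas as pd

# Toy data for illustration only; the prices are invented.
toy = pd.DataFrame({
    "year": ["2002", "2002", "2003", "2003"],
    "lowest price ": [10.0, 20.0, 30.0, 50.0],   # trailing space mirrors the CSV header
})

# One output column per keyword; each group is aggregated separately.
stats = toy.groupby("year")["lowest price "].agg(
    LP_max="max",
    LP_mean="mean",
    LP_075=lambda s: s.quantile(0.75),
)
print(stats)
```

Because the aggregation is applied to each group by construction, there is no way to refer to the full table by accident.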

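(4) Now set the row index to a multi-level index: the outer level is the year, the inner level is the variable names of columns 2 to 5 of the original table, and the column index is the month.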

>>> Month = MVL_new.iloc[0:12,1].to_list()
>>> result = MVL_new.melt(id_vars=['year','month'],value_vars=['Total number of license issued','lowest price ','avg price','Total number of applicants'],value_name='info')
>>> result.pivot_table(index = ['year','variable'],columns='month',values='info',fill_value='-').reindex(columns = Month)
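
To see what the melt and pivot steps do, it may help to run them on a tiny table. This is only a sketch with invented numbers and two variables; the final `reindex` restores calendar order, since `pivot_table` sorts the month labels alphabetically.

```python
import pandas as pd

# Toy data for illustration only; the values are invented.
toy = pd.DataFrame({
    "year": ["2002", "2002", "2003", "2003"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "avg price": [10, 20, 30, 40],
    "lowest price ": [1, 2, 3, 4],
})

# Wide -> long: one row per (year, month, variable).
long = toy.melt(id_vars=["year", "month"], var_name="variable", value_name="info")

# Long -> wide again, this time with months as columns.
wide = long.pivot_table(index=["year", "variable"], columns="month", values="info")
wide = wide.reindex(columns=["Jan", "Feb"])   # calendar order instead of alphabetical
print(wide)
```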

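(5) In general, the difference between a month's lowest price and the previous month's lowest price has the same sign as the difference between that month's average price and the previous month's average price. Which auctions do not follow this pattern?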

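>>> print('Auctions where the lowest price and the average price changed in opposite directions from the previous month:')
>>> for index in MVL_new.index[:-1]:   # the last row has no following month to compare with
>>>     signal = (MVL_new.loc[index,'lowest price ']- MVL_new.loc[index+1,'lowest price '])*\
                 (MVL_new.loc[index,'avg price'] - MVL_new.loc[index+1,'avg price'])
>>>     if signal<0:
>>>         print(MVL_new.loc[index+1,['year','month']])
>>>         print('\n')

Auctions where the lowest price and the average price changed in opposite directions from the previous month: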
year     2003
month     Oct
Name: 21, dtype: object
year     2003
month     Nov
Name: 22, dtype: object
year     2004
month     Jun
Name: 29, dtype: object
year     2005
month     Jan
Name: 36, dtype: object
year     2005
month     Feb
Name: 37, dtype: object
year     2005
month     Sep
Name: 44, dtype: object


year     2006
month     May
Name: 52, dtype: object
year     2006
month     Sep
Name: 56, dtype: object
year     2007
month     Jan
Name: 60, dtype: object
year     2007
month     Feb
Name: 61, dtype: object
year     2007
month     Dec
Name: 71, dtype: object
year     2012
month     Oct
Name: 128, dtype: object
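
The row-by-row loop can likely be replaced by `diff`, which computes each month's change from the previous month in one step. The sketch below uses invented prices, so its result says nothing about the real auctions; it only illustrates the sign test.

```python
import pandas as pd

# Toy data for illustration only; the prices are invented.
toy = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "lowest price ": [100, 110, 105, 120],
    "avg price": [120, 125, 130, 140],
})

# Month-over-month differences; the first row has no predecessor and becomes NaN.
d_low = toy["lowest price "].diff()
d_avg = toy["avg price"].diff()

# A negative product means the two changes have opposite signs.
# NaN compares as False, so the first row is never selected.
opposite = toy.loc[d_low * d_avg < 0, "month"].tolist()
print(opposite)
```

In this toy table only March qualifies: the lowest price falls (110 to 105) while the average price rises (125 to 130).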

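(6) Define the issuance gain as the difference between a month's number of licenses issued and the mean of the numbers issued in the previous two months, filling the first two months with 0. Find when the extreme values of the issuance gain occur.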

>>> MVL2 = MVL_new.copy()
>>> MVL2['issuance gain'] = 0.0   # float, so the fractional gains assigned below fit the column dtype
>>> for index in MVL2.index:
>>>     if index<2:continue
>>>     MVL2.loc[index,'issuance gain']= MVL2.loc[index,'Total number of license issued']-(MVL2.loc[index-1,'Total number of license issued']+
                                                                             MVL2.loc[index-2,'Total number of license issued'])/2
>>> print("min",MVL2.loc[MVL2["issuance gain"] == MVL2["issuance gain"].min()][['year','month']].head())
>>> print("max",MVL2.loc[MVL2["issuance gain"] == MVL2["issuance gain"].max()][['year','month']].head())

min     year month
74  2008   Apr
max     year month
72  2008   Jan
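
The gain defined in (6) can also be computed without an explicit loop: shifting the series by one month and taking a rolling mean over two values gives the mean of the two preceding months. This is a sketch on an invented series, not the real issuance numbers.

```python
import pandas as pd

# Toy data for illustration only; the counts are invented.
issued = pd.Series([100, 200, 400, 100, 300], name="Total number of license issued")

# Mean of the two preceding months: shift by one, then a 2-wide rolling mean.
prev_two_mean = issued.shift(1).rolling(2).mean()

# The first two months have no full window, so their gain is filled with 0.
gain = (issued - prev_two_mean).fillna(0)

print(gain.tolist())
print(gain.idxmax(), gain.idxmin())   # row labels of the largest and smallest gain
```

`idxmax` and `idxmin` return the row labels directly, which can then be passed to `.loc` to read off the year and month.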

Reference: https://github.com/datawhalechina/joyful-pandas
