1. Shanghai Motor Vehicle License Plate Auctions, 2002-2018
>>> import numpy as np
>>> import pandas as pd
>>> from IPython.core.interactiveshell import InteractiveShell
# Display the result of every expression without calling print
>>> InteractiveShell.ast_node_interactivity = "all"
# Show all columns
>>> pd.set_option('display.max_columns', 600)
# MVL = Motor Vehicle License
>>> MVL = pd.read_csv('General Exercises/2002年-2018年上海机动车牌照拍卖.csv')
>>> MVL.head()
(1) In which auction did the success rate (licenses issued divided by applicants) first fall below 5%?
>>> MVL["ratio"] = MVL["Total number of license issued"]/MVL["Total number of applicants"]
>>> MVL.head()
>>> MVL[MVL["ratio"]<0.05]["Date"].values[0]
'15-May'
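The "first row that meets a condition" lookup can also be sketched on a small made-up table. The column names mirror the ones above, but the numbers are invented for illustration and are not taken from the CSV; `toy`, `below` and `first_date` are hypothetical names.

```python
import pandas as pd

# Toy data: invented numbers, only the column names follow the exercise
toy = pd.DataFrame({
    "Date": ["14-Nov", "14-Dec", "15-Jan", "15-Feb"],
    "Total number of license issued": [800, 700, 400, 450],
    "Total number of applicants": [10000, 12000, 10000, 10000],
})

# Success rate = licenses issued / applicants
toy["ratio"] = toy["Total number of license issued"] / toy["Total number of applicants"]

below = toy["ratio"] < 0.05
# idxmax on a boolean Series gives the label of the first True value.
# Guard with any(): if nothing matched, idxmax would still return the first label.
first_date = toy.loc[below.idxmax(), "Date"] if below.any() else None
print(first_date)
```

The `any()` guard is the main difference from indexing with `.values[0]`, which raises an `IndexError` when no row satisfies the condition.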
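(3) Split the first column (the date) into two columns, one for the year (formatted as 20xx) and one for the month (English abbreviation). Add them to the table as the first and second columns, delete the original first column, and shift the remaining columns back accordingly. (This step is done before (2) because (2) groups by the year column created here.)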
>>> MVL["year"]= MVL["Date"].apply(lambda x:x.split("-")[0])
>>> MVL["month"] = MVL["Date"].apply(lambda x:x.split("-")[1])
>>> MVL["year"] = MVL["year"].apply(lambda x:"200"+x if len(x)==1 else "20"+x)
>>> MVL_new =MVL.reindex(columns=["year","month","Date","Total number of license issued","lowest price ","avg price","Total number of applicants","ratio"])
>>> MVL_new = MVL_new.drop(columns="Date")
>>> MVL_new.head()
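As an alternative sketch of the same split, the string accessor can do the work without `apply`. This assumes the dates look like `'2-Jan'` or `'15-May'` (year digits, a hyphen, then the month), as the output of (1) suggests; the table below is invented toy data.

```python
import pandas as pd

# Toy data: invented values, assuming the "<year digits>-<month>" date format
toy = pd.DataFrame({
    "Date": ["2-Jan", "9-Dec", "15-May"],
    "avg price": [100, 200, 300],
})

# expand=True returns a DataFrame: column 0 holds the year digits, column 1 the month
parts = toy["Date"].str.split("-", expand=True)
# Pad to two digits, then prefix the century: "2" -> "2002", "15" -> "2015"
year = "20" + parts[0].str.zfill(2)

out = toy.drop(columns="Date")
out.insert(0, "year", year)        # new first column
out.insert(1, "month", parts[1])   # new second column
print(out)
```

`DataFrame.insert` places the new columns at fixed positions, so the remaining columns keep their order without being listed by name.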
(2) For each year, compute the following statistics of the lowest auction price: maximum, mean, and 0.75 quantile. Show them in a single table.
>>> from collections import OrderedDict
>>> groupedyear = MVL_new.groupby('year')
>>> def f(df):
...     # df is the sub-table of one year; use it rather than the full table MVL
...     data = OrderedDict()
...     data['LP_max'] = df['lowest price '].max()
...     data['LP_mean'] = df['lowest price '].mean()
...     data['LP_075'] = df['lowest price '].quantile(q=0.75)
...     return pd.Series(data)
>>> groupedyear.apply(f)
The result has one row per year (2002 to 2018) with the columns LP_max, LP_mean and LP_075. For reference, the same three statistics computed over the whole table (all years pooled) are 93500.0, 53197.044335 and 77050.0.
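The per-year statistics can also be written with named aggregation, which avoids a helper function. This is a minimal sketch on invented toy data; only the column name `'lowest price '` (with its trailing space) and the result labels follow the exercise.

```python
import pandas as pd

# Toy data: invented prices for two years
toy = pd.DataFrame({
    "year": ["2002", "2002", "2002", "2003", "2003"],
    "lowest price ": [10.0, 20.0, 30.0, 40.0, 80.0],
})

# Each keyword becomes a column of the result; the value is the function applied per group
stats = toy.groupby("year")["lowest price "].agg(
    LP_max="max",
    LP_mean="mean",
    LP_075=lambda s: s.quantile(0.75),
)
print(stats)
```

Because every function receives only the rows of its own group, the per-year values differ from year to year, which is a quick sanity check on any grouped summary.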
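(4) Now give the table a multi-level row index: the outer level is the year, and the inner level holds the variable names of the second to fifth columns of the original table. The column index is the month.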
>>> Month = MVL_new.iloc[0:12,1].to_list()
>>> result = MVL_new.melt(id_vars=['year','month'],value_vars=['Total number of license issued','lowest price ','avg price','Total number of applicants'],value_name='info')
>>> result.pivot_table(index = ['year','variable'],columns='month',values='info',fill_value='-').reindex(columns = Month)
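The melt-then-pivot reshape is easier to follow on a tiny table. The sketch below uses invented numbers and only two of the four variables; `long` and `wide` are hypothetical names.

```python
import pandas as pd

# Toy data: invented values for two years and two months
toy = pd.DataFrame({
    "year": ["2002", "2002", "2003", "2003"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "lowest price ": [10, 20, 30, 40],
    "avg price": [11, 21, 31, 41],
})

# Wide -> long: one row per (year, month, variable) with the value in "info"
long = toy.melt(
    id_vars=["year", "month"],
    value_vars=["lowest price ", "avg price"],
    value_name="info",
)

# Long -> wide again, this time with months as columns
wide = long.pivot_table(index=["year", "variable"], columns="month", values="info")

# pivot_table sorts the month labels alphabetically (Feb, Jan); restore calendar order
wide = wide.reindex(columns=["Jan", "Feb"])
print(wide)
```

The final `reindex` is what keeps the months in calendar order, which is the role the `Month` list plays in the exercise code.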
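(5) In general, the difference between a month's lowest price and the previous month's lowest price has the same sign as the difference between that month's average price and the previous month's average price. Which auctions do not have this property?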
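>>> print('Auctions where the changes in [lowest price, average price] from the previous month have opposite signs:')
>>> for index in MVL_new.index[:-1]:  # stop one row early so that index+1 always exists
...     # Product of the two month-over-month differences; negative means opposite signs
...     signal = ((MVL_new.loc[index, 'lowest price '] - MVL_new.loc[index+1, 'lowest price '])
...               * (MVL_new.loc[index, 'avg price'] - MVL_new.loc[index+1, 'avg price']))
...     if signal < 0:
...         print(MVL_new.loc[index+1, ['year', 'month']])
...         print('\n')
Auctions where the changes in [lowest price, average price] from the previous month have opposite signs: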
year 2003
month Oct
Name: 21, dtype: object
year 2003
month Nov
Name: 22, dtype: object
year 2004
month Jun
Name: 29, dtype: object
year 2005
month Jan
Name: 36, dtype: object
year 2005
month Feb
Name: 37, dtype: object
year 2005
month Sep
Name: 44, dtype: object
year 2006
month May
Name: 52, dtype: object
year 2006
month Sep
Name: 56, dtype: object
year 2007
month Jan
Name: 60, dtype: object
year 2007
month Feb
Name: 61, dtype: object
year 2007
month Dec
Name: 71, dtype: object
year 2012
month Oct
Name: 128, dtype: object
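The same sign comparison can be sketched without an explicit loop by using `diff`. The table below is invented toy data; only the column names follow the exercise.

```python
import pandas as pd

# Toy data: invented prices for four consecutive months
toy = pd.DataFrame({
    "year": ["2002"] * 4,
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "lowest price ": [100, 110, 105, 120],
    "avg price": [120, 125, 130, 128],
})

# Month-over-month change; the first row has no predecessor and becomes NaN
d_low = toy["lowest price "].diff()
d_avg = toy["avg price"].diff()

# A negative product means the two changes point in opposite directions.
# NaN < 0 evaluates to False, so the first row is never selected.
mismatch = toy.loc[d_low * d_avg < 0, ["year", "month"]]
print(mismatch)
```

A product of exactly zero (one of the two prices unchanged) is not counted as a mismatch here, which matches the strict `signal < 0` test in the loop.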
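(6) Define the issuance gain of a month as the number of licenses issued in that month minus the mean number issued over the previous two months, and fill the first two months with 0. Find when the extreme values of the issuance gain occur.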
>>> MVL2 = MVL_new.copy()
>>> MVL2['issue_gain'] = 0.0  # float fill value, since the gain can be fractional
>>> for index in MVL2.index:
...     if index < 2:
...         continue  # the first two months keep the fill value 0
...     MVL2.loc[index, 'issue_gain'] = (MVL2.loc[index, 'Total number of license issued']
...         - (MVL2.loc[index-1, 'Total number of license issued']
...            + MVL2.loc[index-2, 'Total number of license issued']) / 2)
>>> print("min", MVL2.loc[MVL2["issue_gain"] == MVL2["issue_gain"].min()][['year','month']].head())
>>> print("max", MVL2.loc[MVL2["issue_gain"] == MVL2["issue_gain"].max()][['year','month']].head())
min year month
74 2008 Apr
max year month
72 2008 Jan
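The issuance gain can also be computed with `shift` instead of a row loop. This is a minimal sketch on invented toy data; the column name `gain` is a hypothetical label.

```python
import pandas as pd

# Toy data: invented issuance counts for five consecutive months
toy = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "Total number of license issued": [100, 200, 300, 100, 400],
})

issued = toy["Total number of license issued"]
# Mean of the previous two months; NaN where fewer than two earlier months exist
prev_mean = (issued.shift(1) + issued.shift(2)) / 2
# Gain = current month minus that mean, with the first two months filled with 0
toy["gain"] = (issued - prev_mean).fillna(0)

# idxmax / idxmin return the row labels of the extreme values
print("max:", toy.loc[toy["gain"].idxmax(), "month"])
print("min:", toy.loc[toy["gain"].idxmin(), "month"])
```

Note that `idxmax` and `idxmin` report only the first occurrence of an extreme value, whereas the equality filter in the exercise code would list every tied row.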
Reference: https://github.com/datawhalechina/joyful-pandas