1. Shanghai Motor Vehicle License Auction Issues from 2002 to 2018
>>> import numpy as np
>>> import pandas as pd
>>> from IPython.core.interactiveshell import InteractiveShell
# Display every expression result without calling print
>>> InteractiveShell.ast_node_interactivity = "all"
# Show all columns
>>> pd.set_option('display.max_columns', 600)
# MVL = Motor Vehicle License
>>> MVL = pd.read_csv('General Exercises/2002年-2018年上海机动车牌照拍卖.csv')
>>> MVL.head()
(1) Which auction was the first to have a winning rate below 5%?
>>> MVL["ratio"] = MVL["Total number of license issued"]/MVL["Total number of applicants"]
>>> MVL.head()
>>> MVL[MVL["ratio"]<0.05]["Date"].values[0]
'15-May'
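The same "first row that satisfies a condition" lookup can be sketched on a small made-up table. The numbers below are purely illustrative and are not taken from the auction file; only the column names mirror the ones used above. `idxmax` on a boolean Series returns the label of the first `True`, so it is guarded with `any()` for the case where no row matches.

```python
import pandas as pd

# Hypothetical toy data (not from the real CSV); column names mirror the exercise.
toy = pd.DataFrame({
    "Date": ["10-Jan", "10-Feb", "10-Mar", "10-Apr"],
    "Total number of license issued": [80, 60, 40, 30],
    "Total number of applicants": [1000, 1000, 1000, 1000],
})
toy["ratio"] = toy["Total number of license issued"] / toy["Total number of applicants"]

below = toy["ratio"] < 0.05
# idxmax returns the first label even when nothing is True, hence the any() guard.
first_date = toy.loc[below.idxmax(), "Date"] if below.any() else None
print(first_date)
```

Compared with `.values[0]`, this variant returns `None` instead of raising an `IndexError` when no auction falls below the threshold.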
(3) Split the first column (the date) into two columns, one for the year (format 20××) and one for the month (English abbreviation). Insert them as the first and second columns, delete the original date column, and shift the remaining columns to the right.
>>> MVL["year"]= MVL["Date"].apply(lambda x:x.split("-")[0])
>>> MVL["month"] = MVL["Date"].apply(lambda x:x.split("-")[1])
>>> MVL["year"] = MVL["year"].apply(lambda x:"200"+x if len(x)==1 else "20"+x)
>>> MVL_new =MVL.reindex(columns=["year","month","Date","Total number of license issued","lowest price ","avg price","Total number of applicants","ratio"])
>>> MVL_new = MVL_new.drop(columns="Date")
>>> MVL_new.head()
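As an alternative sketch of the same split, the vectorized string methods `str.split(expand=True)` and `str.zfill` can replace the three `apply` calls, and `DataFrame.insert` places the new columns at the front directly. The toy table and its values are made up for illustration; it assumes, as above, that `Date` has the form `<year>-<month>` with a one- or two-digit year.

```python
import pandas as pd

# Hypothetical toy data; "avg price" values are placeholders.
toy = pd.DataFrame({
    "Date": ["2-Jan", "9-Dec", "15-May"],
    "avg price": [100, 200, 300],
})

# Split "2-Jan" into two columns: 0 holds the year part, 1 holds the month.
parts = toy["Date"].str.split("-", expand=True)
# Pad one-digit years to two digits, then prefix the century.
year = "20" + parts[0].str.zfill(2)

toy_new = toy.drop(columns="Date")
toy_new.insert(0, "year", year)
toy_new.insert(1, "month", parts[1])
print(toy_new)
```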
(2) For each year, compute the following statistics of the lowest auction price: the maximum, the mean, and the 0.75 quantile. Display them in a single table.
>>> from collections import OrderedDict
>>> groupedyear = MVL_new.groupby('year')
>>> def f(df):
...     # Use the group's own rows (df). Reading from MVL here would return the
...     # whole-table statistics (93500.0, 53197.044335, 77050.0) for every year.
...     data = OrderedDict()
...     data['LP_max'] = df['lowest price '].max()
...     data['LP_mean'] = df['lowest price '].mean()
...     data['LP_075'] = df['lowest price '].quantile(q=0.75)
...     return pd.Series(data)
>>> groupedyear.apply(f)
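Per-group statistics like these can also be written with named aggregation, which avoids the helper function and the `OrderedDict`. The sketch below uses a made-up table so that the per-year values are easy to check by hand; the helper `q75` is a hypothetical name introduced for this example.

```python
import pandas as pd

# Hypothetical toy data; the column name (with its trailing space) mirrors the exercise.
toy = pd.DataFrame({
    "year": ["2002", "2002", "2002", "2003", "2003"],
    "lowest price ": [10.0, 20.0, 30.0, 40.0, 60.0],
})

def q75(s):
    """Return the 0.75 quantile of one group's values."""
    return s.quantile(0.75)

# Each keyword becomes an output column computed separately for every year.
stats = toy.groupby("year")["lowest price "].agg(
    LP_max="max", LP_mean="mean", LP_075=q75
)
print(stats)
```

Because every statistic is computed from the group's own values, the rows differ from year to year, which is a quick sanity check for this kind of table.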
(4) Set the row index of the table to a multi-level index: the outer level is the year, the inner level is the variable name taken from the second to fifth columns of the original table, and the column index is the month.
>>> Month = MVL_new.iloc[0:12,1].to_list()
>>> result = MVL_new.melt(id_vars=['year','month'],value_vars=['Total number of license issued','lowest price ','avg price','Total number of applicants'],value_name='info')
>>> result.pivot_table(index = ['year','variable'],columns='month',values='info',fill_value='-').reindex(columns = Month)
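The melt-then-pivot reshaping is easier to follow on a tiny made-up table. In this sketch, `melt` stacks the value columns into `variable`/`info` pairs, `pivot_table` spreads the months back out as columns, and `reindex` restores calendar order (the pivot sorts month names alphabetically). All values are illustrative.

```python
import pandas as pd

# Hypothetical toy data with two value columns and two months per year.
toy = pd.DataFrame({
    "year": ["2002", "2002", "2003", "2003"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "avg price": [1, 2, 3, 4],
    "lowest price ": [5, 6, 7, 8],
})

# Long format: one row per (year, month, variable).
long = toy.melt(id_vars=["year", "month"], var_name="variable", value_name="info")

# Wide format: (year, variable) as the row index, months as columns.
wide = long.pivot_table(index=["year", "variable"], columns="month", values="info")
wide = wide.reindex(columns=["Jan", "Feb"])
print(wide)
```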
(5) In general, the difference between a month's lowest price and the previous month's lowest price has the same sign as the difference between that month's average price and the previous month's average price. Which auctions do not follow this pattern?
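>>> print('Auctions where the changes in lowest price and average price have opposite signs:')
>>> for index in MVL_new.index[:-1]:
...     # Product of the two month-over-month differences; negative means opposite signs
...     signal = (MVL_new.loc[index, 'lowest price '] - MVL_new.loc[index+1, 'lowest price ']) * \
...              (MVL_new.loc[index, 'avg price'] - MVL_new.loc[index+1, 'avg price'])
...     if signal < 0:
...         print(MVL_new.loc[index+1, ['year', 'month']])
...         print('\n')
Auctions where the changes in lowest price and average price have opposite signs: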
year 2003
month Oct
Name: 21, dtype: object
year 2003
month Nov
Name: 22, dtype: object
year 2004
month Jun
Name: 29, dtype: object
year 2005
month Jan
Name: 36, dtype: object
year 2005
month Feb
Name: 37, dtype: object
year 2005
month Sep
Name: 44, dtype: object
year 2006
month May
Name: 52, dtype: object
year 2006
month Sep
Name: 56, dtype: object
year 2007
month Jan
Name: 60, dtype: object
year 2007
month Feb
Name: 61, dtype: object
year 2007
month Dec
Name: 71, dtype: object
year 2012
month Oct
Name: 128, dtype: object
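The same sign comparison can be sketched without an explicit loop by using `Series.diff`, which subtracts the previous row from the current one. The table below is made up for illustration. The first row has no predecessor, so its difference is `NaN` and the comparison `< 0` leaves it out.

```python
import pandas as pd

# Hypothetical toy data; only "Mar" has opposite-signed changes.
toy = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "lowest price ": [100, 110, 105, 120],
    "avg price": [120, 130, 135, 140],
})

d_low = toy["lowest price "].diff()  # change in lowest price vs. previous month
d_avg = toy["avg price"].diff()      # change in average price vs. previous month

# A negative product means the two changes point in opposite directions.
mismatch = toy[(d_low * d_avg) < 0]
print(mismatch[["month"]])
```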
(6) Define the issuance gain as the difference between the number of licenses issued in a given month and the mean of the previous two months, filling the first two months with 0. Find when the extreme values of the issuance gain occur.
>>> MVL2 = MVL_new.copy()
>>> # '发行增益' means "issuance gain"; start from 0.0 so the column holds floats
>>> MVL2['发行增益'] = 0.0
>>> for index in MVL2.index:
...     if index < 2:
...         continue
...     MVL2.loc[index, '发行增益'] = MVL2.loc[index, 'Total number of license issued'] - (
...         MVL2.loc[index-1, 'Total number of license issued'] +
...         MVL2.loc[index-2, 'Total number of license issued']) / 2
>>> print("min", MVL2.loc[MVL2["发行增益"] == MVL2["发行增益"].min()][['year','month']].head())
>>> print("max", MVL2.loc[MVL2["发行增益"] == MVL2["发行增益"].max()][['year','month']].head())
min year month
74 2008 Apr
max year month
72 2008 Jan
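A loop-free sketch of the issuance gain uses `Series.shift` to line up the two previous months with the current one. The table is made up for illustration, and the column name `gain` is a hypothetical English stand-in for the gain column. The first two rows have no complete two-month history, so their `NaN` results are filled with 0 as the exercise requires.

```python
import pandas as pd

# Hypothetical toy data (not from the real CSV).
toy = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "Total number of license issued": [100, 200, 300, 100, 500],
})

issued = toy["Total number of license issued"]
# Mean of the previous two months; NaN where fewer than two earlier rows exist.
prev_two_mean = (issued.shift(1) + issued.shift(2)) / 2
toy["gain"] = (issued - prev_two_mean).fillna(0)

# idxmax/idxmin give the row labels of the extreme gains.
print("max", toy.loc[toy["gain"].idxmax(), "month"])
print("min", toy.loc[toy["gain"].idxmin(), "month"])
```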
Reference: https://github.com/datawhalechina/joyful-pandas