One definition of a data warehouse is a reflection of the historical changes, the data will contain more or less time characteristics, and therefore the date dimension to one of the warehouse has become indispensable dimension data, it can be said in any fact, there will be a table or a plurality date dimension foreign key. Date dimension can contain as many details of the date, such as day of the week, lunar calendar, year, month, day, holidays, quarterly, or even information on the Lunar New Year, Lunar Calendar, and other constellations.
There are many ways to generate date dimension table, for example, can directly generate MySQL database by date dimension, it can also be generated by programming language JAVA, PYTHON and so on. Employed herein is achieved by the generation date dimension table PYTHON, advantage of this approach is to generate a relatively detailed date dimension. First look at the final result
The meaning of each column is as follows
date | id | 2018-01-01 |
year | year | 2018 |
month | month | 1 |
day | day | 1 |
Quarter | quarter | 1 |
Day of the week | day_name | Monday |
The first few weeks of the year | weekofyear | 1 |
The first few days of the year | dayofyear | 1 |
There are a few days of the month | daysinmonth | 31 |
The first few days this week | dayofweek | 1 |
It is a leap year | is_leap_year | FALSE |
Whether the end of the last day | is_month_end | FALSE |
Whether the first of the month | is_month_start | TRUE |
Whether the end of the last day of the quarter | is_quarter_end | FALSE |
Whether the first day of the beginning of the quarter | is_quarter_start | TRUE |
Whether the end of the last day | is_year_end | FALSE |
Whether the first day of the beginning | is_year_start | TRUE |
Lunar calendar | lunar_date | The winter months fifteen |
Lunar calendar | gz_year | Ding years |
Lunar New Year | sx_year | Year of the Rooster |
Zodiac 纪日 | gz_day | Kishi Date |
Feasts | solar_terms | |
constellation | zodiac | Capricorn |
Holidays | holiday | New Year's Day |
Method to realize:
from util.lunar import Lunar
import pandas as pd
def getLunar(ct=None):
ln = Lunar(ct)
return (ln.ln_date_str(), ln.gz_year(), ln.sx_year(), ln.gz_day(),ln.ln_jie())
def holiday(ln_date):
n = ('春节','春节','春节','端午节','中秋节','元旦','劳动节','国庆节','国庆节','国庆节')
d = ('腊月三十','正月初一','正月初二','五月初五','八月十五',(1,1),(5,1),(10,1),(10,2),(10,3))
dic = dict(zip(d,n))
if ln_date in d:
return dic[ln_date]
def zodiac(month, day):
n = ('摩羯座','水瓶座','双鱼座','白羊座','金牛座','双子座','巨蟹座','狮子座','处女座','天秤座','天蝎座','射手座')
d = ((1,20),(2,19),(3,21),(4,21),(5,21),(6,22),(7,23),(8,23),(9,23),(10,23),(11,23),(12,23))
return n[len(list(filter(lambda y:y<=(month,day), d)))%12]
def generateData(startDate='2019-1-01', endDate='2019-1-31'):
d = {'id':pd.date_range(start=startDate, end=endDate)}
data = pd.DataFrame(d)
data['year'] = data['id'].apply(lambda x:x.year)
data['month'] = data['id'].apply(lambda x:x.month)
data['day'] = data['id'].apply(lambda x:x.day)
data['quarter'] = data['id'].apply(lambda x:x.quarter)
data['day_name'] = data['id'].apply(lambda x:x.day_name())
data['weekofyear'] = data['id'].apply(lambda x:x.weekofyear)
data['dayofyear'] = data['id'].apply(lambda x:x.dayofyear)
data['daysinmonth'] = data['id'].apply(lambda x:x.daysinmonth)
data['dayofweek'] = data['id'].apply(lambda x:x.dayofweek)
data['is_leap_year'] = data['id'].apply(lambda x:x.is_leap_year)
data['is_month_end'] = data['id'].apply(lambda x:x.is_month_end)
data['is_month_start'] = data['id'].apply(lambda x:x.is_month_start)
data['is_quarter_end'] = data['id'].apply(lambda x:x.is_quarter_end)
data['is_quarter_start'] = data['id'].apply(lambda x:x.is_quarter_start)
data['is_year_end'] = data['id'].apply(lambda x:x.is_year_end)
data['is_year_start'] = data['id'].apply(lambda x:x.is_year_start)
data['lunar'] = data['id'].apply(lambda x:getLunar(x))
data['lunar_date'] = data['lunar'].apply(lambda x:x[0])
data['gz_year'] = data['lunar'].apply(lambda x:x[1])
data['sx_year'] = data['lunar'].apply(lambda x:x[2])
data['gz_day'] = data['lunar'].apply(lambda x:x[3])
data['solar_terms'] = data['lunar'].apply(lambda x:x[4])
data['zodiac'] = data['id'].apply(lambda x:(x.month,x.day))
data['holiday0']= data['zodiac'].apply(lambda x:holiday(x))
data['holiday1']= data['lunar_date'].apply(lambda x:holiday(x))
data['zodiac'] = data['zodiac'].apply(lambda x:zodiac(x[0],x[1]))
del data['lunar']
return data
data =generateData(startDate='2018-1-01', endDate='2018-12-31')
data.to_csv('DIM_TIME.csv', index = False,index_label = False)
Finally, the resulting csv file into a database can
Reproduced in: https: //my.oschina.net/aubao/blog/3058272