Creating a date dimension table with Python

By one definition, a data warehouse reflects historical change, so its data almost always carries a time component. The date dimension is therefore an indispensable dimension in the warehouse: practically every fact table has one or more foreign keys to it. A date dimension can hold as much detail about each date as needed, such as year, quarter, month, day, day of the week, holidays, the lunar calendar date, the Chinese zodiac year, and the Western zodiac sign.
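To illustrate the foreign-key relationship between a fact table and the date dimension, here is a minimal sketch in pandas. The `fact_sales` table, its `date_id` and `amount` columns, and the three-row dimension are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical miniature date dimension: one row per calendar day.
dim_date = pd.DataFrame({"id": pd.date_range("2018-01-01", "2018-01-03")})
dim_date["day_name"] = dim_date["id"].dt.day_name()
dim_date["quarter"] = dim_date["id"].dt.quarter

# Hypothetical fact table: date_id is the foreign key into dim_date.id.
fact_sales = pd.DataFrame({
    "date_id": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-03"]),
    "amount": [10.0, 5.0, 7.5],
})

# Join each fact row to its date attributes, then aggregate by a dimension column.
joined = fact_sales.merge(dim_date, left_on="date_id", right_on="id", how="left")
sales_by_day = joined.groupby("day_name")["amount"].sum()
print(sales_by_day)
```

Because the weekday name lives in the dimension, the fact table only needs to store the date key; any other date attribute can be brought in through the same join.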

There are many ways to generate a date dimension table: it can be generated directly in a database such as MySQL, or with a programming language such as Java or Python. This article generates it with Python, which has the advantage of producing a relatively detailed date dimension. First, look at the final result.

The meaning of each column is as follows:

Meaning                                      Column            Example
Date                                         id                2018-01-01
Year                                         year              2018
Month                                        month             1
Day of the month                             day               1
Quarter                                      quarter           1
Name of the weekday                          day_name          Monday
Week of the year                             weekofyear        1
Day of the year                              dayofyear         1
Number of days in the month                  daysinmonth       31
Day of the week (number)                     dayofweek         1
Whether the year is a leap year              is_leap_year      FALSE
Whether it is the last day of the month      is_month_end      FALSE
Whether it is the first day of the month     is_month_start    TRUE
Whether it is the last day of the quarter    is_quarter_end    FALSE
Whether it is the first day of the quarter   is_quarter_start  TRUE
Whether it is the last day of the year       is_year_end       FALSE
Whether it is the first day of the year      is_year_start     TRUE
Lunar calendar date                          lunar_date        冬月十五 (15th day of the 11th lunar month)
Stem-branch (ganzhi) year                    gz_year           丁酉年 (Dingyou year)
Chinese zodiac year                          sx_year           鸡年 (Year of the Rooster)
Stem-branch (ganzhi) day                     gz_day            癸巳日 (Guisi day)
Solar term                                   solar_terms       (empty)
Western zodiac sign                          zodiac            摩羯座 (Capricorn)
Holiday                                      holiday           元旦 (New Year's Day)
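Most of the calendar columns listed above correspond to attributes of `pandas.Timestamp`, so the example values can be checked for a single date. The sketch below is only a spot check and covers a subset of the columns. One thing it shows is that pandas numbers Monday as 0 in `dayofweek`, so a listing that shows 1 for a Monday has presumably been shifted by one somewhere along the way.

```python
import pandas as pd

ts = pd.Timestamp("2018-01-01")

# Calendar attributes exposed directly by pandas.Timestamp.
row = {
    "year": ts.year,
    "quarter": ts.quarter,
    "day_name": ts.day_name(),
    "weekofyear": ts.isocalendar()[1],   # ISO 8601 week number
    "dayofyear": ts.dayofyear,
    "daysinmonth": ts.daysinmonth,
    "dayofweek": ts.dayofweek,           # pandas counts Monday as 0
    "is_leap_year": ts.is_leap_year,
    "is_month_start": ts.is_month_start,
    "is_year_end": ts.is_year_end,
}
for name, value in row.items():
    print(name, value)
```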

Implementation:

import pandas as pd

# Lunar is imported from a local util/lunar.py module that is not shown in this post.
from util.lunar import Lunar


def getLunar(ct=None):
    """Return (lunar date, stem-branch year, zodiac year, stem-branch day, solar term)."""
    ln = Lunar(ct)
    return (ln.ln_date_str(), ln.gz_year(), ln.sx_year(), ln.gz_day(), ln.ln_jie())


def holiday(ln_date):
    """Look up a holiday by lunar date string or by solar (month, day); None if no match."""
    names = {
        '腊月三十': '春节', '正月初一': '春节', '正月初二': '春节',  # Spring Festival
        '五月初五': '端午节',  # Dragon Boat Festival
        '八月十五': '中秋节',  # Mid-Autumn Festival
        (1, 1): '元旦',  # New Year's Day
        (5, 1): '劳动节',  # Labour Day
        (10, 1): '国庆节', (10, 2): '国庆节', (10, 3): '国庆节',  # National Day
    }
    return names.get(ln_date)


def zodiac(month, day):
    """Return the Western zodiac sign for a solar month and day."""
    # Capricorn comes first because it spans the turn of the year.
    n = ('摩羯座', '水瓶座', '双鱼座', '白羊座', '金牛座', '双子座',
         '巨蟹座', '狮子座', '处女座', '天秤座', '天蝎座', '射手座')
    # d[i] is the first day of sign n[(i + 1) % 12].
    d = ((1, 20), (2, 19), (3, 21), (4, 21), (5, 21), (6, 22),
         (7, 23), (8, 23), (9, 23), (10, 23), (11, 23), (12, 23))
    # Count the start dates on or before the given day; 12 wraps back to Capricorn.
    return n[sum(start <= (month, day) for start in d) % 12]


def generateData(startDate='2019-1-01', endDate='2019-1-31'):
    """Build the date dimension, one row per day from startDate to endDate inclusive."""
    data = pd.DataFrame({'id': pd.date_range(start=startDate, end=endDate)})
    dates = data['id'].dt
    for col in ('year', 'month', 'day', 'quarter'):
        data[col] = getattr(dates, col)
    data['day_name'] = dates.day_name()
    # Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week gives the ISO week.
    data['weekofyear'] = dates.isocalendar().week
    for col in ('dayofyear', 'daysinmonth', 'dayofweek', 'is_leap_year',
                'is_month_end', 'is_month_start', 'is_quarter_end',
                'is_quarter_start', 'is_year_end', 'is_year_start'):
        data[col] = getattr(dates, col)
    # Lunar-calendar columns: one Lunar lookup per date, then unpack the tuple.
    lunar = data['id'].apply(getLunar)
    for i, col in enumerate(('lunar_date', 'gz_year', 'sx_year', 'gz_day', 'solar_terms')):
        data[col] = lunar.apply(lambda x, i=i: x[i])
    month_day = [(x.month, x.day) for x in data['id']]
    data['zodiac'] = [zodiac(m, d) for m, d in month_day]
    # Lunar holidays are keyed by lunar date string, solar holidays by (month, day).
    data['holiday'] = [holiday(ln) or holiday(md)
                       for ln, md in zip(data['lunar_date'], month_day)]
    return data


if __name__ == '__main__':
    data = generateData(startDate='2018-1-01', endDate='2018-12-31')
    data.to_csv('DIM_TIME.csv', index=False)
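The `zodiac` lookup works by counting how many sign start dates fall on or before the given day. The same idea can be sketched with `bisect` from the standard library. This is an alternative formulation rather than the author's code: the name `zodiac_sign` is made up for the example, and English sign names are substituted for readability.

```python
from bisect import bisect_right

# Sign names, starting with Capricorn, which spans the turn of the year.
SIGNS = ("Capricorn", "Aquarius", "Pisces", "Aries", "Taurus", "Gemini",
         "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius")
# STARTS[i] is the first (month, day) of SIGNS[(i + 1) % 12];
# the cut-off dates are the same as in the zodiac function above.
STARTS = ((1, 20), (2, 19), (3, 21), (4, 21), (5, 21), (6, 22),
          (7, 23), (8, 23), (9, 23), (10, 23), (11, 23), (12, 23))


def zodiac_sign(month, day):
    """Return the sign for a solar date by binary search over the start dates."""
    # bisect_right counts the start dates that are <= (month, day).
    return SIGNS[bisect_right(STARTS, (month, day)) % 12]


print(zodiac_sign(1, 1))    # Capricorn
print(zodiac_sign(1, 20))   # Aquarius
print(zodiac_sign(12, 23))  # Capricorn
```

The modulo is what makes dates from December 23 onward wrap around to the first entry, so Capricorn does not need to appear twice in the table.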

Finally, import the resulting CSV file into the database.
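As one possible way to do the import, the sketch below loads CSV rows into SQLite using only the Python standard library. SQLite, the table name `dim_time`, the three-column subset, and the inline sample rows are all assumptions made for illustration; a real load would read `DIM_TIME.csv` and target whatever database the warehouse uses.

```python
import csv
import io
import sqlite3

# Inline sample standing in for the first rows of DIM_TIME.csv (assumed column subset).
sample_csv = io.StringIO(
    "id,year,day_name\n"
    "2018-01-01,2018,Monday\n"
    "2018-01-02,2018,Tuesday\n"
)

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute("CREATE TABLE dim_time (id TEXT PRIMARY KEY, year INTEGER, day_name TEXT)")

# DictReader maps each CSV row to a dict keyed by the header line.
rows = list(csv.DictReader(sample_csv))
conn.executemany(
    "INSERT INTO dim_time (id, year, day_name) VALUES (:id, :year, :day_name)",
    rows,
)
conn.commit()

loaded = conn.execute("SELECT id, year, day_name FROM dim_time ORDER BY id").fetchall()
print(loaded)
```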

Reproduced from: https://my.oschina.net/aubao/blog/3058272

Origin blog.csdn.net/weixin_34289454/article/details/91644280