Usage and Design of the Address Information Zipper Table

• I. About the zipper table
o 1) Definition of a zipper table
o 2) The address information zipper table
o 3) Why the address information uses a zipper table
o 4) Example zipper table data
• II. Using the zipper table
o 1) Querying currently valid data
o 2) Querying historical snapshot data
• III. Table fields
• IV. Design of the address information zipper table
o 1) Demonstration of the zipper table data design
o 2) Table creation SQL
I. About the zipper table
Table name: dm_gis.addr_info
1) Definition of a zipper table
A zipper table is a way of designing data warehouse tables, defined by how the table stores its data. As the name suggests, a zipper table records history: it keeps every state of a record, from when the record first appeared up to its current state.
2) The address information zipper table
The addresses in this table are taken from the waybill wide table. For each address the RDS disaster recovery service is called, and the returned information is stored in the corresponding fields. When using the table, citycode and address together identify a record uniquely.
The table stores basic address information as snapshots, such as province, city code, outlet, unit area, and latitude and longitude, and each record carries the period during which it applied. From this table you can get both the latest data and the historical data up to the previous day. The table keeps snapshots for the most recent three months. When using it, take care to filter on the start and end dates.
3) Why the address information uses a zipper table
1. Outlet, unit area, and other address information is needed frequently, so our jobs called RDS in the disaster recovery environment over and over and put too much pressure on it. This information therefore has to be stored in an intermediate table, namely the daily address information update table.
2. The source data comes from the waybill wide table, which is updated daily, so the same addresses are duplicated heavily across dates. With only the daily update table it is hard to get comprehensive address data, and a query has to read many partitions full of duplicates.
3. With a zipper table design you can directly get the currently valid information for every address in the retention period, and you can also get a snapshot of all addresses for any day in that period. The data does not span partitions and the volume stays small, so the table is more convenient to use.
4) Example zipper table data
The table is mainly used to join address information to offline data. The actual table has more fields; this example shows only some of them.
An end_date of 20991231 means the address record is currently valid. Any other date marks a historical snapshot.
no | address | citycode | zc | tc | start_date | end_date | note
1 | Shanghai, Baoshan District, Gram Road, Lane 50, No. 3, Room 307 | 021 | 021SC | 021SC021 | 20191101 | 20991231 |
2 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191101 | 20191102 |
3 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021S040 | 20191101 | 20191103 |
4 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191102 | 20191103 |
5 | Shanghai, Baoshan District, Results Road, No. 21 | 022 | 021SS | 021SS004 | 20191102 | 20991231 |
6 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021M103 | 20191103 | 20991231 |
7 | Shanghai, Baoshan District, Dachang Town, Zhenhua Road No. 999, Waterfront Blue Bridge, Building 66, Room 1105 | 021 | 021SG | 021SG061 | 20191103 | 20991231 |
8 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021SKM312 | 20191103 | 20991231 |
II. Using the zipper table
1) Querying currently valid data
To query all currently valid address records, filter on end_date = '20991231'. This value is fixed for the retention period and must not be changed.
select * from dm_gis.addr_info where end_date = '20991231'
2) Querying historical snapshot data
To query the snapshot for 20191102, both the start_date and the end_date condition are required. A record stops being valid on its end_date, because the record that replaces it starts on that same date, so the end_date comparison is strict.
select * from dm_gis.addr_info where start_date <= '20191102' and end_date > '20191102'
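The two queries can be tried out without a Hive cluster. The following is a minimal sketch in Python, assuming an in-memory SQLite table as a stand-in for dm_gis.addr_info. The short labels A to E stand for the full address strings, and the column row_no and the helpers valid_rows and snapshot_rows are made up for this illustration. The sketch treats end_date as exclusive, on the assumption that a record is closed on the day its replacement starts.

```python
import sqlite3

# (row_no, address label, tc, start_date, end_date), following the example data.
# The labels A..E are hypothetical stand-ins for the full address strings.
ROWS = [
    (1, "A", "021SC021", "20191101", "20991231"),
    (2, "B", "021LM008", "20191101", "20191102"),
    (3, "C", "021S040", "20191101", "20191103"),
    (4, "B", "021LM008", "20191102", "20191103"),
    (5, "D", "021SS004", "20191102", "20991231"),
    (6, "C", "021M103", "20191103", "20991231"),
    (7, "E", "021SG061", "20191103", "20991231"),
    (8, "B", "021SKM312", "20191103", "20991231"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addr_info ("
    "row_no INTEGER, address TEXT, tc TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany("INSERT INTO addr_info VALUES (?, ?, ?, ?, ?)", ROWS)


def valid_rows(conn):
    """Row numbers of the currently valid records (open end date)."""
    cur = conn.execute(
        "SELECT row_no FROM addr_info WHERE end_date = '20991231' ORDER BY row_no"
    )
    return [r[0] for r in cur]


def snapshot_rows(conn, day):
    """Row numbers of the records that were valid on the given day (yyyymmdd)."""
    cur = conn.execute(
        "SELECT row_no FROM addr_info "
        "WHERE start_date <= ? AND end_date > ? ORDER BY row_no",
        (day, day),
    )
    return [r[0] for r in cur]


print(valid_rows(conn))                 # [1, 5, 6, 7, 8]
print(snapshot_rows(conn, "20191102"))  # [1, 3, 4, 5]
```

Because the dates are stored as yyyymmdd strings of equal length, plain string comparison orders them correctly, which is what both the sketch and the Hive queries rely on.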
III. Table fields
Table: dm_gis.addr_info
field | type | description
province | string | province
citycode | string | city code
address | string | address
src | string | source
zc | string | outlet
tc | string | unit area
aoiid | string | aoiid
aoicode | string | aoicode
key_word | string | subject word
groupid | string | large group ID
id | string | group ID
level | string | level
filter | string | filter
id_x | string | group address longitude
id_y | string | group address latitude
standardization | string | standardized address of the large group
key | string |
split_result | string | word segmentation result
start_date | string | start date
end_date | string | end date; 20991231 means currently valid, any other value marks a historical snapshot
-------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- ---------------
IV. Design of the address information zipper table
1) Demonstration of the zipper table data design
Zipper table data on 20191101
no | address | citycode | zc | tc | start_date | end_date | note
1 | Shanghai, Baoshan District, Gram Road, Lane 50, No. 3, Room 307 | 021 | 021SC | 021SC021 | 20191101 | 20991231 | initialization data
2 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191101 | 20991231 | initialization data
3 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021S040 | 20191101 | 20991231 | initialization data
Zipper table data on 20191102
Records 4 and 5 are new data from the update table. The address of record 4 already exists in the zipper table (record 2), so record 2 is marked as historical data: its end_date is updated to 20191102, the start date of record 4.
no | address | citycode | zc | tc | start_date | end_date | note
1 | Shanghai, Baoshan District, Gram Road, Lane 50, No. 3, Room 307 | 021 | 021SC | 021SC021 | 20191101 | 20991231 |
2 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191101 | 20191102 | end_date changed, now historical data
3 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021S040 | 20191101 | 20991231 |
4 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191102 | 20991231 | new on 20191102, address has history
5 | Shanghai, Baoshan District, Results Road, No. 21 | 022 | 021SS | 021SS004 | 20191102 | 20991231 | new on 20191102, address has no history
Zipper table data on 20191103
Records 6, 7, and 8 are new data from the update table. Records 6 and 8 have the same addresses as the existing records 3 and 4, so the end_date of records 3 and 4 is updated.
no | address | citycode | zc | tc | start_date | end_date | note
1 | Shanghai, Baoshan District, Gram Road, Lane 50, No. 3, Room 307 | 021 | 021SC | 021SC021 | 20191101 | 20991231 |
2 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191101 | 20191102 |
3 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021S040 | 20191101 | 20191103 | end_date changed, now historical data
4 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191102 | 20191103 | end_date changed, now historical data
5 | Shanghai, Baoshan District, Results Road, No. 21 | 022 | 021SS | 021SS004 | 20191102 | 20991231 |
6 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021M103 | 20191103 | 20991231 | new on 20191103, address has history
7 | Shanghai, Baoshan District, Dachang Town, Zhenhua Road No. 999, Waterfront Blue Bridge, Building 66, Room 1105 | 021 | 021SG | 021SG061 | 20191103 | 20991231 | new on 20191103, address has no history
8 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021SKM312 | 20191103 | 20991231 | new on 20191103, address has history
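The three daily states above follow from one rule: an open record is closed when its address shows up in that day's update, and every update record is appended as a new open record. The following is a minimal sketch of that rule in Python, not the production job. The function apply_daily_update and the labels A to E (standing for the full addresses) are hypothetical, the sketch assumes at most one update record per address per day, and it leaves out the retention filter.

```python
OPEN_END = "20991231"  # end_date value that marks a currently valid record


def apply_daily_update(zipper, updates, inc_day):
    """Return the zipper table after merging one day's update records.

    zipper:  list of dicts with keys address, tc, start_date, end_date
    updates: list of dicts with keys address, tc (the records for inc_day)
    """
    updated_addresses = {u["address"] for u in updates}
    result = []
    for row in zipper:
        row = dict(row)  # copy, so the input list is left unchanged
        # An open record whose address appears again is closed on inc_day.
        if row["end_date"] == OPEN_END and row["address"] in updated_addresses:
            row["end_date"] = inc_day
        result.append(row)
    # Every record of the update becomes a new open record.
    for u in updates:
        result.append(
            {"address": u["address"], "tc": u["tc"],
             "start_date": inc_day, "end_date": OPEN_END}
        )
    return result


day1 = apply_daily_update([], [
    {"address": "A", "tc": "021SC021"},
    {"address": "B", "tc": "021LM008"},
    {"address": "C", "tc": "021S040"},
], "20191101")
day2 = apply_daily_update(day1, [
    {"address": "B", "tc": "021LM008"},
    {"address": "D", "tc": "021SS004"},
], "20191102")
day3 = apply_daily_update(day2, [
    {"address": "C", "tc": "021M103"},
    {"address": "E", "tc": "021SG061"},
    {"address": "B", "tc": "021SKM312"},
], "20191103")

for no, row in enumerate(day3, start=1):
    print(no, row["address"], row["tc"], row["start_date"], row["end_date"])
```

The printed rows come out in the same order and with the same start and end dates as records 1 to 8 in the 20191103 table.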
Fetching the currently valid data; no address is repeated
select * from dm_gis.addr_info where end_date = '20991231'
no | address | citycode | zc | tc | start_date | end_date | note
1 | Shanghai, Baoshan District, Gram Road, Lane 50, No. 3, Room 307 | 021 | 021SC | 021SC021 | 20191101 | 20991231 |
5 | Shanghai, Baoshan District, Results Road, No. 21 | 022 | 021SS | 021SS004 | 20191102 | 20991231 |
6 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021M103 | 20191103 | 20991231 |
7 | Shanghai, Baoshan District, Dachang Town, Zhenhua Road No. 999, Waterfront Blue Bridge, Building 66, Room 1105 | 021 | 021SG | 021SG061 | 20191103 | 20991231 |
8 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021SKM312 | 20191103 | 20991231 |
Fetching the snapshot for 20191102; no address is repeated
The end_date comparison is strict: record 2 was closed on 20191102 by record 4, which carries the same address, so only record 4 belongs to this snapshot.
select * from dm_gis.addr_info where start_date <= '20191102' and end_date > '20191102'
no | address | citycode | zc | tc | start_date | end_date | note
1 | Shanghai, Baoshan District, Gram Road, Lane 50, No. 3, Room 307 | 021 | 021SC | 021SC021 | 20191101 | 20991231 |
3 | Shanghai, Baoshan District, Tonghe River Road, Eight Villages, No. 184, Room 103 | 021 | 021S | 021S040 | 20191101 | 20191103 |
4 | Shanghai, Songjiang District, New Town, Nine New Highway No. 2888, Shen New Plaza, Building 3, 5th Floor | 021 | 021LM | 021LM008 | 20191102 | 20191103 |
5 | Shanghai, Baoshan District, Results Road, No. 21 | 022 | 021SS | 021SS004 | 20191102 | 20991231 |
2) Table creation SQL
-- Daily address information update table, partitioned
-- (an offline job takes the source data from the waybill wide table, calls the RDS disaster recovery service, and refreshes the table daily)
create table if not exists dm_gis.addr_info_update(
province string comment 'province',
citycode string comment 'city code',
address string comment 'address',
src string comment 'source',
zc string comment 'outlet',
tc string comment 'unit area',
aoiid string comment 'aoiid',
aoicode string comment 'aoicode',
key_word string comment 'subject word',
groupid string comment 'large group ID',
id string comment 'group ID',
level string comment 'level',
filter string comment 'filter',
id_x string comment 'group address longitude',
id_y string comment 'group address latitude',
standardization string comment 'standardized address of the large group',
key string comment '',
split_result string comment 'word segmentation result'
) comment 'address information table, updated daily'
partitioned by (inc_day string)
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile;
-- Address information zipper table, keeps one month of address records, not partitioned
create table if not exists dm_gis.addr_info(
province string comment 'province',
citycode string comment 'city code',
address string comment 'address',
src string comment 'source',
zc string comment 'outlet',
tc string comment 'unit area',
aoiid string comment 'aoiid',
aoicode string comment 'aoicode',
key_word string comment 'subject word',
groupid string comment 'large group ID',
id string comment 'group ID',
level string comment 'level',
filter string comment 'filter',
id_x string comment 'group address longitude',
id_y string comment 'group address latitude',
standardization string comment 'standardized address of the large group',
key string comment '',
split_result string comment 'word segmentation result',
start_date string comment 'start date',
end_date string comment 'end date, 20991231 means currently valid, other values mark a historical snapshot'
) comment 'address information zipper table'
row format delimited fields terminated by '\t' lines terminated by '\n'
stored as textfile;
-- Initialize the zipper table data, using the update data for 20191102
insert overwrite table dm_gis.addr_info
select c.province,c.citycode,c.address,c.src,c.zc,c.tc,c.aoiid,c.aoicode,c.key_word,
c.groupid,c.id,c.level,c.filter,c.id_x,c.id_y,c.standardization,c.key,c.split_result,
c.inc_day as start_date,
'20991231' as end_date
from dm_gis.addr_info_update c where inc_day = '20191102' and address <> '' and citycode <> '';

-- Daily update SQL for the zipper table; keeps the most recent month of snapshots
set mapreduce.job.queuename=gis;
set hive.execution.engine = tez;
insert overwrite table dm_gis.addr_info
select * from
(
select a.province,a.citycode,a.address,a.src,a.zc,a.tc,a.aoiid,a.aoicode,a.key_word,
a.groupid,a.id,a.level,a.filter,a.id_x,a.id_y,a.standardization,a.key,a.split_result,
a.start_date,
case when a.end_date = '20991231' and b.address is not null then b.inc_day else a.end_date end as end_date
from
(select * from dm_gis.addr_info where start_date >= '20191003' and address is not null and address <> '' and citycode <> '') as a
left join
(select * from dm_gis.addr_info_update where inc_day='20191104' and address is not null and address <> '' and citycode <> '') as b
on a.address =b.address
union
select c.province,c.citycode,c.address,c.src,c.zc,c.tc,c.aoiid,c.aoicode,c.key_word,
c.groupid,c.id,c.level,c.filter,c.id_x,c.id_y,c.standardization,c.key,c.split_result,
c.inc_day as start_date,
'20991231' as end_date
from dm_gis.addr_info_update c where inc_day='20191104' and address is not null and address <> '' and citycode <> ''
) as t;

Re-running the most recent zipper table update (using the 20191104 data as the example)
Sometimes the data in the update table turns out to be wrong. The update table is partitioned, so the affected partition can simply be deleted and regenerated. The zipper table is not partitioned, so it first has to be rolled back to its state before the last run, and then joined again with that day's update table data.
1) Roll the zipper table back to its state before the last run
insert overwrite table dm_gis.addr_info
select province,citycode,address,src,zc,tc,aoiid,aoicode,key_word,
groupid,id,level,filter,id_x,id_y,standardization,key,split_result,
start_date,
case when end_date ='20191104' and address is not null then '20991231' else end_date end as end_date
from dm_gis.addr_info where start_date <> '20191104'
2) Run the daily zipper table update SQL above
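The rollback step can be checked on a small in-memory example. This is a minimal sketch in Python under the same assumptions as the rollback SQL: records inserted by the run carry start_date equal to the run date, and records closed by the run carry end_date equal to the run date. The function rollback_last_run and the labels A to C are hypothetical.

```python
OPEN_END = "20991231"  # end_date value that marks a currently valid record


def rollback_last_run(zipper, inc_day):
    """Undo the zipper table update that was applied for inc_day."""
    restored = []
    for row in zipper:
        if row["start_date"] == inc_day:
            continue  # drop the records that the run inserted
        row = dict(row)
        if row["end_date"] == inc_day:
            row["end_date"] = OPEN_END  # reopen the records that the run closed
        restored.append(row)
    return restored


# State before the 20191104 run.
before = [
    {"address": "A", "start_date": "20191103", "end_date": OPEN_END},
    {"address": "B", "start_date": "20191101", "end_date": "20191103"},
]
# State after a 20191104 run whose update contained addresses A and C.
after = [
    {"address": "A", "start_date": "20191103", "end_date": "20191104"},
    {"address": "B", "start_date": "20191101", "end_date": "20191103"},
    {"address": "A", "start_date": "20191104", "end_date": OPEN_END},
    {"address": "C", "start_date": "20191104", "end_date": OPEN_END},
]

restored = rollback_last_run(after, "20191104")
print(restored == before)  # True
```

Once the table is back in this state, running the daily update again for the same date produces a clean result instead of stacking a second set of records for that day.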
--------------end--------------------

Origin www.cnblogs.com/david227/p/12408747.html