[Data warehouse] Full tables, snapshot tables, incremental tables, zipper tables, dimension tables, entity tables, fact tables

Original links:

https://blog.csdn.net/a6822342/article/details/100050548

https://blog.csdn.net/PTtaoge/article/details/80880494

https://blog.csdn.net/bjweimengshu/article/details/79256504

Full table

A full table has no partitions. It holds all of the data as of the previous day: if today is the 24th, the full table contains all data up to and including the 23rd. Each load overwrites whatever was written before, so a full table cannot record history; it only holds the complete data as of the latest load date.
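The overwrite behaviour can be sketched in a few lines of Python. This is only an illustration: the in-memory dict `full_table`, the `load_full_table` helper and the `user_id`/`city` columns are made-up names, not part of any real warehouse.

```python
# Hypothetical in-memory stand-in for a full table: user_id -> row.
full_table = {}


def load_full_table(source_rows):
    """Overwrite the full table with the latest complete extract."""
    full_table.clear()  # everything written by earlier loads is discarded
    full_table.update({row["user_id"]: dict(row) for row in source_rows})


# Load run on the 24th: all data as of the 23rd.
load_full_table([
    {"user_id": 1, "city": "Beijing"},
    {"user_id": 2, "city": "Shanghai"},
])

# Load run on the 25th: all data as of the 24th. User 1 has moved.
load_full_table([
    {"user_id": 1, "city": "Shenzhen"},
    {"user_id": 2, "city": "Shanghai"},
    {"user_id": 3, "city": "Hangzhou"},
])

# Only the latest state survives; the earlier city of user 1 is gone.
print(full_table[1]["city"])
```

After the second load there is no way to ask what user 1 looked like on the 23rd, which is exactly the limitation described above.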

Snapshot table

So what do we do when we need to look up historical data? This is where the snapshot table comes in. A snapshot table is partitioned by time, and each partition holds the full data as of the day before the partition date. For example, suppose the table currently has three partitions: the 24th, the 25th and the 26th. The partition for the 24th holds all historical data up to the 23rd, the partition for the 25th holds all historical data up to the 24th, and so on.

This has its own problem, though: when the data volume is large, every partition stores a great deal of duplicated data, which wastes storage space.
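A small Python sketch makes both the benefit and the storage cost visible. The `snapshot_table` dict, the `load_snapshot` helper and the sample rows are assumptions made purely for illustration.

```python
# Hypothetical snapshot table: partition date -> full copy of the data.
snapshot_table = {}


def load_snapshot(partition, source_rows):
    """Store a complete copy of the source under its own partition."""
    snapshot_table[partition] = [dict(row) for row in source_rows]


load_snapshot("24", [
    {"user_id": 1, "city": "Beijing"},
    {"user_id": 2, "city": "Shanghai"},
])
load_snapshot("25", [
    {"user_id": 1, "city": "Shenzhen"},
    {"user_id": 2, "city": "Shanghai"},
    {"user_id": 3, "city": "Hangzhou"},
])
load_snapshot("26", [
    {"user_id": 1, "city": "Shenzhen"},
    {"user_id": 2, "city": "Shanghai"},
    {"user_id": 3, "city": "Hangzhou"},
])

# History is preserved: each partition can still be read on its own.
city_in_partition_24 = snapshot_table["24"][0]["city"]

# Storage cost: count the rows stored versus the distinct row versions.
total_rows = sum(len(rows) for rows in snapshot_table.values())
distinct_rows = len({
    tuple(sorted(row.items()))
    for rows in snapshot_table.values()
    for row in rows
})
print(city_in_partition_24, total_rows, distinct_rows)
```

In this toy data set the three partitions store twice as many rows as there are distinct row versions, and the ratio only gets worse as more days accumulate.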

This is where the zipper table comes in.

Before introducing the zipper table, let's first look at the incremental table.

Incremental table

An incremental table records the data that is new each day. For example, whatever was added or changed between the 24th and the 25th is stored in the incremental table's partition for the 25th. Take partitions 25 and 24 of the snapshot table described above (both are loaded at T+1, so they are actually produced on the 26th and the 25th). Subtracting one from the other leaves the rows that were added or changed between those two loads, which is equivalent to the data in partition 25 of the incremental table.
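The "subtraction" of two snapshot partitions can be sketched as follows. The function name `incremental_rows`, the `user_id` key and the sample rows are hypothetical; a real pipeline would more likely capture changes at the source than diff full snapshots.

```python
def incremental_rows(previous, current, key="user_id"):
    """Return rows of `current` that are new or differ from `previous`."""
    old = {row[key]: row for row in previous}
    return [row for row in current if old.get(row[key]) != row]


partition_24 = [
    {"user_id": 1, "city": "Beijing"},
    {"user_id": 2, "city": "Shanghai"},
]
partition_25 = [
    {"user_id": 1, "city": "Shenzhen"},   # changed
    {"user_id": 2, "city": "Shanghai"},   # unchanged
    {"user_id": 3, "city": "Hangzhou"},   # new
]

delta_25 = incremental_rows(partition_24, partition_25)
print(delta_25)
```

Note that this sketch only captures inserts and updates. Rows that disappeared between the two snapshots are not reported, so deletions would need separate handling.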

Zipper table

A zipper table maintains both the historical states and the latest data in a single table. It is also a partitioned table: rows that have not changed, or that have reached their final state, are kept together in one partition. The partition fields are usually a start time, start_date, and an end time, end_date. In general, a row is valid on a given day if its end_date is greater than or equal to that day. To get the full data for a particular day, filter on start_date and end_date. For example, to get the full data as of 20190813, the filter condition is where start_date <= '20190813' and end_date >= '20190813'.
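The date filter can be tried out on a tiny in-memory zipper table. Everything here is illustrative: the rows are invented, and the open-ended value `99991231` for rows that are still valid is an assumed convention that the text above does not specify. Dates are kept as `yyyymmdd` strings, which compare correctly as plain strings.

```python
OPEN_END = "99991231"  # assumed sentinel meaning "still valid"

zipper_table = [
    {"user_id": 1, "city": "Beijing",
     "start_date": "20190801", "end_date": "20190812"},
    {"user_id": 1, "city": "Shenzhen",
     "start_date": "20190813", "end_date": OPEN_END},
    {"user_id": 2, "city": "Shanghai",
     "start_date": "20190805", "end_date": OPEN_END},
    {"user_id": 3, "city": "Hangzhou",
     "start_date": "20190814", "end_date": OPEN_END},
]


def full_data_as_of(rows, day):
    """Same filter as: start_date <= day and end_date >= day."""
    return [row for row in rows
            if row["start_date"] <= day <= row["end_date"]]


as_of_0813 = full_data_as_of(zipper_table, "20190813")
as_of_0812 = full_data_as_of(zipper_table, "20190812")
print([(r["user_id"], r["city"]) for r in as_of_0813])
print([(r["user_id"], r["city"]) for r in as_of_0812])
```

Each version of a row is stored once with its validity range, so any day's full data can be rebuilt without keeping a full copy per day.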

 

Dimension table

A dimension table can be seen as the window through which users analyze the facts. The data in it should describe every aspect of the facts. A time dimension table, for example, contains attributes such as day, week, month, quarter, year and date. A dimension table is only one angle from which a fact table is analyzed.
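As a rough illustration of the time dimension mentioned above, one row of such a table could be derived from a calendar date like this. The column names (`date_key`, `week` and so on) and the choice of ISO week numbering are assumptions, not something the text prescribes.

```python
from datetime import date


def time_dimension_row(d):
    """Build one hypothetical time-dimension row for calendar date `d`."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return {
        "date_key": d.strftime("%Y%m%d"),   # e.g. used as the join key
        "day": d.day,
        "week": iso_week,                   # ISO 8601 week number
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
    }


row = time_dimension_row(date(2019, 8, 13))
print(row)
```

A fact row would then only need to carry the `date_key`, and every other time attribute could be looked up from the dimension table.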

Entity table

An entity table is a table of real entity objects. The data it holds must describe things that objectively exist. A device, for example, objectively exists, so it can be modeled as an entity table.

Fact table

In essence, a fact table records facts that are determined by a combination of dimensions and indicator (measure) values. For example, given a time dimension and a regional organization dimension, together with an indicator value, we can determine the fact of what value some indicator had at a certain time in a certain region. Each row in the fact table is obtained from the intersection of several dimension tables together with its indicator values.
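The relationship between dimensions and indicator values can be sketched with two small dimension tables and one fact table. The tables, keys and the `amount` indicator are all invented for this example.

```python
# Hypothetical dimension tables, keyed by their dimension keys.
time_dim = {
    "20190813": {"month": 8, "quarter": 3},
    "20190814": {"month": 8, "quarter": 3},
}
region_dim = {
    "R1": {"region": "North"},
    "R2": {"region": "South"},
}

# Each fact row sits at the intersection of one time key and one
# region key, and carries an indicator value (here: amount).
fact_sales = [
    {"date_key": "20190813", "region_key": "R1", "amount": 100},
    {"date_key": "20190813", "region_key": "R2", "amount": 40},
    {"date_key": "20190814", "region_key": "R1", "amount": 60},
]


def total_by_region(facts, regions):
    """Sum the indicator per region name looked up in the dimension."""
    totals = {}
    for fact in facts:
        name = regions[fact["region_key"]]["region"]
        totals[name] = totals.get(name, 0) + fact["amount"]
    return totals


totals = total_by_region(fact_sales, region_dim)
print(totals)
```

The same fact rows could just as well be grouped by `quarter` via `time_dim`, which is what it means for each dimension to be one angle of analysis on the facts.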
 

What is a data warehouse

https://blog.csdn.net/bjweimengshu/article/details/79256504

https://blog.csdn.net/Su_Levi_Wei/article/details/89501304

 

 


Origin blog.csdn.net/YYIverson/article/details/103716086