[Repost] Popular Science | What is a wide table?

Popular Science | What is a wide table? One article to help you understand it

Data warehouse wide table | Data wide table | Lu Guichen0's blog, CSDN

1. What is a "wide table"?

"Wide table" literally means a database table with many fields (columns). It joins multiple data tables related to a business topic into one large table through their associated key fields, so that the attributes of a business entity across different dimensions are stored together in one place.

For example, a real estate registration information query needs the right holder, certificate number, real estate title certificate number, location, planned use, property nature, building area, mortgage registration status, and so on. In the real estate database, this information may be spread across several tables: the home buyer information table, the natural building attribute table, the household attribute table, the real estate ownership attribute table, and the mortgage ownership table. For every query, the user must join these tables one by one on their related fields, which is very time-consuming in the database.

Figure: Multi-table join query (simulated data)

A wide table extracts and assembles the relevant business tables in advance, according to the needs of the real estate registration query business, and joins them into one integrated table with the right holder as the entity, covering person, house, land, and other information. A real estate registration query can then be answered from a single table.

Figure: Real estate owner wide table (simulated data)
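The idea can be sketched with Python's built-in `sqlite3` module standing in for the PG database. This is a minimal sketch under stated assumptions: the four source tables (`buyer`, `household`, `ownership`, `mortgage`), their columns, the `owner_wide` table, and the `lookup_owner` helper are all hypothetical and far simpler than a real registration schema. The joins are written once, when the wide table is built, so each later query reads a single table.

```python
import sqlite3

# Hypothetical, heavily simplified source tables; names and columns are
# illustrative only, not a real real-estate registration schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buyer (buyer_id INTEGER PRIMARY KEY, name TEXT, unit_no TEXT);
CREATE TABLE household (unit_no TEXT PRIMARY KEY, location TEXT, area REAL, planned_use TEXT);
CREATE TABLE ownership (unit_no TEXT PRIMARY KEY, cert_no TEXT, property_nature TEXT);
CREATE TABLE mortgage (unit_no TEXT PRIMARY KEY, status TEXT);

INSERT INTO buyer VALUES (1, 'Zhang San', 'U-101'), (2, 'Li Si', 'U-102');
INSERT INTO household VALUES ('U-101', 'Building A', 89.5, 'residential'),
                             ('U-102', 'Building B', 120.0, 'commercial');
INSERT INTO ownership VALUES ('U-101', 'CERT-001', 'commodity housing'),
                             ('U-102', 'CERT-002', 'commodity housing');
INSERT INTO mortgage VALUES ('U-101', 'mortgaged');
""")

# Build the wide table once, ahead of time, by joining everything on unit_no.
# LEFT JOIN keeps owners who have no mortgage record.
conn.execute("""
CREATE TABLE owner_wide AS
SELECT b.buyer_id, b.name, b.unit_no,
       h.location, h.area, h.planned_use,
       o.cert_no, o.property_nature,
       COALESCE(m.status, 'none') AS mortgage_status
FROM buyer b
JOIN household h ON h.unit_no = b.unit_no
JOIN ownership o ON o.unit_no = b.unit_no
LEFT JOIN mortgage m ON m.unit_no = b.unit_no
""")

def lookup_owner(name):
    # The query touches a single table; no join logic is needed at query time.
    return conn.execute(
        "SELECT name, cert_no, location, area, mortgage_status "
        "FROM owner_wide WHERE name = ?", (name,)
    ).fetchone()

print(lookup_owner("Zhang San"))  # ('Zhang San', 'CERT-001', 'Building A', 89.5, 'mortgaged')
```

The trade-off is that the join cost is paid once at build time instead of on every query, which is also why the wide table must be rebuilt or refreshed when the sources change.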

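This gives two interpretations of a wide table:

(1) A database table that stores attributes of a core business entity across different dimensions can be called a wide table.

(2) A database table that stores information about a core business entity throughout its business process, together with related upstream and downstream information, can be called a wide table.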

2. Why use wide tables?

01. Wide table queries are more convenient

Wide tables resolve the relationships between multiple tables in advance, so users can analyze data without knowing how the underlying database tables relate to each other. This also avoids the logic errors that can occur when writing join queries.

02. Wide table queries are more efficient

Wide tables are designed to improve query efficiency: placing related fields in the same table avoids a large number of joins. For example, in a PG (PostgreSQL) database holding more than 10 million rows, using a wide table improved query efficiency by roughly 25 times.

Figure: Query efficiency comparison

03. Wide tables carry richer information

Wide table design does not need to follow the three normal forms of database design. Instead, according to the application requirements of a subject or topic, all kinds of information related to the entity, such as metrics, dimensions, and attributes, are stored in the same table. A wide table usually sits in the DWS (data warehouse summary) layer of a data warehouse.

For example, the real estate owner wide table mentioned above can include, besides the tables in the real estate registration database, rental information, household registration information, income information, and so on, according to business needs. This provides a basis for later business queries, online analytical processing (OLAP), data distribution, data mining, and similar work.

3. Problems with wide tables

01. Data redundancy

When a wide table is generated by joining multiple tables, one-to-many relationships can produce redundant data. For example, the household table has an area field. When the home buyer table and the household table are joined on the unit number and one household record matches several buyer records (a one-to-many relationship), the household's area value is repeated in the wide table. Summing the area field directly will then give a wrong total.

Figure: Wide table redundancy
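A small sketch of the overcounting problem and one way to avoid it, assuming hypothetical wide-table rows in which unit `U-101` has two co-buyers. The fix shown here (keep one area value per `unit_no` before summing) is only one option; aggregating directly on the household table also works.

```python
# Hypothetical wide-table rows produced by joining buyers to households on unit_no.
# Unit U-101 has two co-buyers, so its area (89.5) appears twice.
wide_rows = [
    {"buyer": "Zhang San", "unit_no": "U-101", "area": 89.5},
    {"buyer": "Wang Wu",   "unit_no": "U-101", "area": 89.5},
    {"buyer": "Li Si",     "unit_no": "U-102", "area": 120.0},
]

def naive_total_area(rows):
    # Wrong: sums the repeated household attribute once per buyer row.
    return sum(r["area"] for r in rows)

def dedup_total_area(rows):
    # Keep one area value per household (unit_no) before summing.
    area_by_unit = {r["unit_no"]: r["area"] for r in rows}
    return sum(area_by_unit.values())

print(naive_total_area(wide_rows))  # 299.0 (overcounted)
print(dedup_total_area(wide_rows))  # 209.5
```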

02. High maintenance costs

After a wide table is generated by joining multiple tables, its fields may need to be added or removed as business scenarios change. In addition, as the source data is updated, the wide table can become inconsistent with the business database tables.
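One simple way to detect drift between the business tables and the wide table is to rebuild the expected rows from the sources and compare them with what is stored. The sketch below assumes hypothetical in-memory tables keyed by `unit_no`; `build_wide` and `find_stale_rows` are illustrative names, and a full rebuild-and-compare is only practical for small tables.

```python
# Hypothetical source tables keyed by unit_no.
household = {"U-101": {"area": 89.5}, "U-102": {"area": 120.0}}
mortgage = {"U-101": "mortgaged"}

def build_wide(household, mortgage):
    # Rebuild the expected wide rows from the current source tables.
    return {
        unit: {"area": h["area"], "mortgage_status": mortgage.get(unit, "none")}
        for unit, h in household.items()
    }

def find_stale_rows(wide, household, mortgage):
    # Compare the stored wide table with a fresh rebuild and return keys that
    # differ, including rows missing from either side.
    expected = build_wide(household, mortgage)
    keys = expected.keys() | wide.keys()
    return sorted(k for k in keys if wide.get(k) != expected.get(k))

stored_wide = build_wide(household, mortgage)   # snapshot taken at load time
mortgage["U-102"] = "mortgaged"                 # source updated afterwards
print(find_stale_rows(stored_wide, household, mortgage))  # ['U-102']
```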

4. How to optimize wide tables?

01. Partitioned tables (hash partitioning)

In practice, as business data keeps growing, a wide table can reach tens of millions or even hundreds of millions of rows, which increases storage pressure and seriously hurts performance. At that point the table should be partitioned: splitting it into several smaller parts relieves the storage pressure on any single table and improves query throughput.

The table can be split into sub-tables on fields such as a unique ID, a business primary key, or an administrative division code, which partitions the data physically. For example, the real estate query wide table can be split into three tables by ID serial number. At query time the sub-tables are queried in parallel, which makes wide table queries faster.

Figure: Wide table partitioning
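A minimal sketch of hash partitioning, assuming each row carries a string `id` that serves as the partition key. `NUM_PARTITIONS`, `partition_of`, `split_into_partitions`, `query_all`, and `query_by_id` are hypothetical names for illustration, not a database API. In PostgreSQL itself this would normally be declared with `PARTITION BY HASH` (supported since version 11) rather than done in application code.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 3  # hypothetical: three sub-tables, matching the article's example

def partition_of(record_id: str) -> int:
    # CRC32 is stable across runs, so an ID always maps to the same sub-table.
    # (Python's built-in hash() on str is randomized per process, so it is avoided.)
    return zlib.crc32(record_id.encode("utf-8")) % NUM_PARTITIONS

def split_into_partitions(rows):
    # Route every row to the sub-table chosen by its partition key.
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[partition_of(row["id"])].append(row)
    return parts

def query_all(parts, predicate):
    # Filter on a non-key field: every sub-table is scanned, one task each.
    # (A real database runs these scans in parallel; Python threads here only
    # illustrate the fan-out/merge shape, since the GIL limits CPU parallelism.)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        chunks = list(pool.map(lambda p: [r for r in p if predicate(r)], parts))
    return [r for chunk in chunks for r in chunk]

def query_by_id(parts, record_id):
    # Filter on the partition key: only one sub-table needs to be read.
    return [r for r in parts[partition_of(record_id)] if r["id"] == record_id]

# Usage with hypothetical rows: odd IDs are in district "East", even ones in "West".
rows = [{"id": f"ID{i:04d}", "district": "East" if i % 2 else "West"} for i in range(10)]
parts = split_into_partitions(rows)
print([len(p) for p in parts])                                   # sizes depend on the hash
print(len(query_all(parts, lambda r: r["district"] == "East")))  # 5
print(query_by_id(parts, "ID0007"))  # [{'id': 'ID0007', 'district': 'East'}]
```

The contrast between `query_all` and `query_by_id` is the main design point: queries that filter on the partition key can skip all but one sub-table, while other queries must fan out to every partition.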

02. Index optimization

If a database table is compared to a book, creating an index is like creating a table of contents for it: the database can locate target rows directly, enabling fast retrieval. Wide tables are often queried on arbitrary combinations of fields; for such queries, a separate index is created on each commonly queried column to balance read and write performance. For example, in the PG database, optimizing the indexes of a wide table roughly doubled query performance.

Figure: Query efficiency after index optimization
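The effect of per-column indexes can be observed with SQLite's `EXPLAIN QUERY PLAN`, used here as a stand-in for PostgreSQL; the table, columns, and index names are hypothetical. The sketch shows the planner switching from a full table scan to an index search once `district` is indexed. PostgreSQL can additionally combine several single-column indexes within one query through bitmap index scans, which is why separate per-column indexes can serve arbitrary field combinations there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owner_wide (buyer_id INTEGER, name TEXT, district TEXT, planned_use TEXT)")
conn.executemany(
    "INSERT INTO owner_wide VALUES (?, ?, ?, ?)",
    [(i, f"owner{i}", f"D{i % 50}", "residential" if i % 3 else "commercial")
     for i in range(10000)],
)

def plan(sql, params=()):
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT * FROM owner_wide WHERE district = ?"
before = plan(query, ("D7",))  # a full scan, e.g. ['SCAN owner_wide']

# One independent index per frequently filtered column, as the article describes.
# The column names are hard-coded, so building the SQL with an f-string is safe here.
for col in ("name", "district", "planned_use"):
    conn.execute(f"CREATE INDEX idx_owner_wide_{col} ON owner_wide ({col})")

after = plan(query, ("D7",))   # an index search on idx_owner_wide_district
print(before)
print(after)
```

Each extra index also has to be maintained on every insert and update, which is the write-side cost the article's "balanced read and write performance" refers to.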

03. Use columnar storage

Using column-oriented storage (such as HBase) helps compress wide tables and can also improve read efficiency, for two reasons. First, all values in a column share the same data type, so compression algorithms achieve a higher compression ratio. Second, column storage reads a contiguous segment of a column, or the whole column, at a time, so queries that need only a few of a wide table's many columns read less data.
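The compression argument can be illustrated with a toy comparison that serializes the same hypothetical rows once row by row and once column by column, compressing with `zlib`. This only illustrates the principle; it is not how HBase stores data, and the actual sizes and ratio depend on the data and the codec, so no particular numbers are claimed here. The last line shows the second benefit: reading one column decompresses only that column's data.

```python
import json
import zlib

# Hypothetical wide-table rows; low-cardinality columns are common in wide tables.
rows = [
    {"id": i,
     "district": f"D{i % 5}",
     "planned_use": "residential" if i % 4 else "commercial",
     "area": 80 + (i % 7) * 10}
    for i in range(5000)
]

# Row layout: all fields of one record stored together, compressed as one blob.
row_bytes = json.dumps([list(r.values()) for r in rows]).encode()
row_size = len(zlib.compress(row_bytes))

# Column layout: all values of one column stored together, compressed separately.
columns = {name: [r[name] for r in rows] for name in rows[0]}
col_blobs = {name: zlib.compress(json.dumps(vals).encode()) for name, vals in columns.items()}
col_size = sum(len(b) for b in col_blobs.values())

print(f"row-wise compressed: {row_size} bytes, column-wise compressed: {col_size} bytes")

# Reading one column only requires decompressing that column's blob.
districts = json.loads(zlib.decompress(col_blobs["district"]))
```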

5. Wide table application scenarios

6. How to build a wide table

1. Select the business process for which you want to build a wide table, and sort out all the activities in that process;
2. Sort out the core business entities involved in these activities;
3. Select the entity the business cares about most and build the wide table around it;
4. Determine the data granularity of the wide table;
5. Select attributes:
    1. Attributes of the core entity
    2. Attributes of core entities related upstream and downstream (select as needed)
    3. First-level related dimension attributes (time, location, customer, product, etc.) (select as needed)
    4. Statistical labels (select as needed)
6. Design the field mapping and test cases;
7. Implement, test, go live, and run regression tests.
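Steps 4, 6, and 7 can be made concrete with a small sketch: a mapping spec from wide-table columns to source columns, a build function driven by it, and two test cases that check the granularity and the mapping. `MAPPING`, `GRAIN`, `SOURCES`, and the helper functions are hypothetical; a real implementation would usually express the mapping as SQL or ETL jobs.

```python
# Hypothetical mapping spec (step 6): wide-table column -> (source table, source column).
MAPPING = {
    "unit_no":         ("household", "unit_no"),
    "location":        ("household", "location"),
    "cert_no":         ("ownership", "cert_no"),
    "mortgage_status": ("mortgage",  "status"),
}
GRAIN = "unit_no"  # step 4: one wide-table row per household unit

# Hypothetical source tables, each keyed by the grain value.
SOURCES = {
    "household": {"U-101": {"unit_no": "U-101", "location": "Building A"},
                  "U-102": {"unit_no": "U-102", "location": "Building B"}},
    "ownership": {"U-101": {"cert_no": "CERT-001"}, "U-102": {"cert_no": "CERT-002"}},
    "mortgage":  {"U-101": {"status": "mortgaged"}},
}

def build_wide_table(mapping, sources, driving_table="household"):
    # Step 7 (implementation): one output row per key of the driving table,
    # filling each mapped column from its source; missing source rows give None.
    wide = []
    for key in sources[driving_table]:
        row = {}
        for target, (table, column) in mapping.items():
            row[target] = sources[table].get(key, {}).get(column)
        wide.append(row)
    return wide

def check_grain(wide, grain):
    # Test case for step 4: the grain column must be non-null and unique.
    keys = [r[grain] for r in wide]
    return None not in keys and len(keys) == len(set(keys))

def check_mapping_complete(wide, mapping):
    # Test case for step 6: every mapped column is present in every row.
    return all(set(mapping) <= set(r) for r in wide)

wide = build_wide_table(MAPPING, SOURCES)
print(check_grain(wide, GRAIN), check_mapping_complete(wide, MAPPING))  # True True
```

Keeping the mapping as data makes step 5 (adding or dropping attributes) a change to one table rather than to the build code, and the same checks can be rerun as regression tests after each change.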
 


Source: blog.csdn.net/eylier/article/details/129669321