Currently, finebi_shop_bi is an empty database. All subsequent data analysis will be carried out in it, so the first task is to extract all the tables from the "finebi_shop" database into "finebi_shop_bi". Before any data can be extracted and loaded, the corresponding tables must be created in "finebi_shop_bi".
1 Data extraction business analysis
We are already roughly familiar with the six tables above. They are not all copied to the data warehouse in full in a single pass; each table is handled according to how its data changes. Consider the following business scenarios:
- Orders need to be analyzed every day, for example: how many orders were there on April 18, 2020, and what is the total amount of the orders.
- User analysis needs to be performed every day, for example: how many users were registered on April 18, 2020.
- Product categories and regions change very rarely; they remain almost unchanged all year round.
- Product data changes relatively frequently, because product information may be updated every day.
Combined with the above business scenarios, we can determine the data extraction cycle:
Table name | Description | Target table | Extraction method | Extraction cycle |
finebi_areas | Administrative area table | ods_finebi_areas | Full synchronous extraction | weekly |
finebi_goods | Product table | ods_finebi_goods | Full synchronous extraction | every day |
finebi_goods_cats | Product category table | ods_finebi_goods_cats | Full synchronous extraction | weekly |
finebi_orders | Order table | ods_finebi_orders | Incremental synchronous extraction | every day |
finebi_order_goods | Order details table | ods_finebi_order_goods | Incremental synchronous extraction | every day |
finebi_users | User information table | ods_finebi_users | Incremental synchronous extraction | every day |
- Full synchronous extraction: synchronously extract all data to the data warehouse
- Incremental synchronous extraction: only extract new data to the data warehouse
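The difference between the two extraction methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database instead of the real source and warehouse databases, the table names (`areas`, `orders`, `ods_areas`, `ods_orders`) and columns (including `create_time`) are hypothetical, and full extraction is modeled here as "clear the target and reload everything", which is one common approach and not necessarily what the Kettle flow in this document does.

```python
import sqlite3


def full_sync(conn, source, target):
    """Full synchronous extraction: reload every source row into the target.

    Assumption: the target is cleared first so repeated runs do not duplicate rows.
    """
    conn.execute(f"DELETE FROM {target}")
    conn.execute(f"INSERT INTO {target} SELECT * FROM {source}")


def incremental_sync(conn, source, target, date_column, day):
    """Incremental synchronous extraction: load only the rows created on `day`."""
    conn.execute(
        f"INSERT INTO {target} SELECT * FROM {source} WHERE date({date_column}) = ?",
        (day,),
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables used only for this sketch
    CREATE TABLE areas (id INTEGER, name TEXT);
    CREATE TABLE ods_areas (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, amount REAL, create_time TEXT);
    CREATE TABLE ods_orders (id INTEGER, amount REAL, create_time TEXT);
    INSERT INTO areas VALUES (1, 'North'), (2, 'South');
    INSERT INTO orders VALUES
        (1, 10.0, '2020-04-17 09:00:00'),
        (2, 25.5, '2020-04-18 10:30:00'),
        (3, 14.5, '2020-04-18 21:15:00');
""")

# Slowly changing table: copy everything
full_sync(conn, "areas", "ods_areas")
# Fast-growing table: copy only the rows for one day
incremental_sync(conn, "orders", "ods_orders", "create_time", "2020-04-18")

areas_loaded = conn.execute("SELECT COUNT(*) FROM ods_areas").fetchone()[0]
orders_loaded, orders_total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM ods_orders"
).fetchone()
print(areas_loaded, orders_loaded, orders_total)
```

In this sketch the order from April 17 is skipped by the incremental load, while both area rows are always reloaded by the full load.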
2 The meaning of ODS
ODS (Operational Data Store) is a concept in data architecture and database design. It arises when data from multiple systems needs to be integrated and the result needs to be used by one or more systems.
The ODS tables of the data warehouse are extracted from the business system database tables as they are. The structure is almost the same, except that a date field recording when the data was extracted is added.
3 Weekly data extraction job development
3.1 Develop administrative region data extraction
finebi_areas | Administrative area table | Full synchronous extraction | weekly |
According to the previous analysis, the administrative region table uses full synchronous extraction, so we only need to extract all of it into the corresponding table in the data warehouse. Note, however, that we must be able to tell on which day the data was extracted, so we add an extra field holding the current date.
(1) Construct Kettle data flow component diagram
(2) Configure the table input component
To create a new database connection, click New
Configure database connection information
Specify the query that reads the source table
SELECT *, current_date() as dt FROM finebi_areas
By previewing the data, you can see that in addition to all the fields of the original finebi_areas table, a current date field has been added, which will later be used as the extraction date of the data.
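To see the shape of the result without opening Kettle, the same query can be tried against a throwaway database. The sketch below is an assumption-laden stand-in: it uses SQLite in memory, the columns `area_id` and `area_name` are hypothetical (the real finebi_areas columns are not listed here), and SQLite writes the function as `CURRENT_DATE` where the query above uses `current_date()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns; the real finebi_areas table has its own structure
    CREATE TABLE finebi_areas (area_id INTEGER, area_name TEXT);
    INSERT INTO finebi_areas VALUES (1, 'Beijing'), (2, 'Shanghai');
""")

# All original columns plus an extraction-date column named dt.
# Note: SQLite's CURRENT_DATE is evaluated in UTC.
cursor = conn.execute("SELECT *, CURRENT_DATE AS dt FROM finebi_areas")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

Every row carries the same `dt` value, which is what later lets the warehouse distinguish one extraction from another.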
(3) Configure the insert/update component
Configure the connection to the data warehouse
Specify the target table. The name of the target table is: business system database table plus an ods_ prefix.
Click the "SQL" button and execute the generated statement to create the target table
Run the transformation
View the data in the target table
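Conceptually, an insert/update step looks each incoming row up by a key, updates the existing row if one is found, and inserts a new row otherwise. The following is a minimal sketch of that idea, not Kettle's actual implementation: it assumes an in-memory SQLite table, a hypothetical key column `area_id`, and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table: source columns plus the extraction date dt
conn.execute(
    "CREATE TABLE ods_finebi_areas (area_id INTEGER PRIMARY KEY, area_name TEXT, dt TEXT)"
)


def insert_or_update(conn, row):
    """Look the row up by key; update it if present, insert it otherwise."""
    area_id, area_name, dt = row
    found = conn.execute(
        "SELECT 1 FROM ods_finebi_areas WHERE area_id = ?", (area_id,)
    ).fetchone()
    if found:
        conn.execute(
            "UPDATE ods_finebi_areas SET area_name = ?, dt = ? WHERE area_id = ?",
            (area_name, dt, area_id),
        )
    else:
        conn.execute("INSERT INTO ods_finebi_areas VALUES (?, ?, ?)", row)


# Two extraction runs: the second changes one area and adds another
first_run = [(1, "Beijing", "2020-04-18"), (2, "Shanghai", "2020-04-18")]
second_run = [(2, "Shanghai City", "2020-04-25"), (3, "Tianjin", "2020-04-25")]

for row in first_run + second_run:
    insert_or_update(conn, row)

result = conn.execute("SELECT * FROM ods_finebi_areas ORDER BY area_id").fetchall()
for row in result:
    print(row)
```

With a key chosen this way, re-running the extraction does not duplicate rows: unchanged keys are overwritten with the latest values and extraction date, and new keys are appended.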
(4) Build the job, executed once a day
Create the job
Configure the transformation
Configure scheduled operation: synchronize once a day at 00:05