FineBI practical project one (3): Kettle implements ETL to data warehouse

Currently, the finebi_shop_bi database is empty; all subsequent data analysis will be carried out in it. The first task is to extract every table in the finebi_shop database into finebi_shop_bi. Before the data can be extracted and loaded, the corresponding tables must first be created in finebi_shop_bi.

1 Data extraction business analysis

We are already roughly familiar with the six tables above. They are not all synchronized to the data warehouse intact in a single pass; each has its own processing details. Consider the following business scenarios:

  1. Orders need to be analyzed daily, for example: how many orders were placed on April 18, 2020, and what was their total amount?
  2. Users need to be analyzed daily, for example: how many users registered on April 18, 2020?
  3. Product categories and administrative regions change very rarely; they stay almost the same all year round.
  4. Product data changes relatively frequently, because product information may be updated every day.
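To make the first scenario concrete, here is a minimal sketch of the daily order query, using Python's standard-library sqlite3 module as a stand-in for the database. The columns order_id, create_time and total_money, as well as the sample rows, are hypothetical; the real finebi_orders schema may use different names.

```python
import sqlite3

# Hypothetical finebi_orders layout and sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finebi_orders (order_id INTEGER, create_time TEXT, total_money REAL)")
conn.executemany(
    "INSERT INTO finebi_orders VALUES (?, ?, ?)",
    [
        (1, "2020-04-18 09:12:00", 120.0),
        (2, "2020-04-18 17:40:00", 80.5),
        (3, "2020-04-19 08:05:00", 45.0),
    ],
)

def daily_order_stats(conn, day):
    """Return (order count, total amount) for orders created on `day` (YYYY-MM-DD)."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(total_money), 0) "
        "FROM finebi_orders WHERE date(create_time) = ?",
        (day,),
    ).fetchone()
    return count, total

print(daily_order_stats(conn, "2020-04-18"))  # (2, 200.5)
```

Because the question is always asked per day, it is natural to load order data into the warehouse on a daily cycle as well.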

Based on these business scenarios, we can determine the extraction method and cycle for each table:

| Table name | Description | Load table | Extraction method | Extraction cycle |
|---|---|---|---|---|
| finebi_areas | Administrative area table | ods_finebi_areas | Full synchronous extraction | Weekly |
| finebi_goods | Product table | ods_finebi_goods | Full synchronous extraction | Daily |
| finebi_goods_cats | Product category table | ods_finebi_goods_cats | Full synchronous extraction | Weekly |
| finebi_orders | Order table | ods_finebi_orders | Incremental synchronous extraction | Daily |
| finebi_order_goods | Order details table | ods_finebi_order_goods | Incremental synchronous extraction | Daily |
| finebi_users | User information table | ods_finebi_users | Incremental synchronous extraction | Daily |

  • Full synchronous extraction: each run extracts all data in the source table into the data warehouse
  • Incremental synchronous extraction: each run extracts only the new data into the data warehouse
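The difference between the two extraction methods can be sketched in Python with sqlite3. This is an illustration under assumptions, not how Kettle implements it: the source columns (id, name, create_time) are hypothetical, and the incremental variant assumes that "new data" means rows whose create_time falls on a given day, typically the day before the extraction.

```python
import sqlite3

# Hypothetical source table with a creation timestamp used to detect new rows.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE finebi_users (id INTEGER PRIMARY KEY, name TEXT, create_time TEXT)")
src.executemany(
    "INSERT INTO finebi_users VALUES (?, ?, ?)",
    [
        (1, "a", "2020-04-17 10:00:00"),
        (2, "b", "2020-04-18 11:00:00"),
        (3, "c", "2020-04-18 23:59:00"),
    ],
)

# Table names are interpolated directly; they come from trusted code, not user input.
def full_extract(conn, table, dt):
    """Full extraction: every source row, tagged with the extraction date dt."""
    return [row + (dt,) for row in conn.execute(f"SELECT * FROM {table}")]

def incremental_extract(conn, table, dt, day):
    """Incremental extraction: only rows created on `day`, tagged with dt."""
    sql = f"SELECT * FROM {table} WHERE date(create_time) = ?"
    return [row + (dt,) for row in conn.execute(sql, (day,))]

print(len(full_extract(src, "finebi_users", "2020-04-19")))                       # 3
print(len(incremental_extract(src, "finebi_users", "2020-04-19", "2020-04-18")))  # 2
```

Full extraction cost grows with the table size, which is acceptable for small, slow-changing tables such as areas and categories; incremental extraction keeps the daily load proportional to the day's new rows.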

2 What ODS means

ODS (Operational Data Store) is a concept in data architecture and database design. It arose from the need to integrate data from multiple systems and make the integrated results available to one or more other systems.

In the data warehouse, an ODS table is a copy of a business system table taken as-is: its structure is almost identical, except that a field recording the date on which the data was extracted is added.
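Because an ODS table mirrors its source table plus one date column, its DDL can be derived mechanically. A minimal sketch with sqlite3, assuming a hypothetical finebi_areas layout (areaId, parentId, areaName); dt is stored as TEXT because SQLite has no separate DATE storage class, whereas in MySQL a DATE column would be the natural choice.

```python
import sqlite3

def ods_ddl(conn, table):
    """Build CREATE TABLE for ods_<table>: the source columns plus a dt column."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    cols = [f"{name} {ctype}" for _, name, ctype, *_ in conn.execute(f"PRAGMA table_info({table})")]
    cols.append("dt TEXT")  # extraction date added by the ETL
    return f"CREATE TABLE ods_{table} ({', '.join(cols)})"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finebi_areas (areaId INTEGER, parentId INTEGER, areaName TEXT)")
print(ods_ddl(conn, "finebi_areas"))
# CREATE TABLE ods_finebi_areas (areaId INTEGER, parentId INTEGER, areaName TEXT, dt TEXT)
```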

3 Developing the weekly data extraction job

3.1 Developing the administrative area data extraction

finebi_areas (administrative area table): full synchronous extraction, weekly

According to the analysis above, the administrative area table uses full synchronous extraction, so we only need to extract all of its data into the corresponding table in the data warehouse. Note, however, that we must clearly record the day on which the data was extracted, so an extra field holding the current date has to be added.

(1) Build the Kettle transformation: a Table input step followed by an Insert/Update step

(2) Configure the Table input step

Click New to create a database connection

Configure the database connection details

Enter the SQL query that reads the source table

```sql
SELECT *, current_date() as dt FROM finebi_areas
```

Previewing the data shows that, in addition to all fields of the original finebi_areas table, a current-date field dt has been added; it will later serve as the extraction date of the data.
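The Table input query can be tried out without Kettle. The sketch below runs an equivalent query in SQLite, using SQLite's date('now', 'localtime') in place of MySQL's current_date(); the two finebi_areas columns and the sample rows are hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the real finebi_areas schema may differ.
conn.execute("CREATE TABLE finebi_areas (areaId INTEGER, areaName TEXT)")
conn.executemany("INSERT INTO finebi_areas VALUES (?, ?)", [(1, "Beijing"), (2, "Shanghai")])

# Same shape as the Table input query: all source columns plus the extraction date.
cur = conn.execute("SELECT *, date('now', 'localtime') AS dt FROM finebi_areas")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(columns)  # ['areaId', 'areaName', 'dt']
# dt should equal today's local date for every row.
print(all(row[-1] == date.today().isoformat() for row in rows))
```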

(3) Configure the Insert/Update step

Configure the connection to the data warehouse

Specify the target table. Its name is the business system table name prefixed with ods_, for example ods_finebi_areas.
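Kettle's Insert/Update step looks up each incoming row in the target table by the configured key fields: if no row is found it inserts one, if a row is found but the update fields differ it updates it, and if everything already matches it does nothing. The Python/sqlite3 sketch below mimics that logic for a single row; the helper name insert_update and the use of areaId as the key are assumptions for illustration.

```python
import sqlite3

def insert_update(conn, table, key, row):
    """Mimic Kettle's Insert/Update for one row (dict of column -> value).

    Returns "insert", "update" or "none" depending on what was done.
    """
    cols = list(row)
    existing = conn.execute(
        f"SELECT {', '.join(cols)} FROM {table} WHERE {key} = ?", (row[key],)
    ).fetchone()
    if existing is None:
        # No row with this key yet: insert it.
        placeholders = ", ".join("?" * len(cols))
        conn.execute(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
            [row[c] for c in cols],
        )
        return "insert"
    if tuple(row[c] for c in cols) == existing:
        return "none"  # identical row already stored
    # Key found but some field differs: update the non-key columns.
    assignments = ", ".join(f"{c} = ?" for c in cols if c != key)
    conn.execute(
        f"UPDATE {table} SET {assignments} WHERE {key} = ?",
        [row[c] for c in cols if c != key] + [row[key]],
    )
    return "update"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ods_finebi_areas (areaId INTEGER PRIMARY KEY, areaName TEXT, dt TEXT)")
row = {"areaId": 1, "areaName": "Beijing", "dt": "2020-04-18"}
print(insert_update(conn, "ods_finebi_areas", "areaId", row))                       # insert
print(insert_update(conn, "ods_finebi_areas", "areaId", row))                       # none
print(insert_update(conn, "ods_finebi_areas", "areaId", {**row, "dt": "2020-04-19"}))  # update
```

If dt is configured as an update field, every existing row is updated on each run because dt changes daily, so each row then carries the date of the latest extraction.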

Click the "SQL" button to generate the SQL that creates the target table, then execute it

Run the transformation

Check the data in the target table

(4) Build a job that runs once a day

Create a job

Configure the Transformation entry

Configure the schedule: run the synchronization once a day at 00:05
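The rule "once a day at 00:05" determines when the job fires next. The hypothetical next_run helper below only illustrates that timing rule; in Kettle the schedule is set on the job's Start entry, and this is not how Kettle implements it.

```python
from datetime import datetime, time, timedelta

RUN_AT = time(0, 5)  # 00:05, the daily start time from the job schedule

def next_run(now: datetime, run_at: time = RUN_AT) -> datetime:
    """Return the next datetime at which a once-a-day job scheduled at run_at fires."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed: fire tomorrow at the same time.
        candidate += timedelta(days=1)
    return candidate

print(next_run(datetime(2020, 4, 18, 0, 0)))   # 2020-04-18 00:05:00
print(next_run(datetime(2020, 4, 18, 9, 30)))  # 2020-04-19 00:05:00
```

Running shortly after midnight means that the date recorded in dt is the first day on which the previous day's data is complete.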


Origin blog.csdn.net/u013938578/article/details/135439150