Kettle study notes (7) - Kettle Flow steps and Utility steps

I. Overview

  The Flow steps are mainly used to control the flow of data: how rows are processed and which direction they take.

  The Utility steps provide a set of auxiliary tools.

II. Flow steps

  1. ETL metadata injection

    This works somewhat like reflection in Java: at design time the transformation does not know the file name, file location, and similar details; the concrete configuration is only supplied when the transformation actually runs.

    A detailed introduction will be added later; see the official wiki: https://wiki.pentaho.com/display/EAI/ETL+Metadata+Injection
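    To make the reflection analogy concrete, here is a minimal Java sketch. It is not how Kettle implements metadata injection; the class names (CsvInputTemplate, MetadataInjector) and their fields are hypothetical. A template whose settings are empty at design time has them filled in at run time from a metadata map, using reflection.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical template step: its settings are unknown at design time.
class CsvInputTemplate {
    String fileName;   // injected at run time
    String delimiter;  // injected at run time

    @Override
    public String toString() {
        return "CsvInputTemplate[fileName=" + fileName + ", delimiter=" + delimiter + "]";
    }
}

class MetadataInjector {
    // Sets each field whose name matches a metadata key. The caller never
    // refers to fileName or delimiter directly; reflection resolves them by name.
    static void inject(Object target, Map<String, String> metadata) {
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            try {
                Field field = target.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(target, entry.getValue());
            } catch (NoSuchFieldException | IllegalAccessException e) {
                throw new IllegalArgumentException("Cannot inject '" + entry.getKey() + "'", e);
            }
        }
    }

    public static void main(String[] args) {
        CsvInputTemplate step = new CsvInputTemplate();
        // In practice this metadata would come from a file, table or another stream at run time.
        inject(step, Map.of("fileName", "/data/orders.csv", "delimiter", ";"));
        System.out.println(step);
    }
}
```

    Unknown keys fail loudly here, which mirrors the idea that the injected metadata must match what the template expects.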

  2. Data filtering

    

    Next, filtering with Java code (the Java filter step): the condition mainly relies on Java methods such as indexOf() and matches().
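    As an illustration, the sketch below writes two such conditions as plain Java predicates and applies them to sample values. It assumes one string field per row; in the Java filter step itself you would type only a boolean expression (something like message.indexOf("ERROR") >= 0), and the class and method names here are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class JavaFilterSketch {
    // indexOf(...) returns -1 when the substring is absent, so ">= 0" means "contains".
    static final Predicate<String> CONTAINS_ERROR =
            message -> message != null && message.indexOf("ERROR") >= 0;

    // matches(...) must match the WHOLE string, so no ^...$ anchors are needed.
    static final Predicate<String> ELEVEN_DIGITS =
            phone -> phone != null && phone.matches("\\d{11}");

    // Rows satisfying the condition go to the "true" output; the rest would go to the "false" target.
    static List<String> keep(List<String> rows, Predicate<String> condition) {
        return rows.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keep(List.of("ERROR disk full", "INFO started"), CONTAINS_ERROR));
        System.out.println(keep(List.of("13800138000", "1380013", "abc"), ELEVEN_DIGITS));
    }
}
```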

    

  3. Handling an unknown number of rows

    

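    Detect empty stream checks whether the previous stream is empty: if no rows arrive, it outputs a single row whose fields are all null. Blocking step waits until all incoming rows have arrived and then passes on only the last row, which is useful when only the final row is needed.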
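    The two behaviours can be modelled in plain Java as follows. This is a simplified sketch based on the step descriptions, not Kettle code: rows are hypothetical Map objects, and passAllRows stands in for the Blocking step's option to pass on every row instead of only the last one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EmptyAndBlockingSketch {
    // Detect empty stream: if the previous step produced no rows, emit exactly one row
    // with every field set to null; if it produced rows, emit nothing.
    static List<Map<String, Object>> detectEmptyStream(List<Map<String, Object>> input,
                                                       List<String> fieldNames) {
        if (!input.isEmpty()) {
            return Collections.emptyList();
        }
        Map<String, Object> nullRow = new HashMap<>();
        for (String name : fieldNames) {
            nullRow.put(name, null);
        }
        return Collections.singletonList(nullRow);
    }

    // Blocking step: wait until the whole input is available, then pass on
    // either all rows or only the last one.
    static <T> List<T> blockingStep(List<T> input, boolean passAllRows) {
        if (passAllRows || input.isEmpty()) {
            return new ArrayList<>(input);
        }
        return Collections.singletonList(input.get(input.size() - 1));
    }
}
```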

  4. Multi-source data merging

    Merging here is UNION-style appending of rows, not a JOIN: the input streams must have the same number of columns, with the same column names and types.
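    A minimal sketch of such an append, under stated assumptions: RowStream is a hypothetical stand-in for a row stream, and the layout check compares field names and Java types in order. The sketch keeps duplicate rows, so it behaves like SQL UNION ALL rather than a de-duplicating UNION.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model of a row stream: a fixed layout plus rows of values.
class RowStream {
    final List<String> fieldNames;
    final List<Class<?>> fieldTypes;
    final List<List<Object>> rows;

    RowStream(List<String> fieldNames, List<Class<?>> fieldTypes, List<List<Object>> rows) {
        this.fieldNames = fieldNames;
        this.fieldTypes = fieldTypes;
        this.rows = rows;
    }
}

class AppendStreamsSketch {
    // Rows of `head` first, then rows of `tail`. Both inputs must share the
    // same layout: same column count, names and types, in the same order.
    static RowStream append(RowStream head, RowStream tail) {
        if (!head.fieldNames.equals(tail.fieldNames) || !head.fieldTypes.equals(tail.fieldTypes)) {
            throw new IllegalArgumentException(
                    "Row layouts differ: " + head.fieldNames + " vs " + tail.fieldNames);
        }
        List<List<Object>> all = new ArrayList<>(head.rows);
        all.addAll(tail.rows);
        return new RowStream(head.fieldNames, head.fieldTypes, all);
    }
}
```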

    

  5. Data flow endpoint

    

    Dummy (do nothing) is mainly used to merge streams naturally and as a trash bin for rows that are no longer needed;

    Abort lets you set a threshold on the number of rows; for example, with a threshold of 10, an error is raised here once more than 10 rows arrive;
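    The threshold rule can be sketched as below. This assumes the semantics given in the Pentaho documentation for the Abort step as I understand it (threshold 0 aborts on the first row, threshold N on row N + 1); AbortSketch is a hypothetical name and the exception stands in for the failed transformation.

```java
import java.util.List;

class AbortSketch {
    // Counts incoming rows and fails as soon as the count exceeds the threshold.
    // Returns the number of rows accepted when the threshold is never exceeded.
    static int consume(List<?> rows, int threshold) {
        int seen = 0;
        for (Object row : rows) {
            seen++;
            if (seen > threshold) {
                throw new IllegalStateException(
                        "Aborting: row " + seen + " exceeds threshold " + threshold);
            }
        }
        return seen;
    }
}
```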

    Copy rows to result keeps the rows in memory so that later transformations in the job can use them;

    Note that Set variables, likewise, sets variables that can only be used in subsequent transformations, not in the transformation that sets them.
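    A toy model of the last two points, assuming (as a simplification, not Kettle's actual implementation) that each transformation resolves variables once, when it starts: transformation A copies rows to the result and sets LOAD_DATE, but only transformation B, started afterwards, sees the new value. JobSketch, LOAD_DATE and the row values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a job running two transformations one after the other.
class JobSketch {
    final List<String> resultRows = new ArrayList<>();         // filled by "Copy rows to result"
    final Map<String, String> jobVariables = new HashMap<>();  // filled by "Set variables"

    // Transformation A: takes its variable snapshot at start-up, then sets a variable.
    String runTransformationA() {
        Map<String, String> snapshot = new HashMap<>(jobVariables);
        resultRows.addAll(List.of("row1", "row2"));   // kept in memory for later transformations
        jobVariables.put("LOAD_DATE", "2020-01-01");  // written to the job's variables
        // Still null: A's snapshot was taken before the variable was set.
        return snapshot.get("LOAD_DATE");
    }

    // Transformation B starts afterwards, so it sees both the variable and the result rows.
    String runTransformationB() {
        Map<String, String> snapshot = new HashMap<>(jobVariables);
        return snapshot.get("LOAD_DATE") + " / " + resultRows.size() + " rows";
    }
}
```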

  6. Other

    

    Single threader: not covered for now.

III. Utility steps

  1. Null value processing

    

  2. Start other programs

    

    Note: to run SSH commands, passwordless (key-based) login must be set up in advance!
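    Outside Kettle, the same precondition shows up in a small Java sketch that calls the OpenSSH client through ProcessBuilder. This is an illustration, not how the SSH step works internally; the user, host and script path are hypothetical. The -o BatchMode=yes option makes ssh fail instead of prompting for a password, which is what you want in an unattended run where nobody can type one.

```java
import java.io.IOException;
import java.util.List;

class SshCommandSketch {
    // Builds an ssh invocation that never prompts: with BatchMode=yes, ssh exits
    // with an error instead of asking for a password or host-key confirmation.
    static List<String> buildSshCommand(String user, String host, String remoteCommand) {
        return List.of("ssh", "-o", "BatchMode=yes", user + "@" + host, remoteCommand);
    }

    // Starts the command, forwards its output to this JVM's console and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical account, host and script, for illustration only.
        List<String> cmd = buildSshCommand("etl", "etl-server.example.com", "/opt/scripts/cleanup.sh");
        System.out.println(String.join(" ", cmd));
        // run(cmd) only succeeds once key-based login for this account is configured.
    }
}
```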

  3. Logging

    

    Write to log is equivalent to log.info(...) in Java code; it is used to write custom log messages.
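    For comparison, a Java sketch of what such a custom log line amounts to. It uses the JDK's java.util.logging as a stand-in, not Kettle's own logging API, and the message format and names are assumptions.

```java
import java.util.logging.Logger;

class WriteToLogSketch {
    private static final Logger LOG = Logger.getLogger(WriteToLogSketch.class.getName());

    // A custom message followed by one field name and its value from the current row.
    static String formatRow(String message, String fieldName, Object value) {
        return message + " | " + fieldName + " = " + value;
    }

    // Logs the formatted line at INFO level, like log.info(...).
    static void writeToLog(String message, String fieldName, Object value) {
        LOG.info(formatRow(message, fieldName, value));
    }

    public static void main(String[] args) {
        writeToLog("Loaded order", "order_id", 1001);
    }
}
```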

  4. File processing

    

  5. Send Email

    Multiple recipient addresses are separated by spaces (variables can be used).
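    A sketch of how such a destination string could be turned into a recipient list, assuming the ${NAME} variable syntax and whitespace as the separator; RecipientListSketch is hypothetical and this is not the Mail step's code. Substituting variables before splitting means a single variable may expand to several space-separated addresses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecipientListSketch {
    // Matches ${NAME}-style variable references.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${NAME} with its value; unknown variables are left as-is.
    static String substitute(String text, Map<String, String> variables) {
        Matcher m = VARIABLE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = variables.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Resolves variables first, then splits on runs of whitespace.
    static List<String> recipients(String destination, Map<String, String> variables) {
        String resolved = substitute(destination, variables).trim();
        if (resolved.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(resolved.split("\\s+"));
    }
}
```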
