I. Overview
Process (Flow) steps are mainly used to control the flow of data and the direction it takes.
Application (Utility) steps provide a set of helper tools.
II. Process steps
1. ETL metadata injection
ETL metadata injection is similar to reflection in Java: at design time the transformation does not know the file name, file location, and so on; these and other specific configuration values are supplied only when it is actually executed.
A detailed introduction will be added later. Official wiki: https://wiki.pentaho.com/display/EAI/ETL+Metadata+Injection
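To make the reflection analogy concrete, here is a minimal Java sketch. It is not Kettle code; the class name ReflectionAnalogy and the helper call are made up for illustration. The point is only that the class and method to use arrive as plain strings at run time, much as a template transformation receives its file names and field layout only when it runs.

```java
import java.lang.reflect.Method;

// Analogy only: nothing here is part of the Kettle API.
class ReflectionAnalogy {
    // Builds an object of the named class from a String argument and calls
    // the named no-argument method on it. Both names are known only at run time.
    static Object call(String className, String methodName, String arg) {
        try {
            Class<?> type = Class.forName(className);            // resolved at run time
            Object target = type.getConstructor(String.class).newInstance(arg);
            Method method = type.getMethod(methodName);           // looked up by name
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + className + "." + methodName, e);
        }
    }

    public static void main(String[] args) {
        // These strings could just as well come from a config file or a table.
        System.out.println(call("java.lang.StringBuilder", "reverse", "kettle")); // elttek
    }
}
```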
2. Data filtering
Here we look at filtering with Java code. The filter condition mainly relies on ordinary Java String methods such as indexOf() and matches().
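The kind of condition involved can be sketched in plain Java as follows. The field name fileName, the sample values, and the class JavaFilterSketch are assumptions for illustration; in the step itself you would normally enter only the boolean expression, so check the step dialog for the exact form it expects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JavaFilterSketch {
    // Hypothetical condition: keep names that contain "kettle" and end in ".csv".
    static boolean keep(String fileName) {
        return fileName != null
                && fileName.indexOf("kettle") >= 0      // substring test
                && fileName.matches(".*\\.csv");        // whole-string regex test
    }

    // Rows that satisfy the condition go on; the rest are dropped.
    static List<String> filter(List<String> rows) {
        return rows.stream().filter(JavaFilterSketch::keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("kettle_a.csv", "kettle_b.txt", "other.csv");
        System.out.println(filter(rows)); // [kettle_a.csv]
    }
}
```

Note that matches() tests the whole string against the pattern, while indexOf() only looks for a substring, so the two are not interchangeable.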
3. Handling indeterminate number of rows of data
Detect empty stream checks whether the incoming stream is empty; Blocking step holds back the data and passes on only the last row.
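As a rough mental model of these two behaviours (not Kettle code; the class RowCountSketch and its methods are invented for illustration), an incoming stream can be treated as an iterator of rows:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class RowCountSketch {
    // "Empty stream" check: true when the previous step delivered no rows at all.
    static boolean isEmpty(Iterator<?> rows) {
        return !rows.hasNext();
    }

    // "Blocking" behaviour: consume every row, hand on only the last one.
    // Returns null when there were no rows.
    static <T> T lastRow(Iterator<T> rows) {
        T last = null;
        while (rows.hasNext()) {
            last = rows.next();
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");
        System.out.println(isEmpty(Collections.emptyIterator())); // true
        System.out.println(lastRow(rows.iterator()));             // r3
    }
}
```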
4. Multi-source data merging
Merge the sources as a UNION rather than a JOIN: rows are simply appended, so the number of columns, the column names, and the column types must be the same in every source.
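The UNION requirement can be sketched in Java like this. RowSet and union are hypothetical names, cells are simplified to strings, and only the column names are compared; a fuller check would also compare the column types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnionSketch {
    // Hypothetical stand-in for one source: column names plus rows of string cells.
    static final class RowSet {
        final List<String> columns;
        final List<List<String>> rows;

        RowSet(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    // UNION: append the rows of b after the rows of a; the layout must match.
    static RowSet union(RowSet a, RowSet b) {
        if (!a.columns.equals(b.columns)) {
            throw new IllegalArgumentException(
                    "Column layout differs: " + a.columns + " vs " + b.columns);
        }
        List<List<String>> merged = new ArrayList<>(a.rows);
        merged.addAll(b.rows);
        return new RowSet(a.columns, merged);
    }

    public static void main(String[] args) {
        RowSet a = new RowSet(Arrays.asList("id", "name"),
                Arrays.asList(Arrays.asList("1", "a")));
        RowSet b = new RowSet(Arrays.asList("id", "name"),
                Arrays.asList(Arrays.asList("2", "b")));
        System.out.println(union(a, b).rows); // [[1, a], [2, b]]
    }
}
```

A JOIN, by contrast, would match rows on a key and widen them with extra columns, which is why it places no such requirement on the layouts.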
5. Data flow endpoint
The no-op (Dummy) step is mostly used as a placeholder where streams naturally merge or split.
Abort lets you set a row threshold, for example 10 records; once the rows reaching this step pass the threshold, an error is reported here and the transformation stops.
Copy rows to result keeps the rows temporarily in memory so that later transformations can use them.
Note that Set Variables works the same way: the variables it sets can only be used in subsequent transformations, not in the current one.
6. Other
Single Threader: not covered for the time being.
III. Application
1. Null value processing
2. Start other programs
Note: passwordless (key-based) SSH login must be set up in advance before commands can be run over SSH!
3. Log function
Write to log is equivalent to calling log.info(...) in Java code; it is used to write custom messages to the log.
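For comparison, this is roughly what the Java side of that analogy looks like, using the standard java.util.logging package. The logger name, the message layout, and the class WriteToLogSketch are assumptions for illustration only.

```java
import java.util.logging.Logger;

class WriteToLogSketch {
    // Hypothetical logger name; any string works.
    private static final Logger log = Logger.getLogger("etl.sketch");

    // Builds the custom message, e.g. from field values of the current row.
    static String format(String step, long rows) {
        return String.format("step=%s rows=%d", step, rows);
    }

    public static void main(String[] args) {
        // Writes one INFO-level line to the default console handler.
        log.info(format("load_orders", 42));
    }
}
```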
4. File processing function
5. Send Email
Separate multiple recipients with spaces (variables can be used)