Kettle Notes (1)

1 Kettle Overview

Kettle is "KETTLE ETTL ENVIRONMENT" only acronym, which means it is designed to help you achieve ETTL needs: extract, transform, load and load data . Chinese name called kettle, as the project's main programmer Matt in a forum to say: hope to all kinds of data into a pot and then out in a specified format .

Kettle is an open-source ETL tool developed abroad and written in pure Java. Using a metadata-driven approach, it provides powerful extract, transform, and load (ETL) capabilities; it is a portable ("green") tool that needs no installation; it runs on multiple platforms such as Windows and Linux; and its data extraction is efficient and stable.

Official website address

1.1 Origins

Kettle is an ETL tool written in Java. Its main author is Matt Casters, who started the project in 2003; the latest stable version is 8.2.

In December 2005, starting with version 2.1, Kettle entered the open-source world. It was licensed under the LGPL up to version 4.1, and from version 4.2 onward it has been licensed under the Apache License 2.0. In early 2006 Kettle joined the open-source BI company Pentaho and was officially named Pentaho Data Integration, "PDI" for short. On September 20, 2017, Pentaho was merged into a new company within the Hitachi Group: Hitachi Vantara.

In short, Kettle simplifies the creation, updating, and maintenance of data warehouses; with Kettle you can build a complete open-source ETL solution.

1.2 Architecture

Kettle is a componentized integration system made up of the following main parts.

1.2.1 Spoon

Spoon is the graphical design tool (GUI mode). Jobs and Transformations are designed through Spoon's graphical interface, and they can also be run directly from it.

1.2.2 Pan

Pan is the Transformation runner (command-line mode). It executes Transformations from the terminal and has no graphical interface.

1.2.3 Kitchen

Kitchen is the Job runner (command-line mode). It executes Jobs from the terminal and has no graphical interface.
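
Both Pan and Kitchen are thin command-line wrappers around the same Kettle engine. The following is only a rough sketch of doing the same work programmatically through Kettle's Java API (the file paths are made up, and the PDI libraries such as kettle-core and kettle-engine are assumed to be on the classpath):

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunKettleExamples {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (plugin registry, variables, ...)
        KettleEnvironment.init();

        // Run a Transformation (.ktr), roughly what Pan does from the terminal
        TransMeta transMeta = new TransMeta("/path/to/sample.ktr");   // hypothetical path
        Trans trans = new Trans(transMeta);
        trans.execute(null);                  // no extra command-line arguments
        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            System.err.println("Transformation finished with errors");
        }

        // Run a Job (.kjb), roughly what Kitchen does from the terminal
        JobMeta jobMeta = new JobMeta("/path/to/sample.kjb", null);   // no repository
        Job job = new Job(null, jobMeta);
        job.start();
        job.waitUntilFinished();
        if (job.getErrors() > 0) {
            System.err.println("Job finished with errors");
        }
    }
}

On the command line, the equivalent invocations are roughly pan.sh -file=/path/to/sample.ktr and kitchen.sh -file=/path/to/sample.kjb (Pan.bat / Kitchen.bat on Windows).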

1.2.4 Carte

Carte is an embedded web service used to execute Jobs or Transformations remotely; Kettle clusters are built with Carte.

1.2.5 Encrypt

Kettle's command-line tool for encrypting strings, for example for encrypting the database connection passwords defined in a Job or Transformation.
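
The tool is typically invoked as something like encr.sh -kettle <password> (Encr.bat on Windows). If the same obfuscation is needed from Java code, a minimal sketch could look like the following (method names come from org.pentaho.di.core.encryption.Encr; the password value is made up):

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.encryption.Encr;

public class EncryptPasswordDemo {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // Obfuscate a database password; the result carries the "Encrypted " prefix
        // and can be pasted into the password field of a .ktr/.kjb database connection.
        String encrypted = Encr.encryptPasswordIfNotUsingVariables("my_db_password");
        System.out.println(encrypted);

        // Kettle reverses the operation transparently when it reads the value back.
        System.out.println(Encr.decryptPasswordOptionallyEncrypted(encrypted));
    }
}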

1.3 Kettle usage scenarios

Kettle has many usage scenarios, including but not limited to the following:

  • Data migration between applications or databases;
  • Export data from a database to a file;
  • Importing large-scale data into a database;
  • Data cleaning;
  • Application integration.

1.4 Basic Concepts

1.4.1 Job

A Job organizes the Transformations that have been designed, pulling them together to complete a particular piece of work.

Usually we split a large task into several logically isolated Jobs; when all of these Jobs have completed, the task is done.

1.4.2 Transformation

A Transformation is a container that defines operations on data; a data operation is simply the process of moving data from input to output.

It can be understood as assembling one or more different data sources into a single data pipeline whose result is finally output somewhere (a file, a database, and so on).

A Transformation is a container one level finer-grained than a Job: we decompose a task into Jobs, and each Job is then broken down into one or more Transformations, with each Transformation completing only part of the work.

1.4.3 Step

A Step is the smallest unit inside a Transformation; each Step performs a specific function.

1.4.4 Job Entry

A Job Entry is the execution unit inside a Job; each Job Entry implements a specific function, such as checking whether a table exists or sending an email.

A Job can execute another Job or a Transformation; in other words, both Transformations and Jobs can serve as Job Entries.

1.4.5 Data Stream

A data stream represents how data flows between Steps and is divided into input and output streams:

  • Input stream: the stack of rows entering a Step;
  • Output stream: the stack of rows leaving a Step.

1.4.6 Hop

A Hop connects Steps within a Transformation, or Job Entries within a Job. It is the graphical representation of a data stream and in effect reflects the execution order.

  1. Transformation Hop: mainly represents the flow of data, i.e. from input, through transformation operations such as filtering, and finally to output.
    • A Hop represents one or more data flows between two Steps;
    • A Hop always connects the output stream of one Step to the input stream of another Step.
  2. Job Hop: an execution condition can be set on it:
    • Execute unconditionally;
    • Execute when the result of the previous Job Entry is True;
    • Execute when the result of the previous Job Entry is False.

A Hop always represents the connection between two Job Entries, and the originating Job Entry determines how that connection behaves: the next Job Entry can be executed unconditionally, only on success, or only on failure.

1.4.7 Some Thoughts on the Kettle Architecture

I do not know what impression the description of Kettle's architecture above has left you with, or whether you have picked up any ideas from it. If you have, it shows that your ability to understand is very good and you deserve a compliment: knowing how to think is a fine habit. If not, don't worry; you have probably been using these ideas all along without consciously noticing them.

In fact, it is the idea of simplifying complex problems: divide and conquer. Break a big task down into separate small tasks, break the small tasks down into individual small problems to solve, and then assemble the solutions back together to finally accomplish the whole task.

You might disagree and say this is stating the obvious, since that is how we usually solve problems anyway.

I would say: not necessarily. I have met many people who appear to follow the steps above when solving a problem, but whose thinking is purely linear. It is not that they cannot make judgments when they run into an obstacle; rather, they lack a little ability to abstract. They never pull out what different things have in common, or separate the execution order from the things themselves, so that the things stay independent of each other and are only wired together when needed. Instead, everything is bound to a fixed execution order, things become tightly coupled, and nothing can be reused.

That was a long-winded and slightly convoluted way to put it; I almost confused myself. What does it all boil down to? Encapsulation and reuse.

So take a careful look at Kettle's architectural design: the idea is the same, and much of it is worth learning and applying. Kettle's overall design fully adopts a "componentized" product mindset, which also matches the requirement programmers set for themselves: "don't reinvent the wheel."

1.4.8 Kettle structure diagram

1.4.9 Diagram from the official website

1.4.10 Diagram of a practical application

1.5 File Format

Transformations and Jobs designed by the user are saved, in a specific XML format, to files or to a database.

A Transformation design file ends in .ktr and stores all configured database connections, relative file path information, field mapping relationships, and so on.

Here is the beginning of the file for an already-designed Transformation:

<?xml version="1.0" encoding="UTF-8"?>
<transformation>
  <info>
    <name>CCTA&#x968f;&#x8bbf;&#x6570;&#x636e;&#x5bfc;&#x51fa;</name>
    <description/>
    <extended_description/>
    <trans_version/>
    <trans_type>Normal</trans_type>
    <trans_status>0</trans_status>
    <directory>&#x2f;</directory>
    <parameters>
        <parameter>
            <name>password</name>
            <default_value>Di13M&#x40;43</default_value>
            <description>&#x6570;&#x636e;&#x5e93;&#x5bc6;&#x7801;</description>
        </parameter>
</parameters>

A Job design file ends in .kjb.
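
The <parameters> block in the excerpt above declares a named parameter called password. As a hedged sketch (hypothetical file path and value, PDI libraries assumed on the classpath), such a parameter can be overridden when the Transformation is run through the Java API:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunWithParameter {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // Load the designed .ktr file (path is hypothetical)
        TransMeta meta = new TransMeta("/path/to/export.ktr");
        Trans trans = new Trans(meta);

        // Override the "password" parameter declared in the <parameters> block,
        // instead of relying on its default_value from the file.
        trans.setParameterValue("password", "real_password_here");
        trans.activateParameters();

        trans.execute(null);
        trans.waitUntilFinished();
    }
}

Pan can pass the same parameter from the command line with an option of the form -param:password=value.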

Reproduced from: https://juejin.im/post/5cff457b6fb9a07ec9560421
