ETL tool - Calling Kettle transformation and job scripts from Java

1. Calling a Kettle transformation from Java

Before writing the Java program, first design the transformation in Spoon. The example here pulls a CSDN article list into a txt text file:

The interface being pulled is https://blog.csdn.net/community/home-api/v1/get-business-list?page=1&size=20&businessType=blog&orderby=&noMore=false&year=&month=&username=qq_43692950
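
Since only page, size and username vary between requests, the URL can be assembled in code before it is handed to Kettle. The following is a minimal sketch; the class CsdnListUrl and its build method are hypothetical names introduced for illustration, and the fixed query parameters are copied from the sample URL above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Kettle or of the original project.
final class CsdnListUrl {

    private static final String BASE =
            "https://blog.csdn.net/community/home-api/v1/get-business-list";

    /** Assembles the list URL for one page, reusing the query parameters of the sample URL. */
    static String build(String username, int page, int size) {
        return BASE
                + "?page=" + page
                + "&size=" + size
                + "&businessType=blog&orderby=&noMore=false&year=&month="
                + "&username=" + URLEncoder.encode(username, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the URL for the first page of 20 articles
        System.out.println(build("qq_43692950", 1, 20));
    }
}
```

The resulting string can then be used as the value of the url variable.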

The return format is as follows:

{
    "code": 200,
    "message": "success",
    "traceId": "b1e8ccb0-2e39-4834-bacd-b52a260bb521",
    "data": {
        "list": [
            {
                "articleId": 130450076,
                "title": "ETL工具 - Kettle 查询、连接、统计、脚本算子介绍",
                "description": "连接算子一般将多个数据集通过关键字进行连接,类似 `SQL` 中的连接操作,统计算子可以提供数据的采样和统计功能,脚本算子可以通过程序代码完成一些复杂的操作",
                "url": "https://xiaobichao.blog.csdn.net/article/details/130450076",
                "type": 1,
                "top": false,
                "forcePlan": false,
                "viewCount": 313,
                "commentCount": 0,
                "editUrl": "https://editor.csdn.net/md?articleId=130450076",
                "postTime": "2023-04-30 23:12:13",
                "diggCount": 1,
                "formatTime": "前天 23:12",
                "picList": [
                    "https://img-blog.csdnimg.cn/2e817e14046f4cba9663c89978198f12.png"
                ]
            }
        ],
        "total": 287
    }
}
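
If total is the overall number of articles (an assumption; the response itself does not say so), it can be combined with the size request parameter to work out how many pages exist. A minimal sketch, with PageMath as a hypothetical helper name:

```java
// Hypothetical helper for illustration only.
final class PageMath {

    /** Pages needed to cover `total` items at `size` items per page (ceiling division). */
    static int pageCount(int total, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return (total + size - 1) / size;
    }

    public static void main(String[] args) {
        // Values taken from the sample request (size=20) and response (total=287)
        System.out.println(pageCount(287, 20)); // prints 15
    }
}
```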

1.1 Designing the transformation

Here the url is passed in as a variable. The overall transformation design is as follows:
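
Kettle refers to variables with the ${name} syntax, so the variable here would be referenced as ${url}. The sketch below imitates that substitution in plain Java to show the idea. It is not Kettle's own code: VariableSketch and resolve are hypothetical names, and leaving unknown names untouched is a choice made for this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a plain-Java imitation of ${name} substitution, not Kettle's implementation.
final class VariableSketch {

    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} in the text with its value from the map. */
    static String resolve(String text, Map<String, String> variables) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.get(m.group(1));
            // Names without a value are left untouched in this sketch
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = Map.of("url", "https://example.org/list?page=1");
        System.out.println(resolve("${url}", vars));
    }
}
```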

[Image: overall transformation design]

Get variables:

[Image: Get variables step]

REST client:

[Image: REST client step]

JSON input:

[Images: JSON input step]

Field selection:

[Image: Field selection step]

Text file output:

[Image: Text file output step]

After designing, save the transformation as a ktr script:

[Image: saving the ktr script]

1.2 Calling the transformation script from Java

Create a new Maven project and add the following dependencies to its pom:

<dependencies>
    <dependency>
        <groupId>pentaho-kettle</groupId>
        <artifactId>kettle-core</artifactId>
        <version>9.6.0.0-SNAPSHOT</version>
    </dependency>

    <dependency>
        <groupId>pentaho-kettle</groupId>
        <artifactId>kettle-engine</artifactId>
        <version>9.6.0.0-SNAPSHOT</version>
    </dependency>

    <dependency>
        <groupId>org.pentaho.di.plugins</groupId>
        <artifactId>pdi-core-plugins-impl</artifactId>
        <version>9.6.0.0-SNAPSHOT</version>
    </dependency>

    <dependency>
        <groupId>pentaho</groupId>
        <artifactId>pentaho-capability-manager</artifactId>
        <version>9.6.0.0-SNAPSHOT</version>
        <scope>compile</scope>
    </dependency>

    <dependency>
        <groupId>commons-cli</groupId>
        <artifactId>commons-cli</artifactId>
        <version>1.3.1</version>
    </dependency>

    <dependency>
        <groupId>com.sun.jersey.contribs</groupId>
        <artifactId>jersey-apache-client4</artifactId>
        <version>1.9.1</version>
    </dependency>

    <dependency>
        <groupId>com.sun.jersey</groupId>
        <artifactId>jersey-core</artifactId>
        <version>1.19.1</version>
    </dependency>

    <dependency>
        <groupId>com.sun.jersey</groupId>
        <artifactId>jersey-client</artifactId>
        <version>1.19.1</version>
    </dependency>

    <dependency>
        <groupId>com.sun.jersey</groupId>
        <artifactId>jersey-bundle</artifactId>
        <version>1.19.1</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>pentaho-public</id>
        <name>Pentaho Public</name>
        <url>https://repo.orl.eng.hitachivantara.com/artifactory/pnt-mvn/</url>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
        </releases>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>interval:15</updatePolicy>
        </snapshots>
    </repository>
</repositories>

Under resources, create a new file named kettle-password-encoder-plugins.xml with the following content:

<password-encoder-plugins>
    <password-encoder-plugin id="Kettle">     
        <description>Kettle Password Encoder</description>    
        <classname>org.pentaho.di.core.encryption.KettleTwoWayPasswordEncoder</classname>
    </password-encoder-plugin>
</password-encoder-plugins>

Java calling logic:

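import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.exception.KettleException;
import org.pentaho.di.core.logging.KettleLogStore;
import org.pentaho.di.core.logging.KettleLoggingEvent;
import org.pentaho.di.core.logging.KettleLoggingEventListener;
import org.pentaho.di.core.logging.LogLevel;
import org.pentaho.di.core.parameters.UnknownParamException;
import org.pentaho.di.core.plugins.PluginFolder;
import org.pentaho.di.core.plugins.StepPluginType;
import org.pentaho.di.repository.Repository;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTrans {

    public static void main(String[] args) {
        try {
            // Register the plugin location; change this to your own installation directory
            StepPluginType.getInstance().getPluginFolders().
                    add(new PluginFolder("D:/data-integration_9_3/plugins/", false, true));
            // Initialize the Kettle environment
            KettleEnvironment.init();
        } catch (KettleException e) {
            e.printStackTrace();
            // Nothing can run without an initialized environment
            return;
        }
        String ktrPath = "D:/data/job/trans.ktr";
        String url = "https://blog.csdn.net/community/home-api/v1/get-business-list?page=1&size=20&businessType=blog&orderby=&noMore=false&year=&month=&username=qq_43692950";
        // Variables passed to the transformation
        Map<String, String> variableMap = new HashMap<>();
        variableMap.put("url", url);
        Boolean res = runTrans(ktrPath, variableMap, null);
        System.out.println("Transformation execution result: " + res);
    }

    private static Boolean runTrans(String ktrPath, Map<String, String> variableMap, Map<String, String> parameterMap) {
        try {
            // Load the ktr file
            TransMeta transMeta = new TransMeta(ktrPath, (Repository) null);
            Trans trans = new Trans(transMeta);
            trans.setLogLevel(LogLevel.MINIMAL);
            // Variables
            if (Objects.nonNull(variableMap) && !variableMap.isEmpty()) {
                variableMap.forEach(trans::setVariable);
            }
            // Named parameters
            if (Objects.nonNull(parameterMap) && !parameterMap.isEmpty()) {
                parameterMap.forEach((k, v) -> {
                    try {
                        trans.setParameterValue(k, v);
                    } catch (UnknownParamException e) {
                        e.printStackTrace();
                    }
                });
            }
            // Listen to the execution log
            KettleLogStore.getAppender().addLoggingEventListener(new KettleLoggingEventListener() {
                @Override
                public void eventAdded(KettleLoggingEvent logs) {
                    System.out.println("Kettle log: level = " + logs.getLevel() + " , time = " + logs.getTimeStamp() + " , message = " + logs.getMessage());
                }
            });
            // Execute the transformation
            trans.execute(new String[0]);
            // Wait until it has finished
            trans.waitUntilFinished();
            // The run succeeded if no errors were recorded
            return trans.getErrors() == 0;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }

}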

Go to the output directory to view the results:

[Images: results in the output directory]
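
The result can also be checked from code instead of opening the file by hand. The sketch below prints the first few lines of the output file. The path D:/data/job/output.txt and the UTF-8 encoding are assumptions, since the actual file name and encoding are set in the Text file output step; OutputPreview and head are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class OutputPreview {

    /** Returns at most maxLines lines from the start of the file, read as UTF-8. */
    static List<String> head(Path file, int maxLines) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.limit(maxLines).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed path; pass the real output file as the first argument
        Path out = Path.of(args.length > 0 ? args[0] : "D:/data/job/output.txt");
        head(out, 5).forEach(System.out::println);
    }
}
```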

2. Calling a Kettle job from Java

For testing, the job simply calls the transformation above:

[Image: job design]

Save the job as a kjb file:

[Image: saving the kjb file]

Java calling logic:

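import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.exception.KettleException;
import org.pentaho.di.core.logging.KettleLogStore;
import org.pentaho.di.core.logging.KettleLoggingEvent;
import org.pentaho.di.core.logging.KettleLoggingEventListener;
import org.pentaho.di.core.logging.LogLevel;
import org.pentaho.di.core.parameters.UnknownParamException;
import org.pentaho.di.core.plugins.PluginFolder;
import org.pentaho.di.core.plugins.StepPluginType;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class RunJob {

    public static void main(String[] args) {
        try {
            // Register the plugin location; change this to your own installation directory
            StepPluginType.getInstance().getPluginFolders().
                    add(new PluginFolder("D:/data-integration_9_3/plugins/", false, true));
            // Initialize the Kettle environment
            KettleEnvironment.init();
        } catch (KettleException e) {
            e.printStackTrace();
            // Nothing can run without an initialized environment
            return;
        }
        String kjbPath = "D:/data/job/job.kjb";
        String url = "https://blog.csdn.net/community/home-api/v1/get-business-list?page=2&size=20&businessType=blog&orderby=&noMore=false&year=&month=&username=qq_43692950";
        // Variables passed to the job
        Map<String, String> variableMap = new HashMap<>();
        variableMap.put("url", url);
        Boolean res = runJob(kjbPath, variableMap, null);
        System.out.println("Job execution result: " + res);
    }

    private static Boolean runJob(String kjbPath, Map<String, String> variableMap, Map<String, String> parameterMap) {
        try {
            // Load the kjb file
            JobMeta jobMeta = new JobMeta(kjbPath, null);
            Job job = new Job(null, jobMeta);
            job.setLogLevel(LogLevel.MINIMAL);
            // Variables
            if (Objects.nonNull(variableMap) && !variableMap.isEmpty()) {
                variableMap.forEach(job::setVariable);
            }
            // Named parameters
            if (Objects.nonNull(parameterMap) && !parameterMap.isEmpty()) {
                parameterMap.forEach((k, v) -> {
                    try {
                        job.setParameterValue(k, v);
                    } catch (UnknownParamException e) {
                        e.printStackTrace();
                    }
                });
            }
            // Listen to the execution log
            KettleLogStore.getAppender().addLoggingEventListener(new KettleLoggingEventListener() {
                @Override
                public void eventAdded(KettleLoggingEvent logs) {
                    System.out.println("Kettle log: level = " + logs.getLevel() + " , time = " + logs.getTimeStamp() + " , message = " + logs.getMessage());
                }
            });
            // Start the job
            job.start();
            // Wait until it has finished
            job.waitUntilFinished();
            // The run succeeded if no errors were recorded
            return job.getErrors() == 0;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }

}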

Go to the output directory to view the results:

[Image: results in the output directory]

Origin blog.csdn.net/qq_43692950/article/details/130471123