Pitfalls encountered when using Kettle for ETL

1. Garbled Chinese characters in ETL

Solution: open the database connection and, in its Options parameters, add a row with the parameter characterEncoding set to utf-8 (the third row in the author's configuration).

Note: Any Chinese text inside the ETL itself, including stream headers (that is, Chinese column names), becomes garbled when it is saved to the repository. No good solution has been found so far; for now the only workaround is to avoid Chinese inside the ETL program.

2. Precautions for the Insert / Update step

The Insert / Update step updates a row when the lookup conditions in the upper table of the step are met, and inserts a new row otherwise. It is therefore wrong to include a condition such as STATISTICS_ITEM_VALUE<>STATISTICS_ITEM_VALUE among those conditions. Remove it; otherwise a row whose value has not changed fails the lookup and is inserted again.
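
The effect of the extra condition can be illustrated with a small model of the lookup. This is only a sketch of the update-or-insert logic described above, not Kettle's implementation; the function names and the ITEM_ID key field are hypothetical, and only STATISTICS_ITEM_VALUE comes from the text.

```javascript
// Simplified model: update the first row that satisfies the lookup
// condition, otherwise insert a new row. ITEM_ID is a hypothetical key.
function insertUpdate(table, row, matches) {
  for (var i = 0; i < table.length; i++) {
    if (matches(table[i], row)) {
      table[i].STATISTICS_ITEM_VALUE = row.STATISTICS_ITEM_VALUE; // update
      return "update";
    }
  }
  table.push({
    ITEM_ID: row.ITEM_ID,
    STATISTICS_ITEM_VALUE: row.STATISTICS_ITEM_VALUE
  }); // insert
  return "insert";
}

// Lookup on the key field only.
function byKey(existing, incoming) {
  return existing.ITEM_ID === incoming.ITEM_ID;
}

// Faulty lookup: the key plus a "value differs" condition.
function byKeyAndChangedValue(existing, incoming) {
  return existing.ITEM_ID === incoming.ITEM_ID &&
    existing.STATISTICS_ITEM_VALUE !== incoming.STATISTICS_ITEM_VALUE;
}

var good = [{ ITEM_ID: 1, STATISTICS_ITEM_VALUE: 10 }];
var bad = [{ ITEM_ID: 1, STATISTICS_ITEM_VALUE: 10 }];
var unchanged = { ITEM_ID: 1, STATISTICS_ITEM_VALUE: 10 };

// The unchanged row matches on the key, so the table keeps one row.
var goodAction = insertUpdate(good, unchanged, byKey);
// The unchanged row fails the "value differs" test, so a duplicate is inserted.
var badAction = insertUpdate(bad, unchanged, byKeyAndChangedValue);
```

With the key-only lookup the unchanged row is matched and the table keeps a single row; with the extra condition the same row is treated as missing and a duplicate is added.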

3. TINYINT columns such as IS_ENABLE can be tested correctly with IS_ENABLE=1 in the SQL statement, but in the query result 1 becomes Y and 0 becomes N. There is no good solution at present; the value can only be handled after the query, when it is used for judgments downstream.
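
One way to handle the value after the query is to normalize it back to 1/0 before any comparison, for example in a JavaScript step. The sketch below assumes the flag arrives as "Y"/"N" (or as a boolean or number); the function name enableFlagToInt is hypothetical.

```javascript
// Convert the Y/N (or boolean / numeric) form of a flag back to 1 or 0.
// Returns null for anything unrecognized so bad data is not silently
// treated as "disabled".
function enableFlagToInt(value) {
  if (value === true || value === "Y" || value === 1 || value === "1") {
    return 1;
  }
  if (value === false || value === "N" || value === 0 || value === "0") {
    return 0;
  }
  return null;
}

var enabled = enableFlagToInt("Y");
var disabled = enableFlagToInt("N");
```

Downstream judgments can then compare against 1 and 0, matching the condition used in the SQL statement.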

4. When using the sort step, watch out for deadlocks. Kettle passes rows between steps in limited-size batches (the author cites a default of 1000 records at a time), while the sort step produces no output until all of its input has arrived. If another step runs in parallel with the sort step and its output is later merged with the sorted data, the parallel step finishes its rows first and then waits at the merge for the sort step, which can cause a deadlock. For example, with 5000 records in total, the first 1000 records reach both the sort step and the parallel step. The parallel step has to wait for the sorted data before the merge, so its 1000 records stay buffered. The next 1000 records can still reach the sort step but can no longer reach the parallel step, so the sort step never receives all of its input and the transformation deadlocks. Reference:  http://wiki.pentaho.com/display/EAI/Transformation+Deadlocks

5. If a job contains a loop, the Java step (User Defined Java Class) cannot be used in a transformation inside the loop. The first iteration of the loop runs, but on the second iteration the Java step fails to initialize with the following error:

- Java 代码.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Error initializing UserDefinedJavaClass:
2017/04/21 16:36:14 - Java 代码.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2017/04/21 16:36:14 - Java 代码.0 - null
2017/04/21 16:36:14 - Java 代码.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : 错误初始化步骤[Java 代码]
2017/04/21 16:36:14 - ktr_set_search_candition - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : 步骤 [Java 代码.0] 初始化失败!
2017/04/21 16:36:14 - ktr_set_search_candition - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Unable to prepare for execution of the transformation
2017/04/21 16:36:14 - ktr_set_search_candition - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2017/04/21 16:36:14 - ktr_set_search_candition - 无法初始化至少一个步骤. 执行无法开始!

6. In the JavaScript step, the JSON classes cannot be used to convert a string to JSON, as in the following:

JSONArray obj = new JSONArray().fromObject(string);

This works when the ETL is edited and run from within Kettle, but not when it is called from a program: the error reported is that JSON is not recognized. The workaround found for this case is eval("("+ETL_VARIABLES+")"), which converts the string to a JSON object. With this method, the Java side must guarantee that the input string is valid JSON; otherwise the converted result will be wrong. Reference: http://www.cnblogs.com/Liujunyan/p/4965924.html
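
The eval workaround can be sketched as follows. This is a minimal illustration rather than the author's exact script: parseWithEval is a hypothetical helper, and the sample content assigned to ETL_VARIABLES is invented for the example.

```javascript
// Parse JSON text with eval. Wrapping the text in parentheses makes the
// engine read "{...}" as an object literal instead of a block statement.
// eval executes whatever it is given, so only pass text known to be JSON.
function parseWithEval(jsonText) {
  return eval("(" + jsonText + ")");
}

// Stand-in for the string passed in by the calling Java program
// (hypothetical sample content).
var ETL_VARIABLES = '[{"name": "start_date", "value": "2017-04-21"}]';
var parsed = parseWithEval(ETL_VARIABLES);
var firstName = parsed[0].name;

// Malformed input raises an error instead of producing a usable value,
// which is why the caller must guarantee valid JSON.
var failed = false;
try {
  parseWithEval('[{"name": "start_date", ]');
} catch (e) {
  failed = true;
}
```

Because eval runs arbitrary code, this approach is only reasonable when the string is produced by the calling program itself and never by untrusted input.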

7. Parameters passed to an ETL can come from three places: kettle.properties, environment variables, and the input supplied when the ETL is called from a program. Their priority is as follows: kettle.properties > environment variables > program input parameters. Therefore, when the ETL is called from a program, make sure the other two sources do not interfere with the input parameters.
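
The priority order above can be modeled with a small resolver, which also makes it easy to spot input parameters that would be overridden. This is a sketch of the ordering the author observed, not Kettle's API; the function names, parameter names, and values are all hypothetical.

```javascript
// Check for an own property without relying on the object's prototype.
function has(source, name) {
  return Object.prototype.hasOwnProperty.call(source, name);
}

// Resolve a parameter using the priority described in the text:
// kettle.properties first, then environment variables, then program input.
function resolveParameter(name, kettleProperties, environment, programInput) {
  var sources = [kettleProperties, environment, programInput];
  for (var i = 0; i < sources.length; i++) {
    if (has(sources[i], name)) {
      return sources[i][name];
    }
  }
  return undefined;
}

// List the program input parameters that a higher-priority source overrides.
function shadowedParameters(kettleProperties, environment, programInput) {
  var shadowed = [];
  for (var name in programInput) {
    if (!has(programInput, name)) continue;
    if (has(kettleProperties, name) || has(environment, name)) {
      shadowed.push(name);
    }
  }
  return shadowed;
}

// Hypothetical example values.
var kettleProps = { BATCH_DATE: "2017-01-01" };
var env = { TARGET_DB: "staging" };
var input = { BATCH_DATE: "2017-04-21", TARGET_DB: "prod", LIMIT: "500" };

var batchDate = resolveParameter("BATCH_DATE", kettleProps, env, input);
var overridden = shadowedParameters(kettleProps, env, input);
```

Here the program's BATCH_DATE and TARGET_DB are both overridden, which is the kind of interference to rule out before calling the ETL from code.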

8. Setting global variables: when a transformation needs to produce global variables, the variables generated by the earlier steps must be promoted to global scope in the "Set Environment Variables" step. This matters especially inside a loop, where the option that controls whether an existing value is replaced must be set; otherwise the loop variable will not increase.

9. Cache problem

After an ETL is executed through Kettle, a cache remains. Clear the cache before calling the ETL from a program. Parameters set in kettle.properties also affect execution when the ETL is called from a program.

10. In a transformation with multiple paths, an error in one path affects the execution of the other branches.

 
