Notes on Sqoop 1.99.7 create job

Most of the content of this article is adapted from an earlier "Predecessor's Blog". Because Sqoop's interactive shell is quite unfriendly, I have also written up the problems I ran into while creating jobs in practice, so that other readers do not fall into the same pits. The parts I modified or supplemented were marked in red in the original post (they are also the pitfalls that most need emphasis).

The following are the properties you are prompted for when creating the job, starting with the source (FROM) side:

Name: an identifier for the job; you can choose it yourself.

Schema Name: the name of the database or schema. In MySQL a schema is essentially the same thing as a database (I have not looked into the exact difference, but the official description treats them similarly). Here specify db_ez, the database used in this example.

Table Name: the table to export; in this example it is tb_forhadoop, and you specify your own table here. For working with multiple tables, please check the official documentation.

SQL Statement: a free-form query; this cannot be filled in if the schema name and table name have already been filled in (the two ways of specifying the source are mutually exclusive). The statement must contain the placeholder ${CONDITIONS}, usually written as where 1=1 and ${CONDITIONS}, for example: select * from tb_forhadoop where 1=1 and ${CONDITIONS}.

Partition column: must be filled in when the SQL statement is used, so that the data can be partitioned; it is generally a numeric column that uniquely identifies each record. (A Java Client API sketch that sets these FROM-side values programmatically follows this list of properties.)

Partition column nullable: whether the partition column is allowed to contain NULL values; it can be left empty.

Boundary query: an optional query used to compute the minimum and maximum of the partition column; it can be left empty.

Last value:
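
Since the interactive prompts are easy to get wrong, it may help to see how the same FROM-side values map to Sqoop 2's Java Client API. The following is only a minimal sketch: it assumes a Sqoop server at http://localhost:12000/sqoop/ and two previously created links with the made-up names mysql-link and hdfs-link, and the config keys (fromJobConfig.schemaName and so on) are the ones used by the generic JDBC connector in 1.99.x; verify them with show job in the shell if your version differs.

    import org.apache.sqoop.client.SqoopClient;
    import org.apache.sqoop.model.MFromConfig;
    import org.apache.sqoop.model.MJob;

    public class CreateJobFromSideSketch {
      public static void main(String[] args) {
        // Sqoop 2 server REST endpoint; adjust host and port to your installation.
        SqoopClient client = new SqoopClient("http://localhost:12000/sqoop/");

        // In 1.99.7, links and jobs are referred to by name. "mysql-link" and
        // "hdfs-link" are hypothetical links created beforehand with `create link`.
        MJob job = client.createJob("mysql-link", "hdfs-link");
        job.setName("job_ez_to_hdfs");          // the "Name" prompt

        // FROM side: schema + table name, or alternatively a free-form SQL
        // statement containing ${CONDITIONS} together with a partition column.
        MFromConfig from = job.getFromJobConfig();
        from.getStringInput("fromJobConfig.schemaName").setValue("db_ez");
        from.getStringInput("fromJobConfig.tableName").setValue("tb_forhadoop");
        from.getStringInput("fromJobConfig.partitionColumn").setValue("id"); // hypothetical unique numeric column

        // The TO side and driver values are filled in by a second sketch further below.
      }
    }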

Next, you configure the values for the data destination (the TO side) and the driver; these are shown programmatically in a second Client API sketch after the source snippet further below.

Null value: roughly, the value to write in place of a NULL value.

File format: specifies the format of the data files written to HDFS. TEXT_FILE, the simplest plain-text format, is used here.

Compression codec: specifies which compression algorithm is used to compress the exported data files. I specify NONE here. You can also choose CUSTOM and provide your own compression algorithm by implementing the corresponding interface in Java (see the sketch below).

Custom codec: the custom compression codec itself. Since NONE was selected in this example, just press Enter.
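
For reference, the interface in question is Hadoop's org.apache.hadoop.io.compress.CompressionCodec. The sketch below is only an assumption-laden illustration: it sidesteps writing a real algorithm by extending Hadoop's built-in DefaultCodec and changing nothing but the file extension, and it uses the made-up class name com.example.MyCodec, whose fully qualified name is presumably what you would enter at the Custom codec prompt after putting the jar on the Sqoop server's classpath.

    package com.example;   // hypothetical package and class name

    import org.apache.hadoop.io.compress.DefaultCodec;

    // Minimal "custom" codec: reuses DefaultCodec's DEFLATE compression and
    // only overrides the extension given to the files it produces. A genuinely
    // custom algorithm would implement the full CompressionCodec interface
    // (compressor, decompressor, and stream factories) instead.
    public class MyCodec extends DefaultCodec {
      @Override
      public String getDefaultExtension() {
        return ".mycodec";
      }
    }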

Output directory: the path in the HDFS file system where the data will be stored. It seems the job can succeed if the path either does not exist yet or exists but is empty.

Append mode: specifies whether to append new data to the existing data files if output files already exist.

Extractors: probably the number of times the extract (ETL) step runs; in my test, filling in 2 made the data in the HDFS output appear twice, and so on.

Loaders: determines the number of reduce tasks run at the end; when it is not set, the job runs with zero reducers, i.e. map-only (see the MapreduceSubmissionEngine.submit source below):

      // From MapreduceSubmissionEngine.submit(): the Loaders value becomes the
      // number of reduce tasks; when it is not set, the job runs map-only.
      if (request.getLoaders() != null) {
        job.setNumReduceTasks(request.getLoaders());
      } else {
        job.setNumReduceTasks(0);
      }

      job.setOutputFormatClass(request.getOutputFormatClass());
      job.setOutputKeyClass(request.getOutputKeyClass());
      job.setOutputValueClass(request.getOutputValueClass());
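
To round off the Client API sketch started after the FROM-side properties, here is how the TO-side and driver (throttling) values discussed above could be set and the job saved. The input keys toJobConfig.outputDirectory, throttlingConfig.numExtractors and throttlingConfig.numLoaders are the ones used by the HDFS connector and the driver in 1.99.x, but this remains only a sketch; check the exact keys and input types with show job if your version differs.

    import org.apache.sqoop.client.SqoopClient;
    import org.apache.sqoop.model.MDriverConfig;
    import org.apache.sqoop.model.MJob;
    import org.apache.sqoop.model.MToConfig;
    import org.apache.sqoop.validation.Status;

    public class CreateJobToSideSketch {
      // `client` and `job` are the objects built in the FROM-side sketch above.
      static void finishAndSave(SqoopClient client, MJob job) {
        // TO side: where the exported files land in HDFS.
        MToConfig to = job.getToJobConfig();
        to.getStringInput("toJobConfig.outputDirectory").setValue("/tmp/sqoop/tb_forhadoop");

        // Driver ("throttling") config: Extractors ~ parallel extract tasks,
        // Loaders ~ reduce tasks (compare setNumReduceTasks in the snippet above).
        MDriverConfig driver = job.getDriverConfig();
        driver.getIntegerInput("throttlingConfig.numExtractors").setValue(1);
        driver.getIntegerInput("throttlingConfig.numLoaders").setValue(1);

        // Persist the job; on success it appears in `show job` in the shell.
        Status status = client.saveJob(job);
        System.out.println(status.canProceed()
            ? "Job created: " + job.getName()
            : "Job creation failed: " + status);
      }
    }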

Finally, the element# prompt appears once more, asking for extra mapper jars; you can leave it empty and just press Enter.

At this point, if a success message appears, the job has been created successfully.

 
