Oracle Data Processing: SQL*Loader (1)

SQL*Loader is a tool provided by Oracle that loads data into the database from flat files. It is especially suited to business-analysis databases (data warehouses): with the sqlldr utility, a large amount of data can be loaded into the database in a short time, for example importing a prepared Excel export into a table, which is very convenient. Related data loading and unloading tools include external tables, IMP/EXP, and Data Pump. In practice, most of the time spent learning SQL*Loader goes into working out how to write the sqlldr control file, so let's summarize the SQL*Loader learning process along with some experimental cases.

1. Command help information for sqlldr

[oracle@wjq ~]$ sqlldr

SQL*Loader: Release 11.2.0.4.0 - Production on Tue Oct 31 11:46:27 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Usage: SQLLDR keyword=value [,keyword=value,...]

 

Valid Keywords:

 

    userid -- ORACLE username/password
   control -- control file name
       log -- log file name
       bad -- bad file name
      data -- data file name
   discard -- discard file name
discardmax -- number of discards to allow          (Default all)
      skip -- number of logical records to skip    (Default 0)
      load -- number of logical records to load    (Default all)
    errors -- number of errors to allow            (Default 50)
      rows -- number of rows in conventional path bind array or between direct path data saves
               (Default: Conventional path 64, Direct path all)
  bindsize -- size of conventional path bind array in bytes  (Default 256000)
    silent -- suppress messages during run (header,feedback,errors,discards,partitions)
    direct -- use direct path                      (Default FALSE)
   parfile -- parameter file: name of file that contains parameter specifications
  parallel -- do parallel load                     (Default FALSE)
      file -- file to allocate extents from
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions  (Default FALSE)
skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable  (Default FALSE)
commit_discontinued -- commit loaded rows when load is discontinued  (Default FALSE)
  readsize -- size of read buffer                  (Default 1048576)
external_table -- use external table for load; NOT_USED, GENERATE_ONLY, EXECUTE  (Default NOT_USED)
columnarrayrows -- number of rows for direct path column array  (Default 5000)
streamsize -- size of direct path stream buffer in bytes  (Default 256000)
multithreading -- use multithreading in direct path
 resumable -- enable or disable resumable for current session  (Default FALSE)
resumable_name -- text string to help identify resumable statement
resumable_timeout -- wait time (in seconds) for RESUMABLE  (Default 7200)
date_cache -- size (in entries) of date conversion cache  (Default 1000)
no_index_errors -- abort load on any index errors  (Default FALSE)

 

   PLEASE NOTE: Command-line parameters may be specified either by

   position or by keywords.  An example of the former case is 'sqlldr

   scott/tiger foo'; an example of the latter is 'sqlldr control=foo

   userid=scott/tiger'.  One may specify parameters by position before

   but not after parameters specified by keywords.  For example,

   'sqlldr scott/tiger control=foo logfile=log' is allowed, but

   'sqlldr scott/tiger control=foo log' is not, even though the

   position of the parameter 'log' is correct. 

Notice:

sqlldr parameters are flexible: values may be given positionally or as keyword=value pairs. For example, sqlldr scott/tiger foo and sqlldr control=foo userid=scott/tiger are both valid.
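Since the parameter list can get long, the parfile keyword from the help above lets the options live in a file. A minimal sketch, with a hypothetical file name wjq_load.par containing options from the earlier run:

```
userid=scott/tiger
control=/u01/app/oracle/SQL*Loader/wjq_test1.ctl
log=/u01/app/oracle/SQL*Loader/wjq_test1.log
errors=50
```

sqlldr would then be invoked as sqlldr parfile=wjq_load.par.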


2. Use cases

2.1 Simple example

Create a new control file wjq_test1.ctl; the name and extension of the control file can be chosen arbitrarily. Before writing its contents, create the target table:

SCOTT@seiang11g>create table tb_loader as select * from bonus;

Table created.

 SCOTT@seiang11g>desc tb_loader

 Name                             Null?    Type

 -------------------------------- -------- ------------------------------------

 ENAME                                     VARCHAR2(10)

 JOB                                       VARCHAR2(9)

 SAL                                       NUMBER

 COMM                                      NUMBER

 The content of the control file is as follows:

[oracle@wjq SQL*Loader]$ vim wjq_test1.ctl

LOAD DATA
INFILE *
INTO TABLE tb_loader
FIELDS TERMINATED BY ","
(ENAME,JOB,SAL)
BEGINDATA
SMITH,CLEAK,3904
ALLEN,SALESMAN,2891
WARD,SALESMAN,3128
KING,PRESIDENT,2523

Execute the sqlldr command:

[oracle@wjq SQL*Loader]$ sqlldr scott/tiger control=/u01/app/oracle/SQL*Loader/wjq_test1.ctl

SQL*Loader: Release 11.2.0.4.0 - Production on Tue Oct 31 14:43:12 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Commit point reached - logical record count 4

 

The output shows that 4 records were committed; connect to the database to verify the contents:

SCOTT@seiang11g>select * from tb_loader;

ENAME      JOB              SAL       COMM
---------- --------- ---------- ----------
SMITH      CLEAK           3904
ALLEN      SALESMAN        2891
WARD       SALESMAN        3128
KING       PRESIDENT       2523

 

The query returns exactly the rows listed after BEGINDATA in the control file, so the data has been loaded successfully.

Tip: the target table must already exist in the database before sqlldr can load data into it.

 

2.2 Anatomy of the control file

LOAD DATA

------------------------------------

INFILE *

INTO TABLE tb_loader

FIELDS TERMINATED BY ","

(ENAME,JOB,SAL)

BEGINDATA

------------------------------------

SMITH,CLEAK,3904

ALLEN,SALESMAN,2891 

WARD,SALESMAN,3128

KING,PRESIDENT,2523

 

 

2.3 Control file parsing

 

①Part 1:

LOAD DATA is standard syntax, and control files generally start with it. Before LOAD DATA you can also specify UNRECOVERABLE or RECOVERABLE to control whether the loaded data is recoverable, or specify CONTINUE_LOAD to resume an interrupted load. Other control-file clauses can be found in the official documentation.

 

②Middle part:
*INFILE: specifies the location of the data file. A value of * means the data is embedded in the control file itself, as in this example, where there is no separate data file. For most loads, the data file and the control file are separate.
*INTO TABLE tbl_name: tbl_name is the target table into which the data is loaded. The table must already exist before the sqlldr command is executed.
Before INTO TABLE, one of several load methods can also be specified:
  *INSERT: insert data into the table; the table must be empty, otherwise sqlldr reports an error. INSERT is the default.
  *APPEND: append data to the table, regardless of whether the table already contains rows.
  *REPLACE: replace the data in the table; equivalent to first DELETEing all rows and then INSERTing.
  *TRUNCATE: like REPLACE, except that the existing rows are removed with TRUNCATE rather than DELETE before the INSERT.
*FIELDS TERMINATED BY ",": sets the field separator for the data part. Here it is a comma, but any other visible character can be used, as long as it matches the separator actually used in the data rows.
*(ENAME,JOB,SAL): the columns of the table to be populated. The column names must match the table's column names exactly; their order may differ from the column order in the table, but it must correspond one-to-one with the fields in each data row.
*BEGINDATA: indicates that the data to be loaded follows; it is valid only when INFILE is specified as *.
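As a quick sketch of the load-method keywords above, an APPEND variant of the earlier control file changes only one line (the added data row is illustrative):

```
LOAD DATA
INFILE *
APPEND
INTO TABLE tb_loader
FIELDS TERMINATED BY ","
(ENAME,JOB,SAL)
BEGINDATA
MILLER,CLERK,1300
```

Because APPEND is specified, this load succeeds even though tb_loader already contains the rows loaded earlier; with the default INSERT it would fail.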

③Data part
In this case, both the data and the control directives are placed in the control file; usually the data exists in an independent text file. With an independent data file, simply change the * after the INFILE parameter in the control file to the name of the data file.
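With a separate data file, a minimal pair of files might look like this (the file names wjq_test2.ctl and wjq_test2.dat are assumed for illustration):

```
-- wjq_test2.ctl: control file with the data kept outside it
LOAD DATA
INFILE '/u01/app/oracle/SQL*Loader/wjq_test2.dat'
APPEND
INTO TABLE tb_loader
FIELDS TERMINATED BY ","
(ENAME,JOB,SAL)
```

wjq_test2.dat then contains only the comma-separated rows, one record per line, with no BEGINDATA keyword.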

 

2.4 Log file parsing

By default, while the sqlldr command runs, a log file with the same name as the control file is automatically generated, with the extension .log. The log file records statistics about the load, such as the initialization parameters, the number of records read, the number of records successfully loaded, and the elapsed time.
In the previous example, after the sqlldr command completes, a wjq_test1.log file should be generated in the same path; displaying it shows the following content:

[oracle@wjq SQL*Loader]$ cat wjq_test1.log

SQL*Loader: Release 11.2.0.4.0 - Production on Tue Oct 31 14:43:12 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Control File:   /u01/app/oracle/SQL*Loader/wjq_test1.ctl
Data File:      /u01/app/oracle/SQL*Loader/wjq_test1.ctl
  Bad File:     /u01/app/oracle/SQL*Loader/wjq_test1.bad
  Discard File:  none specified

 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 256000 bytes
Continuation:    none specified
Path used:      Conventional

Table TB_LOADER, loaded from every logical record.
Insert option in effect for this table: INSERT

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
ENAME                               FIRST     *   ,       CHARACTER            
JOB                                  NEXT     *   ,       CHARACTER            
SAL                                  NEXT     *   ,       CHARACTER            


Table TB_LOADER:
  4 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Space allocated for bind array:                  49536 bytes(64 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:             4
Total logical records rejected:         0
Total logical records discarded:        0

Run began on Tue Oct 31 14:43:12 2017
Run ended on Tue Oct 31 14:43:12 2017

Elapsed time was:     00:00:00.03
CPU time was:         00:00:00.01

 

The structure of the log file is simple: the beginning lists the initialization parameters, while the middle and second half are what deserve attention, including the record layout, the counts of records processed (both successful and erroneous), and the time spent. In this log, 4 records were loaded successfully, taking about 30 milliseconds of elapsed time.
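Note that the log file name can also be set explicitly with the log= keyword shown in the help output; the path below is illustrative:

```
sqlldr scott/tiger control=/u01/app/oracle/SQL*Loader/wjq_test1.ctl log=/tmp/wjq_run1.log
```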


2.5 Error (bad) file parsing

While the sqlldr command runs, it generates not only a log file: if some rows fail to load because the data does not conform to the specification, an error (bad) file with the same name is also generated, with the extension .bad (unless the DBA explicitly specifies otherwise). The rejected rows are recorded in this file, in exactly the same format as the data file. Therefore, if a bad file appears after a load, analyze the cause of the errors from the log file, correct the offending rows, change the INFILE parameter in the control file to point at the bad file, and re-run the sqlldr command.
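A sketch of that reload, assuming a hypothetical control file wjq_test1_fix.ctl that points INFILE at the bad file:

```
-- wjq_test1_fix.ctl: reload corrected rows from the bad file
LOAD DATA
INFILE '/u01/app/oracle/SQL*Loader/wjq_test1.bad'
APPEND
INTO TABLE tb_loader
FIELDS TERMINATED BY ","
(ENAME,JOB,SAL)
```

APPEND is needed here because the successfully loaded rows are already in the table.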


2.6 Discard file parsing

In addition to the log file and bad file, a discard file with the same name may be generated when the sqlldr command is executed, with the extension .dsc. It does not exist by default: it is produced only when the discard file is explicitly specified on the sqlldr command line and there are indeed records that do not satisfy the load's conditions. It records the rows that were read but not inserted.
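A sketch of a load that would populate a discard file, using a hypothetical WHEN filter; the row failing the condition is written to the discard file rather than the table:

```
-- wjq_when.ctl: only SALESMAN rows are loaded
LOAD DATA
INFILE *
APPEND
INTO TABLE tb_loader
WHEN (JOB = 'SALESMAN')
FIELDS TERMINATED BY ","
(ENAME,JOB,SAL)
BEGINDATA
TURNER,SALESMAN,1500
FORD,ANALYST,3000
```

Run with something like sqlldr scott/tiger control=wjq_when.ctl discard=wjq_when.dsc; the FORD row fails the WHEN clause and would end up in wjq_when.dsc.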




Author : SEian.G
