Greenplum External Tables

Reposted from:

https://www.cnblogs.com/kingle-study/p/10552097.html

First, introduction to external tables

  A clear advantage of Greenplum for data loading is its support for parallel loads. gpfdist is the parallel loading tool, and its counterpart inside the database is the external table.

  An external table is only a table definition in the database. It holds no data; the data lives in data files outside the database. Greenplum lets you run ordinary SQL against an external table (SELECT on a readable one, INSERT on a writable one), and when data is read, the database loads it from the data files. With an external table, the segments pull data from gpfdist in parallel. Because the data is imported directly by the segments, loading is very efficient.

  An external table that uses gpfdist must specify the gpfdist host IP and port, as well as the path to the file; file names support wildcard matching. You can list multiple gpfdist addresses, but their number cannot exceed the total number of segments, otherwise an error is raised.
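
As a rough sketch of the addressing rules above, a readable external table might list two gpfdist instances and use a wildcard for the file names. The host names etl1 and etl2, the port, the table name ext_sales, and the sales_*.txt pattern are all made up for illustration; the column layout simply mirrors the sample table used later in this post.

```sql
-- Hypothetical example: etl1/etl2, port 8081 and sales_*.txt are assumed names.
-- Each gpfdist URL points at a separately started gpfdist process.
CREATE EXTERNAL TABLE ext_sales (
    country varchar(128),
    name    varchar(128),
    id      int,
    sale    varchar(128)
)
LOCATION (
    'gpfdist://etl1:8081/sales_*.txt',  -- wildcard matches files under the directory served by gpfdist
    'gpfdist://etl2:8081/sales_*.txt'
)
FORMAT 'TEXT' (DELIMITER ',');
```

The path in each URL is relative to the directory that the corresponding gpfdist process was started with (its -d option).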

  GPDB provides two kinds of external table: readable external tables for loading data, and writable external tables for unloading data. External tables can be file-based or web-based, and both kinds can be readable or writable.

When a query uses a regular (file-based) external table, the table is considered rescannable, because the data is static for the duration of the query. A web external table is not rescannable, because its data may change while the query is running.
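
For comparison, a web external table can be defined over a command instead of a file. The sketch below is only illustrative: the table name ext_segment_hosts and the choice of the hostname command are assumptions, not something taken from the original post.

```sql
-- Hypothetical example: every segment runs `hostname` at query time
-- and returns its output as one row of text.
CREATE EXTERNAL WEB TABLE ext_segment_hosts (host text)
EXECUTE 'hostname' ON ALL
FORMAT 'TEXT';

-- The command is executed each time the table is queried,
-- so the result can differ from one query to the next.
SELECT host, count(*) AS segments
FROM ext_segment_hosts
GROUP BY host;
```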

    A writable external table takes records selected from database tables and outputs them to a file, a named pipe, or another executable program. For example, data unloaded from GPDB can be sent to an executable program that connects to another database or ETL tool and loads the data elsewhere. Writable external tables can also be used as output targets for Greenplum parallel MapReduce calculations.

    Once a writable external table is defined, data can be selected from database tables and inserted into it. A writable external table allows only INSERT operations; SELECT, UPDATE, DELETE, and TRUNCATE are not allowed. When a writable external table outputs data to an executable program, that program must be able to accept an incoming data stream.

    When creating an external table, you can specify the delimiter, the error table, the number of bad rows to tolerate, the encoding of the source file, and similar settings.
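
A minimal sketch of unloading through a writable external table might look like the following. It assumes an existing regular table named sales and a gpfdist process already listening on 192.168.0.222:8081; the table name ext_sales_out and the output file name are hypothetical.

```sql
-- Hypothetical example: sales is an assumed existing table, and a gpfdist
-- process is assumed to be listening on 192.168.0.222:8081.
CREATE WRITABLE EXTERNAL TABLE ext_sales_out (LIKE sales)
LOCATION ('gpfdist://192.168.0.222:8081/sales_out.txt')
FORMAT 'TEXT' (DELIMITER ',' NULL AS '')
DISTRIBUTED RANDOMLY;

-- Unload: INSERT is the only operation a writable external table accepts.
INSERT INTO ext_sales_out SELECT * FROM sales;
```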

Second, external table syntax

CREATE [READABLE] EXTERNAL TABLE table_name    
    ( column_name data_type [, ...] | LIKE other_table )
      LOCATION ('file://seghost[:port]/path/file' [, ...])
        | ('gpfdist://filehost[:port]/file_pattern[#transform]'
        | ('gpfdists://filehost[:port]/file_pattern[#transform]'
            [, ...])
        | ('gphdfs://hdfs_host[:port]/path/file')
      FORMAT 'TEXT'
            [( [HEADER]
               [DELIMITER [AS] 'delimiter' | 'OFF']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
           | 'CSV'
            [( [HEADER]
               [QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE NOT NULL column [, ...]]
               [ESCAPE [AS] 'escape']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
           | 'AVRO'
           | 'PARQUET'
 
           | 'CUSTOM' (Formatter=<formatter specifications>)
     [ ENCODING 'encoding' ]
     [ [LOG ERRORS [INTO error_table]] SEGMENT REJECT LIMIT count
       [ROWS | PERCENT] ]
 
CREATE [READABLE] EXTERNAL WEB TABLE table_name    
   ( column_name data_type [, ...] | LIKE other_table )
      LOCATION ('http://webhost[:port]/path/file' [, ...])
    | EXECUTE 'command' [ON ALL
                          | MASTER
                          | number_of_segments
                          | HOST ['segment_hostname']
                          | SEGMENT segment_id ]
      FORMAT 'TEXT'
            [( [HEADER]
               [DELIMITER [AS] 'delimiter' | 'OFF']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
           | 'CSV'
            [( [HEADER]
               [QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE NOT NULL column [, ...]]
               [ESCAPE [AS] 'escape']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
           | 'CUSTOM' (Formatter=<formatter specifications>)
     [ ENCODING 'encoding' ]
     [ [LOG ERRORS [INTO error_table]] SEGMENT REJECT LIMIT count
       [ROWS | PERCENT] ]
 
CREATE WRITABLE EXTERNAL TABLE table_name
    ( column_name data_type [, ...] | LIKE other_table )
     LOCATION('gpfdist://outputhost[:port]/filename[#transform]'
      | ('gpfdists://outputhost[:port]/file_pattern[#transform]'
          [, ...])
      | ('gphdfs://hdfs_host[:port]/path')
      FORMAT 'TEXT'
               [( [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF'] )]
          | 'CSV'
               [([QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE QUOTE column [, ...]] ]
               [ESCAPE [AS] 'escape'] )]
           | 'AVRO'
           | 'PARQUET'
 
           | 'CUSTOM' (Formatter=<formatter specifications>)
    [ ENCODING 'write_encoding' ]
    [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]
 
CREATE WRITABLE EXTERNAL WEB TABLE table_name
    ( column_name data_type [, ...] | LIKE other_table )
    EXECUTE 'command' [ON ALL]
    FORMAT 'TEXT'
               [( [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF'] )]
          | 'CSV'
               [([QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE QUOTE column [, ...]] ]
               [ESCAPE [AS] 'escape'] )]
           | 'CUSTOM' (Formatter=<formatter specifications>)
    [ ENCODING 'write_encoding' ]
    [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]

Third, create an external table

  01, gpfdist syntax

gpfdist [-d directory] [-p http_port] [-l log_file] [-t timeout]
[-S] [-w time] [-v | -V] [-s] [-m max_length] [--ssl certificate_path]
gpfdist -? | --help
gpfdist --version

  02, starting gpfdist

-- Start the gpfdist process
[gpadmin@greenplum02 ~]$ mkdir script
[gpadmin@greenplum02 ~]$ nohup gpfdist -d /home/gpadmin/script/ -p 8081 -l /home/gpadmin/script/gpfdist.log &
[1] 6904
[gpadmin@greenplum02 ~]$ nohup: ignoring input and appending output to ‘nohup.out’
[gpadmin@greenplum02 ~]$ ss -lntup|grep 8081
tcp    LISTEN     0      128      :::8081                 :::*                   users:(("gpfdist",pid=6904,fd=6))
-- View the data file
[gpadmin@greenplum02 script]$ cat test.txt
Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67
San Francisco,Sept,156,6846.34
Paris,Nov,159,7134.56
San Francisco,Jan,113,5397.89
Prague,Dec,333,9894.77
Bangalore,Jul,271,8320.55
Beijing,Dec,100,4248.41

Q
[gpadmin@greenplum02 script]$ pwd
/home/gpadmin/script
-- The last lines of test.txt (after the ten data rows) are the bad rows that trigger the errors.

  03, create the external table

create external table public.test
(
country varchar(128),
name varchar(128),
id int,
sale varchar(128)
)
location ('gpfdist://192.168.0.222:8081/test.txt')
format 'text'
(delimiter ',' null as '' escape 'off')
encoding 'utf8'
log errors segment reject limit 3 rows;

--- location: where the data file is located: a local file path, a gpfdist address, a gpfdists address, or a gphdfs address.
--- format: the file format, here text.
--- delimiter: the field separator.
--- encoding: the character encoding of the data file.
--- log errors into: the error table in which bad rows are recorded; it is created automatically. It is usually named tablename_err, for example t1_err.
--- segment reject limit: the number (rows) or percentage (percent) of bad rows tolerated on each segment; once the bad rows reach this value, the operation fails. The minimum value is 2. It is used to protect data integrity.
Result:
postgres=# create external table public.test99(country varchar(128),name varchar(128),id int,sale varchar(128))location ('gpfdist://192.168.0.222:8081/test.txt')format 'text'(delimiter ',' null as '' escape 'off')encoding 'utf8'log errors segment reject limit 3 rows;
CREATE EXTERNAL TABLE
postgres=# SELECT * from public.test99
postgres-# ;
NOTICE:  Found 2 data formatting errors (2 or more input rows). Rejected related input data.
    country    | name | id  |   sale
---------------+------+-----+----------
 Prague        | Jan  | 101 | 4875.33
 Rome          | Mar  |  87 | 1557.39
 Bangalore     | May  | 317 | 8936.99
 Beijing       | Jul  | 411 | 11600.67
 San Francisco | Sept | 156 | 6846.34
 Paris         | Nov  | 159 | 7134.56
 San Francisco | Jan  | 113 | 5397.89
 Prague        | Dec  | 333 | 9894.77
 Bangalore     | Jul  | 271 | 8320.55
 Beijing       | Dec  | 100 | 4248.41
(10 rows)

postgres=# SELECT * from test99;
NOTICE:  Found 2 data formatting errors (2 or more input rows). Rejected related input data.
    country    | name | id  |   sale
---------------+------+-----+----------
 Prague        | Jan  | 101 | 4875.33
 Rome          | Mar  |  87 | 1557.39
 Bangalore     | May  | 317 | 8936.99
 Beijing       | Jul  | 411 | 11600.67
 San Francisco | Sept | 156 | 6846.34
 Paris         | Nov  | 159 | 7134.56
 San Francisco | Jan  | 113 | 5397.89
 Prague        | Dec  | 333 | 9894.77
 Bangalore     | Jul  | 271 | 8320.55
 Beijing       | Dec  | 100 | 4248.41
(10 rows)
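
The NOTICE above reports two rejected rows. On Greenplum releases that keep the error log internally (when LOG ERRORS is used without an INTO clause, as in this session), the rejected rows can usually be inspected with the built-in gp_read_error_log function. This is a sketch under that assumption; on older releases that write to an error table, you would query that table instead.

```sql
-- Assumes a Greenplum release that provides gp_read_error_log()
-- and that public.test99 was created with LOG ERRORS (no INTO clause).
SELECT cmdtime, filename, linenum, errmsg, rawdata
FROM gp_read_error_log('public.test99')
ORDER BY cmdtime, linenum;

-- Optionally clear the stored error log for this table afterwards.
SELECT gp_truncate_error_log('public.test99');
```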

  04, data load

insert into table_name select * from table_ext;

internal table <---- external table
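
Applied to the external table created earlier, the load step might look like the sketch below. The target table public.sales_loaded and its distribution key are assumptions chosen for illustration; only public.test99 comes from the session above.

```sql
-- Hypothetical target table; its name and distribution key are assumptions.
CREATE TABLE public.sales_loaded (
    country varchar(128),
    name    varchar(128),
    id      int,
    sale    varchar(128)
) DISTRIBUTED BY (id);

-- The segments pull rows from gpfdist in parallel and store them in the regular table.
INSERT INTO public.sales_loaded SELECT * FROM public.test99;

-- Quick sanity check on the number of rows loaded.
SELECT count(*) FROM public.sales_loaded;
```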


Origin www.cnblogs.com/xibuhaohao/p/11127735.html