Reposted from:
https://www.cnblogs.com/kingle-study/p/10552097.html
First, introduction to external tables
A clear advantage of Greenplum in data loading is its support for parallel loading. gpfdist is the parallel loading tool, and its counterpart inside the database is the external table.
An external table is only a table definition in the database. It holds no data; the data is stored in data files outside the database. Greenplum can run ordinary DML operations against an external table, and when the table is read, the database loads the data from the data files. External tables let the segments pull data from gpfdist in parallel. Because the data is imported directly by the segments, loading is very efficient.
Schematic:
An external table that uses gpfdist must specify the IP address and port of the gpfdist server, along with the file path under the directory it serves. File names support wildcard matching. You can list multiple gpfdist addresses, but their total number cannot exceed the total number of segments, otherwise an error is raised.
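The rule about the number of gpfdist addresses can be sketched as a small shell helper that assembles a LOCATION clause and refuses to build one when there are more addresses than segments. This is only an illustration: the function name build_location and the host names etl1 and etl2 are hypothetical, and the segment count is passed in by hand rather than read from the cluster.

```shell
#!/bin/sh
# build_location SEGMENT_COUNT URL [URL ...]
# Prints a LOCATION clause listing the given gpfdist URLs.
# Fails if more URLs are given than there are segments,
# mirroring the restriction described above.
build_location() {
    segments=$1
    shift
    if [ "$#" -gt "$segments" ]; then
        echo "error: $# gpfdist URLs exceed $segments segments" >&2
        return 1
    fi
    out=""
    for url in "$@"; do
        # Join the quoted URLs with ", ".
        out="${out:+$out, }'$url'"
    done
    printf 'LOCATION (%s)\n' "$out"
}
```

For example, build_location 4 'gpfdist://etl1:8081/*.txt' 'gpfdist://etl2:8081/*.txt' prints a clause with both addresses, while passing a segment count of 1 with the same two addresses returns an error.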
GPDB provides two kinds of external tables: readable external tables for loading data, and writable external tables for unloading data. An external table can be file-based or WEB-based, and both kinds can be readable or writable.
A query treats a regular (file-based) external table as rescannable, because its data is static while the query runs. A WEB external table is not rescannable, because its data may change during the execution of the query.
A writable external table is used to select records from database tables and output them to files, named pipes, or other executable programs. For example, data unloaded from GPDB can be sent to an executable program that connects to another database or ETL tool and loads the data elsewhere. Writable external tables can also be used as the output target of GPDB parallel MapReduce computations.
Once a writable external table has been defined, data can be selected from database tables and inserted into it. A writable external table allows only INSERT operations; SELECT, UPDATE, DELETE, and TRUNCATE are not allowed. When a writable external table outputs data to an executable program, that program must be able to accept the incoming data stream.
When creating an external table, you can specify the delimiter, the error table, the number of erroneous rows allowed, the encoding of the source file, and similar information.
Second, external table syntax
CREATE [READABLE] EXTERNAL TABLE table_name
    ( column_name data_type [, ...] | LIKE other_table )
    LOCATION ('file://seghost[:port]/path/file' [, ...])
        | ('gpfdist://filehost[:port]/file_pattern[#transform]'
        | ('gpfdists://filehost[:port]/file_pattern[#transform]' [, ...])
        | ('gphdfs://hdfs_host[:port]/path/file')
    FORMAT 'TEXT'
            [( [HEADER]
               [DELIMITER [AS] 'delimiter' | 'OFF']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
         | 'CSV'
            [( [HEADER]
               [QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE NOT NULL column [, ...]]
               [ESCAPE [AS] 'escape']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
         | 'AVRO'
         | 'PARQUET'
         | 'CUSTOM' (Formatter=<formatter specifications>)
    [ ENCODING 'encoding' ]
    [ [LOG ERRORS [INTO error_table]] SEGMENT REJECT LIMIT count
      [ROWS | PERCENT] ]

CREATE [READABLE] EXTERNAL WEB TABLE table_name
    ( column_name data_type [, ...] | LIKE other_table )
    LOCATION ('http://webhost[:port]/path/file' [, ...])
        | EXECUTE 'command' [ON ALL
                             | MASTER
                             | number_of_segments
                             | HOST ['segment_hostname']
                             | SEGMENT segment_id ]
    FORMAT 'TEXT'
            [( [HEADER]
               [DELIMITER [AS] 'delimiter' | 'OFF']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
         | 'CSV'
            [( [HEADER]
               [QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE NOT NULL column [, ...]]
               [ESCAPE [AS] 'escape']
               [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
               [FILL MISSING FIELDS] )]
         | 'CUSTOM' (Formatter=<formatter specifications>)
    [ ENCODING 'encoding' ]
    [ [LOG ERRORS [INTO error_table]] SEGMENT REJECT LIMIT count
      [ROWS | PERCENT] ]

CREATE WRITABLE EXTERNAL TABLE table_name
    ( column_name data_type [, ...] | LIKE other_table )
    LOCATION('gpfdist://outputhost[:port]/filename[#transform]'
        | ('gpfdists://outputhost[:port]/file_pattern[#transform]' [, ...])
        | ('gphdfs://hdfs_host[:port]/path')
    FORMAT 'TEXT'
            [( [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF'] )]
         | 'CSV'
            [([QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE QUOTE column [, ...]] ]
               [ESCAPE [AS] 'escape'] )]
         | 'AVRO'
         | 'PARQUET'
         | 'CUSTOM' (Formatter=<formatter specifications>)
    [ ENCODING 'write_encoding' ]
    [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]

CREATE WRITABLE EXTERNAL WEB TABLE table_name
    ( column_name data_type [, ...] | LIKE other_table )
    EXECUTE 'command' [ON ALL]
    FORMAT 'TEXT'
            [( [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [ESCAPE [AS] 'escape' | 'OFF'] )]
         | 'CSV'
            [([QUOTE [AS] 'quote']
               [DELIMITER [AS] 'delimiter']
               [NULL [AS] 'null string']
               [FORCE QUOTE column [, ...]] ]
               [ESCAPE [AS] 'escape'] )]
         | 'CUSTOM' (Formatter=<formatter specifications>)
    [ ENCODING 'write_encoding' ]
    [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]
Third, create an external table
01, gpfdist syntax
gpfdist [-d directory] [-p http_port] [-l log_file] [-t timeout]
        [-S] [-w time] [-v | -V] [-s] [-m max_length] [--ssl certificate_path]
gpfdist -? | --help
gpfdist --version
02, startup process
-- Create the gpfdist process
[gpadmin@greenplum02 ~]$ mkdir script
[gpadmin@greenplum02 ~]$ nohup gpfdist -d /home/gpadmin/script/ -p 8081 -l /home/gpadmin/script/gpfdist.log &
[1] 6904
[gpadmin@greenplum02 ~]$ nohup: ignoring input and appending output to ‘nohup.out’
[gpadmin@greenplum02 ~]$ ss -lntup|grep 8081
tcp    LISTEN    0    128    :::8081    :::*    users:(("gpfdist",pid=6904,fd=6))
--- View the data file
[gpadmin@greenplum02 script]$ cat test.txt
Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67
San Francisco,Sept,156,6846.34
Paris,Nov,159,7134.56
San Francisco,Jan,113,5397.89
Prague,Dec,333,9894.77
Bangalore,Jul,271,8320.55
Beijing,Dec,100,4248.41
Q
[gpadmin@greenplum02 script]$ pwd
/home/gpadmin/script
--- the trailing part of the file is the erroneous data
03, create the external table
create external table public.test (
    country varchar(128),
    name varchar(128),
    id int,
    sale varchar(128)
)
location ('gpfdist://192.168.0.222:8081/test.txt')
format 'text' (delimiter ',' null as '' escape 'off')
encoding 'utf8'
log errors segment reject limit 3 rows;
--- location: where the file is located. It can be a direct local path, a gpfdist address, a gpfdists address, or a gphdfs address.
--- format: the file type, here text.
--- delimiter: the field separator.
--- encoding: the character encoding.
--- log errors into: the error table that records erroneous rows. It is created automatically and is usually named tablename_err, for example t1_err.
--- segment reject limit: a row count or a percentage (rows / percent). Once the erroneous rows reach the set value, the statement fails with an error. The minimum value is 2. It is used to ensure data integrity.
Result:
postgres=# create external table public.test99(country varchar(128),name varchar(128),id int,sale varchar(128))location ('gpfdist://192.168.0.222:8081/test.txt')format 'text'(delimiter ',' null as '' escape 'off')encoding 'utf8'log errors segment reject limit 3 rows;
CREATE EXTERNAL TABLE
postgres=# SELECT * from public.test99
postgres-# ;
NOTICE:  Found 2 data formatting errors (2 or more input rows). Rejected related input data.
    country    | name | id  |   sale
---------------+------+-----+----------
 Prague        | Jan  | 101 | 4875.33
 Rome          | Mar  |  87 | 1557.39
 Bangalore     | May  | 317 | 8936.99
 Beijing       | Jul  | 411 | 11600.67
 San Francisco | Sept | 156 | 6846.34
 Paris         | Nov  | 159 | 7134.56
 San Francisco | Jan  | 113 | 5397.89
 Prague        | Dec  | 333 | 9894.77
 Bangalore     | Jul  | 271 | 8320.55
 Beijing       | Dec  | 100 | 4248.41
(10 rows)

postgres=# SELECT * from test99;
NOTICE:  Found 2 data formatting errors (2 or more input rows). Rejected related input data.
    country    | name | id  |   sale
---------------+------+-----+----------
 Prague        | Jan  | 101 | 4875.33
 Rome          | Mar  |  87 | 1557.39
 Bangalore     | May  | 317 | 8936.99
 Beijing       | Jul  | 411 | 11600.67
 San Francisco | Sept | 156 | 6846.34
 Paris         | Nov  | 159 | 7134.56
 San Francisco | Jan  | 113 | 5397.89
 Prague        | Dec  | 333 | 9894.77
 Bangalore     | Jul  | 271 | 8320.55
 Beijing       | Dec  | 100 | 4248.41
(10 rows)
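The NOTICE in the result reports two rejected rows, which stays under the limit of 3, so the query still returns the ten valid rows. As a rough pre-check before loading, the sketch below counts the lines of a data file whose field count differs from the expected number of columns. The function names count_bad_rows and within_reject_limit are hypothetical. The check covers only column counts, not type conversion errors such as a non-numeric id, and it uses one count for the whole file, whereas Greenplum applies the limit per segment.

```shell
#!/bin/sh
# count_bad_rows FILE DELIMITER EXPECTED_FIELDS
# Prints the number of lines whose field count differs from EXPECTED_FIELDS.
count_bad_rows() {
    awk -F"$2" -v n="$3" 'NF != n { bad++ } END { print bad + 0 }' "$1"
}

# within_reject_limit FILE DELIMITER EXPECTED_FIELDS LIMIT
# Succeeds while the bad-row count is still below LIMIT.
# Assumption: a single count for the file stands in for the
# per-segment counting that Greenplum actually performs.
within_reject_limit() {
    bad=$(count_bad_rows "$1" "$2" "$3")
    [ "$bad" -lt "$4" ]
}
```

A possible call for the file used above would be count_bad_rows /home/gpadmin/script/test.txt , 4, followed by within_reject_limit with the same arguments plus the limit 3.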
04, data load
insert into table_name select * from table_ext;
-- internal table <---- external table
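When the load is run from a script rather than typed into psql, the statement can be generated from the two table names. The helper below is a minimal sketch: load_stmt is a hypothetical name, and it only prints the statement, leaving execution to whichever client is used.

```shell
#!/bin/sh
# load_stmt INTERNAL_TABLE EXTERNAL_TABLE
# Prints the statement that copies every row of the external table
# into the internal table. Both tables must already exist.
load_stmt() {
    printf 'insert into %s select * from %s;\n' "$1" "$2"
}
```

Assuming psql is on the PATH and an internal table public.test_inner (a hypothetical name) has the same columns as public.test, the output can be piped straight in: load_stmt public.test_inner public.test | psql -d postgres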