Beeline parameters and settings for exporting data from Hive


Beeline data export parameters

  • Reference statement
beeline -u jdbc:hive2://host:10000 --incremental=true --showHeader=false --outputformat=dsv --delimiterForDSV=$'\t' -e 'select * from test' > ./file.txt
  • Parameter description
Parameter    Description
--incremental=[true/false] Defaults to true from Hive 2.3 onward and to false in earlier versions. When set to false, the full result set is fetched and buffered before it is displayed, so that the column widths can be optimized. When set to true, rows are displayed as soon as they are fetched, which lowers latency and memory usage at the cost of extra column padding. If the client runs out of memory because the fetched result set is very large, set --incremental=true.
--showHeader=[true/false] Whether to show the column names in the query results. The default is true. Example: beeline --showHeader=false
--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2] The format in which results are displayed. The default is table. dsv can be combined with --delimiterForDSV to set the delimiter.
--delimiterForDSV=DELIMITER The delimiter placed between values when the output format is dsv. The default is '|'. To pass a special character such as '\001' or '\x01', prefix the quoted string with $ so that the shell expands the escape, for example --delimiterForDSV=$'\t'.
-e The SQL statement to execute. Redirect the output to a file to export the result.
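The $ prefix used with delimiterForDSV is the shell's ANSI-C quoting rather than a Beeline feature. The following is a minimal sketch, assuming bash, that prints the bytes each form would hand to Beeline. The variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Assumes bash (or another shell that supports $'...' ANSI-C quoting).

tab=$'\t'        # one real TAB character (byte 0x09)
ctrl_a=$'\001'   # one real Ctrl-A character (byte 0x01), written as an octal escape
literal='\t'     # without the $, two characters: a backslash and the letter t

# Dump the bytes of each value in hex, one value per line.
for value in "$tab" "$ctrl_a" "$literal"; do
  printf '%s' "$value" | od -An -tx1
done
```

With plain single quotes the escape is not expanded by the shell, so the backslash and the letter reach Beeline unchanged.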
  • Result display format (set in the outputformat parameter)

The output formats differ mainly in the separator placed between the fields of each row. There are five separated-value formats: csv, tsv, csv2, tsv2, and dsv. csv and tsv are deprecated and have been replaced by csv2 and tsv2.
The dsv, csv2, and tsv2 formats differ only in their delimiter:
csv2 uses a comma,
tsv2 uses a tab, and
dsv is configurable. For the dsv format, the delimiter is set with the delimiterForDSV parameter; the default is '|'.
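The choice between the three formats can be sketched as a small helper that assembles the Beeline arguments. This is only an illustration, assuming bash: build_beeline_args and BEELINE_ARGS are hypothetical names, the JDBC URL is the placeholder from the reference statement, and the command is printed as a dry run instead of being executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the global BEELINE_ARGS array for one output format.
# Only flags described in this article are used; the JDBC URL is a placeholder.
build_beeline_args() {
  local format=$1 delimiter=${2-}
  BEELINE_ARGS=(-u "jdbc:hive2://host:10000" --incremental=true "--outputformat=$format")
  case $format in
    csv2|tsv2)
      ;;  # fixed delimiters (comma, tab): nothing more to add
    dsv)
      # Configurable delimiter; when none is given, Beeline falls back to '|'.
      if [ -n "$delimiter" ]; then
        BEELINE_ARGS+=("--delimiterForDSV=$delimiter")
      fi
      ;;
    *)
      echo "unsupported format: $format" >&2
      return 1
      ;;
  esac
}

build_beeline_args dsv $'\t'
# Dry run: print the shell-quoted command instead of running it.
printf '%q ' beeline "${BEELINE_ARGS[@]}" -e 'select * from test'
printf '\n'
```

Keeping the arguments in an array means a delimiter such as a tab is passed to Beeline as one intact argument.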


Examples

  • Method 1: Leave showHeader, outputformat, and delimiterForDSV unset so that Beeline uses its defaults. The default output format is table, in which the fields are separated by "|" and padded into aligned columns. Because showHeader defaults to true, the column names appear in the query result.

Statement

beeline -e "select * from table_test" > ./test.txt

The display results are as follows

+------------+--------------+------------------+----------+-------------+----------------+--------------+
| area_code  |   acc_nbr    |     serv_id      | prod_id  |  time_tag   |   model_type   | rule_result  |
+------------+--------------+------------------+----------+-------------+----------------+--------------+
| 777        | 18028137123  | 120000485555514  | 3102     | 2023010914  | 507            | 10001        |
+------------+--------------+------------------+----------+-------------+----------------+--------------+
  • Method 2: Omit the column names with --showHeader=false

Statement

beeline --showHeader=false -e "select * from table_test" > ./test.txt

The display results are as follows

| 777        | 18028137123  | 120000485555514  | 3102     | 2023010914  | 507            | 10001        |
  • Method 3: To separate values with commas, add --outputformat=csv2 to the beeline statement; to separate them with tabs, add --outputformat=tsv2. If neither delimiter meets your needs, add --outputformat=dsv together with --delimiterForDSV=<delimiter> to choose your own.

1) Use a comma as the separator between values

Statement

beeline --outputformat=csv2 -e "$tj_sql"  > ./test_3.txt

The display results are as follows

area_code,acc_nbr,serv_id,prod_id,time_tag,model_type,rule_result
777,18028137123,120000485555514,3102,2023010914,507,10001

2) Use tab as a separator between values

Statement

beeline --outputformat=tsv2 -e "$tj_sql"  > ./test_3.txt

The display results are as follows

area_code       acc_nbr serv_id prod_id time_tag        model_type      rule_result
777     18028137123     120000485555514 3102    2023010914      507     10001
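A tab-separated export like this can be read back field by field in the shell. The following is a minimal sketch, assuming bash; the row is rebuilt here with printf from the values shown above, whereas in practice the lines would come from the exported file.

```shell
#!/usr/bin/env bash
# Build one tab-separated sample row from the values in the output above.
row=$(printf '%s\t' 777 18028137123 120000485555514 3102 2023010914 507)10001

# IFS=$'\t' makes read split the line on tabs only; -r keeps backslashes literal.
IFS=$'\t' read -r area_code acc_nbr serv_id prod_id time_tag model_type rule_result <<< "$row"

echo "acc_nbr=$acc_nbr rule_result=$rule_result"
```

One caveat: the shell treats a tab in IFS as whitespace, so consecutive tabs collapse and empty fields shift the later columns. For data with empty values, a tool such as awk -F'\t' is a safer choice.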

3) For other separators, set the output format to dsv (--outputformat=dsv). When --delimiterForDSV is not given, the default dsv separator '|' is used.

Statement

beeline --outputformat=dsv -e "$tj_sql"  > ./test_3.txt

The display results are as follows

area_code|acc_nbr|serv_id|prod_id|time_tag|model_type|rule_result
777|18028137123|120000485555514|3102|2023010914|507|10001
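Once the data is exported in this form, a single column can be pulled out with awk. A minimal sketch follows, with the two sample lines above held in a variable in place of ./test_3.txt. It assumes that no value contains the '|' character or is wrapped in quotes.

```shell
#!/usr/bin/env bash
# Sample lines copied from the output above; in practice, read ./test_3.txt instead.
sample='area_code|acc_nbr|serv_id|prod_id|time_tag|model_type|rule_result
777|18028137123|120000485555514|3102|2023010914|507|10001'

# Skip the header line (NR > 1) and print the second field, acc_nbr.
acc_nbrs=$(printf '%s\n' "$sample" | awk -F'|' 'NR > 1 { print $2 }')
echo "$acc_nbrs"
```

If the file is exported with --showHeader=false, the NR > 1 condition is no longer needed.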

4) For other delimiters, set the output format to dsv (--outputformat=dsv) and use a tab as the delimiter between values (--delimiterForDSV=$'\t').

Statement

beeline --outputformat=dsv --delimiterForDSV=$'\t' -e "$tj_sql"  > ./test_3.txt

The display results are as follows

area_code       acc_nbr serv_id prod_id time_tag        model_type      rule_result
777     18028137123     120000485555514 3102    2023010914      507     10001

5) For other delimiters, set the output format to dsv (--outputformat=dsv) and use "#" as the delimiter between values (--delimiterForDSV=$'#').

Statement

beeline --outputformat=dsv --delimiterForDSV=$'#' -e "$tj_sql"  > ./test_3.txt

The display results are as follows

area_code#acc_nbr#serv_id#prod_id#time_tag#model_type#rule_result
777#18028137123#120000485555514#3102#2023010914#507#10001
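When choosing a custom delimiter, it helps to check that the character does not also occur inside the data. A rough sketch of such a check, using the sample lines above: it compares each row's field count with the header's, and it assumes that values are not wrapped in quotes.

```shell
#!/usr/bin/env bash
# Sample lines copied from the output above; in practice, read ./test_3.txt instead.
sample='area_code#acc_nbr#serv_id#prod_id#time_tag#model_type#rule_result
777#18028137123#120000485555514#3102#2023010914#507#10001'

# Remember the header's field count, then list the line numbers that differ from it.
bad_lines=$(printf '%s\n' "$sample" |
  awk -F'#' 'NR == 1 { n = NF; next } NF != n { print NR }')

if [ -z "$bad_lines" ]; then
  echo "all rows have the same number of fields as the header"
else
  echo "rows with a different field count: $bad_lines"
fi
```

A row reported here usually means the delimiter appears inside one of its values, in which case a rarer delimiter may be a better choice.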

Origin blog.csdn.net/sodaloveer/article/details/128617521