Good Programmer's Big Data Learning Path: Hive Storage Formats

Hive commonly uses the following storage formats: textfile, sequencefile, rcfile, orc, and custom formats. The default format is controlled by this setting:
set hive.default.fileformat=TextFile;
The default storage format is therefore textfile.
textfile: plain-text storage with no compression. Query efficiency is low.
1.sequencefile:
A binary key-value file format provided by Hadoop and supported by Hive, with built-in compression support.
Data cannot be loaded into sequencefile or rcfile tables with load data, because load data only moves the file into the table directory without converting its format. Use insert ... select instead.
sequencefile supports compression and splitting, is easy to use, and is relatively fast to write and query. It can be combined with the compression properties.
create table if not exists seq1 (
id int,
name string
)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as sequencefile
;
### Incorrect way to load data
load data local inpath '/home/user' into table seq1;
### Correct way to load data
insert into table seq1
select * from user1
;
2.rcfile:
rcfile mixes row and column storage: it splits the data into row groups and stores each group column by column, so the rows and columns of related data stay in the same block as far as possible. This layout improves query efficiency, but writes are slower. It does not combine very well with the Gzip compression codec (GzipCodec), which is enabled through these compression properties:
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
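The row-group idea can be pictured with a small Python sketch. This is only a simplified illustration of the layout, not the real rcfile on-disk format; the function name to_row_groups and the group size are hypothetical and chosen for the example.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) transposes the rows of this group into columns
        columns = [list(col) for col in zip(*group)]
        groups.append(columns)
    return groups


rows = [(1, "a"), (2, "b"), (3, "c")]
layout = to_row_groups(rows, group_size=2)
print(layout)  # [[[1, 2], ['a', 'b']], [[3], ['c']]]
```

A query that only needs the id column can read the first list of each group and skip the rest, which is the intuition behind the better query efficiency. Building the groups is extra work at write time.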
### Create the rcfile tables:
create table if not exists rc1 (
id int,
name string
)
row format delimited fields terminated by '\t'
stored as rcfile
;
create table if not exists rc2 (
id int,
name string
)
row format delimited fields terminated by '\t'
stored as rcfile
;
### Incorrect way to load data
load data local inpath '/home/user' into table rc1;
### Correct way to load data
insert into table rc2
select *
from user1
;
3. Custom storage format:
Data: the source file seqyd contains the following lines:
aGVsbG8saGl2ZQ==
aGVsbG8sd29ybGQ=
aGVsbG8saGFkb29w
The content of seqyd is base64 encoded. Decoded, the data is:
## hello,hive
## hello,world
## hello,hadoop
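The decoding can be checked outside Hive with a short Python sketch that uses only the standard base64 module. The three strings are the ones listed above; the helper name decode_lines is hypothetical.

```python
import base64

# The three base64-encoded lines from the seqyd file
encoded_lines = ["aGVsbG8saGl2ZQ==", "aGVsbG8sd29ybGQ=", "aGVsbG8saGFkb29w"]


def decode_lines(lines):
    """Decode each base64 line into a UTF-8 string."""
    return [base64.b64decode(line).decode("utf-8") for line in lines]


decoded = decode_lines(encoded_lines)
for text in decoded:
    print(text)
# hello,hive
# hello,world
# hello,hadoop
```

These are the values a query should return once the file is read through the base64 input format.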
create table cus (str string)
stored as
inputformat 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'
outputformat 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextOutputFormat';
load data local inpath '/home/cus' into table cus;
In general, rcfile combined with DefaultCodec gives the best efficiency.

Origin blog.51cto.com/14256902/2424908