Trafodion: Basic Usage of parquet_tools

Copyright notice: This is an original article by the author. If you repost it, please credit the source. https://blog.csdn.net/Post_Yuan/article/details/82495260

Trafodion ships a parquet_tools executable that can be used to check whether a Parquet file is intact.
parquet_tools is located in the directory $TRAF_HOME/sql/scripts:

cd $TRAF_HOME/sql/scripts/
ll parquet_tools 

The parquet_tools executable depends on parquet-tools-${PARQUET_VERSION}.jar, as you can see from the contents of the parquet_tools script itself:

${lv_cmd} jar ${TRAF_HOME}/export/lib/parquet-tools-${PARQUET_VERSION}.jar $*
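
Because the wrapper only works when that jar is present, it can be useful to confirm it exists before running anything. The following is a minimal sketch, not part of Trafodion: the function name find_parquet_tools_jar is hypothetical, and the only thing taken from the script line above is the path pattern export/lib/parquet-tools-<version>.jar under $TRAF_HOME.

```shell
# Hypothetical helper (not shipped with Trafodion): list the parquet-tools
# jars under a Trafodion install, using the path pattern from the wrapper.
find_parquet_tools_jar() {
  local traf_home="${1:-$TRAF_HOME}"
  local found=1
  local jar
  for jar in "$traf_home"/export/lib/parquet-tools-*.jar; do
    # If nothing matches, the glob is left as a literal string,
    # so check that the path is a real file before reporting it.
    if [ -f "$jar" ]; then
      echo "$jar"
      found=0
    fi
  done
  # Returns 0 if at least one jar was found, 1 otherwise.
  return "$found"
}
```

For example, `find_parquet_tools_jar "$TRAF_HOME" || echo "parquet-tools jar not found"` prints the jar path when it exists and the warning when it does not.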

The usage of parquet_tools is as follows:

parquet_tools -h
usage: parquet-tools cat [option...] <input>
where option is one of:
       --debug     Enable debug output
    -h,--help      Show this help string
    -j,--json      Show records in JSON format.
       --no-color  Disable color output even if supported
where <input> is the parquet file to print to stdout

usage: parquet-tools head [option...] <input>
where option is one of:
       --debug          Enable debug output
    -h,--help           Show this help string
    -n,--records <arg>  The number of records to show (default: 5)
       --no-color       Disable color output even if supported
where <input> is the parquet file to print to stdout

usage: parquet-tools schema [option...] <input>
where option is one of:
    -d,--detailed  Show detailed information about the schema.
       --debug     Enable debug output
    -h,--help      Show this help string
       --no-color  Disable color output even if supported
where <input> is the parquet file containing the schema to show

usage: parquet-tools meta [option...] <input>
where option is one of:
       --debug     Enable debug output
    -h,--help      Show this help string
       --no-color  Disable color output even if supported
where <input> is the parquet file to print to stdout

usage: parquet-tools dump [option...] <input>
where option is one of:
    -c,--column <arg>  Dump only the given column, can be specified more than
                       once
    -d,--disable-data  Do not dump column data
       --debug         Enable debug output
    -h,--help          Show this help string
    -m,--disable-meta  Do not dump row group and page metadata
    -n,--disable-crop  Do not crop the output based on console width
       --no-color      Disable color output even if supported
where <input> is the parquet file to print to stdout

usage: parquet-tools merge [option...] <input> [<input> ...] <output>
where option is one of:
       --debug     Enable debug output
    -h,--help      Show this help string
       --no-color  Disable color output even if supported
where <input> is the source parquet files/directory to be merged
   <output> is the destination parquet file

Here are some basic examples:

# Dump the row group and page metadata of column DEVICE_NUMBER only (-d suppresses the column data)
parquet_tools dump -c DEVICE_NUMBER -d /opt/trafodion/bss_userinfo_20180812_0
# Dump the metadata of all columns in the Parquet file (-d suppresses the column data)
parquet_tools dump -d /opt/trafodion/bss_userinfo_20180812_0
# Show the first 10 records of the Parquet file
parquet_tools head -n 10 /opt/trafodion/bss_userinfo_20180812_0
# Show the meta information of the Parquet file
parquet_tools meta /opt/trafodion/bss_userinfo_20180812_0
# Show the schema of the Parquet file
parquet_tools schema /opt/trafodion/bss_userinfo_20180812_0
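
When several files need checking, the meta subcommand can be wrapped in a loop. This is only a sketch under stated assumptions: it assumes parquet_tools exits with a non-zero status when it cannot read a file, which is worth verifying on your installation. The function name check_parquet_files and the PARQUET_TOOLS override variable are hypothetical and not part of Trafodion.

```shell
# Hypothetical helper: run "parquet_tools meta" on each file and report
# OK or FAIL. Assumes the tool returns non-zero for an unreadable file.
check_parquet_files() {
  # PARQUET_TOOLS is a made-up override, e.g. to give a full path.
  local tool="${PARQUET_TOOLS:-parquet_tools}"
  local failed=0
  local f
  for f in "$@"; do
    if "$tool" meta "$f" > /dev/null 2>&1; then
      echo "OK   $f"
    else
      echo "FAIL $f"
      failed=1
    fi
  done
  # Returns 1 if any file failed the check, 0 otherwise.
  return "$failed"
}
```

It could be called as `check_parquet_files /opt/trafodion/bss_userinfo_20180812_0`, with more paths appended as needed.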
