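Installing Thrift 0.12.0 and Building parquet-mr

The parquet-mr version at my new company is considerably newer than the old CDH version I used before. To make the Parquet code easier to read, I need to build parquet-mr locally.

Installing Thrift 0.12.0

Thrift version pitfalls

With the old CDH code, Thrift 0.9 worked for everything and a single brew install took care of it. The new company has upgraded Thrift, though, so I wanted to install the matching version locally.

The required Thrift version is 0.12. Version 0.9 is too old, and in 0.14.0 some classes have changed, so neither of them can build the project.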

[1] $ brew search thrift 
==> Formulae
thrift ✔ 
thrift@0.9

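Homebrew has no formula for 0.12, so what follows is the long process of installing Thrift 0.12.0 from source.

Preparing the build environment

Upgrading GCC

The gcc that ships with the machine is version 4.2.1, which is too old for building the source, so I upgraded gcc along the way.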

brew install gcc@8
alias gcc='gcc-8'
alias cc='gcc-8'
alias g++='g++-8'
alias c++='c++-8'
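
One caveat with the aliases above: shell aliases only apply to commands typed in an interactive shell, so ./configure and make will not see them. A minimal sketch of the more conventional route, assuming Homebrew's gcc@8 has put gcc-8 and g++-8 on the PATH, is to export the standard CC and CXX variables that autoconf-generated configure scripts read:

```shell
# Assumption: gcc-8 and g++-8 are on PATH (installed by `brew install gcc@8`).
export CC=gcc-8
export CXX=g++-8

# Show what ./configure will pick up from the environment.
echo "CC=$CC CXX=$CXX"
```

Run this in the same shell session before calling ./configure so the variables are inherited.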

Install Boost

Boost is a C++ library and is best installed manually. I found the project's website hard to navigate, so here is the link directly:

https://www.boost.org/doc/libs/1_75_0/more/getting_started/unix-variants.html#easy-build-and-install

After downloading and unpacking the source, run the following from the Boost source directory:

./bootstrap.sh
sudo ./b2 threading=multi address-model=64 variant=release stage install

Install libevent

Install libevent with: brew install libevent

Building and installing Thrift 0.12.0

With the environment above in place, try installing Thrift again.

wget https://mirrors.tuna.tsinghua.edu.cn/apache/thrift/0.12.0/thrift-0.12.0.tar.gz
tar -xzf thrift-0.12.0.tar.gz
cd thrift-0.12.0
./configure --without-ruby PY_PREFIX=/Users/wakun/opt/anaconda3/lib/python3.8
make
make install

Troubleshooting

error: no member named ‘stdcxx’ in namespace ‘apache::thrift’

src/thrift/async/TAsyncProtocolProcessor.cpp:29:55: error: no member named 'stdcxx' in namespace 'apache::thrift'
void TAsyncProtocolProcessor::process(apache::thrift::stdcxx::function<void(bool healthy)> _return,
                                      ~~~~~~~~~~~~~~~~^
src/thrift/async/TAsyncProtocolProcessor.cpp:29:31: error: variable has incomplete type 'void'
void TAsyncProtocolProcessor::process(apache::thrift::stdcxx::function<void(bool healthy)> _return,
                              ^
src/thrift/async/TAsyncProtocolProcessor.cpp:30:39: error: use of undeclared identifier 'stdcxx'
                                      stdcxx::shared_ptr<TBufferBase> ibuf,
                                      ^
src/thrift/async/TAsyncProtocolProcessor.cpp:31:39: error: use of undeclared identifier 'stdcxx'
                                      stdcxx::shared_ptr<TBufferBase> obuf) {
                                      ^
src/thrift/async/TAsyncProtocolProcessor.cpp:31:76: error: expected ';' after top level declarator
                                      stdcxx::shared_ptr<TBufferBase> obuf) {
                                                                           ^
                                                                           ;
5 errors generated.

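This error was baffling. The original message says that stdcxx cannot be found in namespace 'apache::thrift'. At first I assumed stdcxx was a C++ standard library and that some library had not been installed. That turned out to be wrong: this stdcxx is not the open-source stdcxx standard library but a header Thrift wrote itself, located at ./lib/cpp/src/thrift/stdcxx.h. I then downloaded the latest thrift-0.14.1 source and found that this header no longer exists there. So the 0.12.0 build must have been picking up the headers of the Homebrew-installed 0.14.1. After a blunt brew uninstall thrift, the build went back to normal.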
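
The libtool command line in the OpenSSL log further down contains -I/usr/local/Cellar/thrift/0.14.1/include, which is consistent with this diagnosis. A small sketch for spotting such leftovers before building follows; the helper name is hypothetical and the default path assumes the /usr/local Homebrew layout seen in that log:

```shell
# Hypothetical helper: list include directories of Homebrew-installed Thrift
# versions that could shadow the headers of the source tree being built.
# $1 is the Cellar directory; the default assumes a /usr/local Homebrew layout.
find_stale_thrift_includes() {
  cellar="${1:-/usr/local/Cellar/thrift}"
  for dir in "$cellar"/*/include; do
    if [ -d "$dir" ]; then
      echo "$dir"
    fi
  done
}

find_stale_thrift_includes
```

If this prints anything, removing that Thrift installation before running make avoids mixing headers from two versions.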

fatal error: ‘openssl/opensslv.h’ file not found

libtool: compile:  g++ -std=c++11 -DHAVE_CONFIG_H -I. -I../.. -I../../lib/cpp/src/thrift -I../../lib/c_glib/src/thrift -I/usr/local/include -I./src -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/usr/local/Cellar/thrift/0.14.1/include -Wall -Wextra -pedantic -g -O2 -MT src/thrift/transport/TSSLSocket.lo -MD -MP -MF src/thrift/transport/.deps/TSSLSocket.Tpo -c src/thrift/transport/TSSLSocket.cpp  -fno-common -DPIC -o src/thrift/transport/.libs/TSSLSocket.o
src/thrift/transport/TSSLSocket.cpp:43:10: fatal error: 'openssl/opensslv.h' file not found
#include <openssl/opensslv.h>
         ^~~~~~~~~~~~~~~~~~~~
1 error generated.
make[4]: *** [src/thrift/transport/TSSLSocket.lo] Error 1

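On macOS, the Homebrew-installed openssl is not on the default search paths. CPPFLAGS has to point at its include directory before the headers can be found, as Homebrew's own install notes say: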

For compilers to find openssl@1.1 you may need to set:
  export LDFLAGS="-L/usr/local/opt/openssl@1.1/lib"
  export CPPFLAGS="-I/usr/local/opt/openssl@1.1/include"
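
Rather than hard-coding the path, the prefix can be asked from Homebrew. This is only a sketch: the helper name is hypothetical, and it assumes Homebrew is installed and knows an openssl formula. It is skipped silently when brew is not available:

```shell
# Hypothetical helper: prepend OpenSSL's lib/include directories under prefix $1
# to LDFLAGS and CPPFLAGS so that ./configure and make can find them.
use_openssl_prefix() {
  export LDFLAGS="-L$1/lib ${LDFLAGS:-}"
  export CPPFLAGS="-I$1/include ${CPPFLAGS:-}"
}

# Assumption: Homebrew is installed and has an `openssl` formula (or alias).
if command -v brew >/dev/null 2>&1; then
  use_openssl_prefix "$(brew --prefix openssl)"
fi

echo "LDFLAGS=${LDFLAGS:-} CPPFLAGS=${CPPFLAGS:-}"
```

The variables must be set before ./configure is run again, since configure records them in the generated Makefiles.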

composer: No such file or directory

Making all in php
Making all in test
composer install --working-dir=../../..
make[4]: composer: No such file or directory
make[4]: *** [deps] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Thrift's build uses Composer, the PHP dependency tool, so run brew install composer.

Composer: Composer is a tool for dependency management in PHP. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.

go.mod file not found in current directory or any parent

make[4]: Nothing to be done for `all-am'.
Making all in go
Making all in .
GOPATH=`pwd` /usr/local/bin/go build ./thrift
go: go.mod file not found in current directory or any parent directory; see 'go help modules'
make[4]: *** [all-local] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

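I seem to have fixed this by running go mod init once; I am not sure of the exact cause. A likely explanation is that Go 1.16 and later default to module mode, while this Makefile does a GOPATH-style build without a go.mod file.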

GOPATH=`pwd` /usr/local/bin/go build ./thrift
go: inconsistent vendoring in /Users/wakun/Applications/thrift-0.12.0:
	github.com/golang/[email protected]: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt

	To ignore the vendor directory, use -mod=readonly or -mod=mod.
	To sync the vendor directory, run:
		go mod vendor
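
If the PHP and Go libraries are not needed, the two failures above can probably be sidestepped by not building them at all. The sketch below assumes Thrift's configure accepts --without-php and --without-go in the same way as the --without-ruby flag used earlier. The list of languages is an illustrative choice, and the command is only printed so it can be reviewed first:

```shell
# Languages whose libraries we do not want to build (illustrative choice).
skip_langs="ruby php go"

configure_args=""
for lang in $skip_langs; do
  configure_args="$configure_args --without-$lang"
done
configure_args="${configure_args# }"   # drop the leading space

# Print the command instead of running it.
echo "./configure $configure_args"
```

Running ./configure --help in the Thrift source tree lists the language switches that this particular version really supports.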

error: could not create ‘/usr/lib/python3.8’: Operation not permitted

/Library/Developer/CommandLineTools/usr/bin/make  install-exec-hook
/Users/wakun/opt/anaconda3/bin/python setup.py install --root=/ --prefix=/usr
running install
running build
running build_py
running build_ext
running install_lib
creating /usr/lib/python3.8
error: could not create '/usr/lib/python3.8': Operation not permitted
make[4]: *** [install-exec-hook] Error 1
make[3]: *** [install-exec-am] Error 2
make[2]: *** [install-am] Error 2
make[1]: *** [install-recursive] Error 1
make: *** [install-recursive] Error 1

By default the built Python package is installed under /usr. We need to install it into a directory of our own choosing instead:

./configure --without-ruby PY_PREFIX=/Users/wakun/opt/anaconda3/lib/python3.8

https://cwiki.apache.org/confluence/display/THRIFT/ThriftInstallation

Please be aware that the Python library will ignore the --prefix option and just install wherever Python’s distutils puts it (usually along the lines of /usr/lib/pythonX.Y/site-packages/). If you need to control where the Python modules are installed, set the PY_PREFIX variable. (DESTDIR is respected for Python and C++.)

‘/usr/lib/php/TMultiplexedProcessor.php’: Operation not permitted

 /usr/local/bin/ginstall -c -m 644 lib/TMultiplexedProcessor.php '/usr/lib/php/'
ginstall: cannot create regular file '/usr/lib/php/TMultiplexedProcessor.php': Operation not permitted
make[4]: *** [install-phpDATA] Error 1
make[3]: *** [install-am] Error 2
make[2]: *** [install-recursive] Error 1
make[1]: *** [install-recursive] Error 1
make: *** [install-recursive] Error 1

This is the same kind of error as the previous one. Setting the PHP_PREFIX variable fixes it.
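
Both install locations can be redirected in one configure call. The sketch below is an assumption-laden example rather than the exact command used here: the install root is an illustrative directory, and judging by the setup.py install --prefix=/usr and '/usr/lib/php/' lines in the logs above, PY_PREFIX appears to act as the Python install prefix and PHP_PREFIX as the PHP library directory:

```shell
# Assumption: a user-writable install root; the directory name is illustrative.
install_root="$HOME/opt/thrift-0.12.0"

# PY_PREFIX and PHP_PREFIX redirect the Python and PHP library installs,
# which otherwise go under /usr and fail without write permission.
configure_cmd="./configure --without-ruby --prefix=$install_root PY_PREFIX=$install_root PHP_PREFIX=$install_root/lib/php"

# Print the command instead of running it.
echo "$configure_cmd"
```

After reconfiguring this way, make install should no longer need to write under /usr.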

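Building parquet-mr

Build command: mvn clean install -DskipTests=true

Importing the project into IDEA

parquet-mr is a rather unusual project: part of its Java code is generated by other code, so when the project is imported straight into IDEA, those classes cannot be resolved. The following generated directories need to be added to the classpath: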

$ find . -name generated-src                      
./parquet-common/target/generated-src
./parquet-encoding/target/generated-src
./parquet-column/target/generated-src

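Changing the scope of POM dependencies

In addition, in the current 1.11.1 code the parquet-hive-storage-handler module fails to compile. The code under its main directory references classes from other modules, yet those dependencies are declared with scope test, which is also rather unconventional. Commenting out the scope fixes it: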

<!-- we have to include to the version specific binding for tests
         since the packaging phase occurs after the test phase  -->
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hive-binding-factory</artifactId>
      <version>${project.version}</version>
<!--      <scope>test</scope>-->
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hive-binding-interface</artifactId>
      <version>${project.version}</version>
<!--      <scope>test</scope>-->
    </dependency>

Dev and debug parquet-tools

Enable Local Profile

Debugging parquet-tools locally fails by default with:

java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

The project defines a hadoop.scope property that defaults to provided. Enabling the project's local profile sets hadoop.scope = compile, so the Hadoop jars are loaded.
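
Outside the IDE, the same profile can be activated on the Maven command line with -P. A minimal sketch, assuming the profile id is local as described above; the helper name and the checkout path in the example call are hypothetical:

```shell
# Hypothetical helper: build parquet-tools with the `local` profile active.
# $1 is the path to the parquet-mr checkout.
build_parquet_tools_local() {
  # -Plocal activates the profile that switches hadoop.scope to compile.
  (cd "$1/parquet-tools" && mvn clean package -Plocal -DskipTests)
}

# Example call (commented out so nothing is built by accident):
# build_parquet_tools_local "$HOME/src/parquet-mr"
```

In IDEA the equivalent is ticking the local profile in the Maven tool window and reimporting the project.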

Fast debug parquet-tools

To make debugging parquet-tools easier, change the start of main in parquet-tools/src/main/java/org/apache/parquet/tools/Main.java as follows, so the arguments are hard-coded:

public static void main(String[] args2) {
    String[] args = {"meta", "/Users/wakun/Downloads/part-00000-93fb1ee1-5d40-4881-ba50-be3d0db8431a-c000.snappy.parquet"};
    Main.out = System.out;

Reposted from blog.csdn.net/wankunde/article/details/119978555