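First, a complaint: Taobao open-sourced its data extraction tool DataX, but the technical support since the release has been dismal. The documentation is nowhere near what you would expect from an industry heavyweight; frankly, it reads like amateur work. In my view, if you back open source you should at least respect it. Throwing together a document just to get by causes users real problems and costs them a lot of time.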
Building the DataX source package with rpmbuild on RHEL 6.2 fails with the following errors:
    [root@Hadoop rpm]# rpmbuild -ba t_dp_datax_engine.spec
    ...
    Processing files: t_dp_datax_engine-1.0.0-1.noarch
    error: File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/bin
    error: File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/conf
    error: File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/engine
    error: File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/common
    error: File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/libs
    error: File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/logs
    error: File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/jobs

    RPM build errors:
        File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/bin
        File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/conf
        File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/engine
        File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/common
        File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/libs
        File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/logs
        File not found: /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64/home/taobao/datax/jobs
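I know the cause: rpmbuild on RHEL 6.2 uses a different buildroot directory than on RHEL 5. As the errors show, it looks for the packaged files under /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64, while the spec writes them straight to /home/taobao/datax. The fix:
Edit t_dp_datax_engine.spec. The listing shows the original file; each // note marks a line I changed. The notes are my annotations and must not be left in the real spec file.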
    [root@Hadoop rpm]# cat t_dp_datax_engine.spec
    summary: engine provides core scheduler and data swap storage for DataX
    Name: t_dp_datax_engine
    Version: 1.0.0
    Release: 1
    Group: System
    License: GPL
    AutoReqProv: no
    BuildArch: noarch

    %define dataxpath /home/taobao/datax     // change to %{buildroot}/home/taobao/datax
    %define vdataxpath /home/taobao/datax    // add this line; vdataxpath is used below

    %description
    DataX Engine provides core scheduler and data swap storage for DataX

    %prep
    cd ${OLDPWD}/../
    export LANG=zh_CN.UTF-8
    ant dist

    %build

    %install
    dos2unix ${OLDPWD}/../release/datax.py
    mkdir -p %{dataxpath}/bin
    mkdir -p %{dataxpath}/conf
    mkdir -p %{dataxpath}/engine
    mkdir -p %{dataxpath}/common
    mkdir -p %{dataxpath}/libs
    mkdir -p %{dataxpath}/jobs
    mkdir -p %{dataxpath}/logs
    cp ${OLDPWD}/../jobs/sample/*.xml %{dataxpath}/jobs
    cp ${OLDPWD}/../release/*.py %{dataxpath}/bin/
    cp -r ${OLDPWD}/../conf/*.properties %{dataxpath}/conf
    cp -r ${OLDPWD}/../conf/*.xml %{dataxpath}/conf
    cp -r ${OLDPWD}/../build/engine/*.jar %{dataxpath}/engine
    cp -r ${OLDPWD}/../build/common/*.jar %{dataxpath}/common
    cp ${OLDPWD}/../c++/build/libcommon.so %{dataxpath}/common
    cp -r ${OLDPWD}/../libs/commons-io-2.0.1.jar %{dataxpath}/libs
    cp -r ${OLDPWD}/../libs/commons-lang-2.4.jar %{dataxpath}/libs
    cp -r ${OLDPWD}/../libs/dom4j-2.0.0-ALPHA-2.jar %{dataxpath}/libs
    cp -r ${OLDPWD}/../libs/jaxen-1.1-beta-6.jar %{dataxpath}/libs
    cp -r ${OLDPWD}/../libs/junit-4.4.jar %{dataxpath}/libs
    cp -r ${OLDPWD}/../libs/log4j-1.2.16.jar %{dataxpath}/libs
    cp -r ${OLDPWD}/../libs/slf4j-api-1.4.3.jar %{dataxpath}/libs
    cp -r ${OLDPWD}/../libs/slf4j-log4j12-1.4.3.jar %{dataxpath}/libs

    %post
    chmod -R 0777 %{dataxpath}/jobs    // change to: chmod -R 0777 %{vdataxpath}/jobs
    chmod -R 0777 %{dataxpath}/logs    // change to: chmod -R 0777 %{vdataxpath}/logs

    %files
    %defattr(0755,root,root)
    %{dataxpath}/bin       // change to %{vdataxpath}/bin
    %{dataxpath}/conf      // change to %{vdataxpath}/conf
    %{dataxpath}/engine    // change to %{vdataxpath}/engine
    %{dataxpath}/common    // change to %{vdataxpath}/common
    %{dataxpath}/libs      // change to %{vdataxpath}/libs
    %attr(0777,root,root) %dir %{dataxpath}/logs    // change to: %attr(0777,root,root) %{vdataxpath}/logs
    %attr(0777,root,root) %dir %{dataxpath}/jobs    // change to: %attr(0777,root,root) %{vdataxpath}/jobs

    %changelog
    * Fri Aug 20 2010 meining
    - Version 1.0.0
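To make the two-path idea concrete, here is a minimal shell sketch of what %install and %files each expect. It is not part of the DataX build: the temp directory merely stands in for rpmbuild's %{buildroot}, and the helper names staged_path and stage_dirs are made up for illustration.

```shell
#!/bin/sh
# Sketch only: BUILDROOT is a throwaway temp directory standing in for
# rpmbuild's %{buildroot}; INSTALL_PATH mirrors the spec's %{vdataxpath}.
INSTALL_PATH=/home/taobao/datax

# staged_path BUILDROOT PATH
# Prints where %install has to place a file that will be installed at PATH.
staged_path() {
    printf '%s%s\n' "$1" "$2"
}

# stage_dirs BUILDROOT
# Creates the directories listed in %files, but underneath the build root
# rather than on the live file system.
stage_dirs() {
    for d in bin conf engine common libs logs jobs; do
        mkdir -p "$(staged_path "$1" "$INSTALL_PATH/$d")"
    done
}

BUILDROOT=$(mktemp -d)
stage_dirs "$BUILDROOT"
# %files names /home/taobao/datax/bin; rpmbuild looks for it at this path:
ls -d "$(staged_path "$BUILDROOT" "$INSTALL_PATH/bin")"
rm -rf "$BUILDROOT"
```

This is why the spec ends up with two macros: dataxpath (with the buildroot prefix) for everything %install writes, and vdataxpath (without it) for %files and %post.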
Build result:
    [root@Hadoop rpm]# rpmbuild -ba t_dp_datax_engine.spec
    ...
    Processing files: t_dp_datax_engine-1.0.0-1.noarch
    Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64
    Wrote: /root/rpmbuild/SRPMS/t_dp_datax_engine-1.0.0-1.src.rpm
    Wrote: /root/rpmbuild/RPMS/noarch/t_dp_datax_engine-1.0.0-1.noarch.rpm
    Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.y3UwSl
    + umask 022
    + cd /root/rpmbuild/BUILD
    + /bin/rm -rf /root/rpmbuild/BUILDROOT/t_dp_datax_engine-1.0.0-1.x86_64
    + exit 0
For the meaning of the directives used in an rpmbuild SPEC file, see:
http://fedoraproject.org/wiki/How_to_create_an_RPM_package#.25install_section
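That is the problem I solved. There is another one I have not been able to solve:
building an oraclewriter that matches my environment.
Here is what the DataX documentation says: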
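Because data is ultimately loaded into Oracle through OCI, called via JNI, DataX ships by default with a liboraclewriter.so built for Intel x86_64. If your hardware platform does not match the default, you need to compile the oraclewriter C++ code. liboraclewriter.so depends on the libiconv library, so first check that it is present. The build steps are:
1) Change to the c++/src/oracledumper/src/ directory in the DataX source tree.
2) Run make to compile.
3) Copy the resulting liboraclewriter.so from the current directory to /home/taobao/datax/plugins/, overwriting the default liboraclewriter.so.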
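I did this build on RHEL 5.4 and it went fine, but a few things need attention:
* The build has to run on the machine where the Oracle software is installed, with ORACLE_HOME set in the environment.
* LD_LIBRARY_PATH must be set.
* The libiconv library must be installed.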
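The three notes above can be turned into a quick pre-flight check. This is only a sketch: check_build_env and the ICONV_LIB_DIRS variable are hypothetical names, and I am assuming that LD_LIBRARY_PATH should contain $ORACLE_HOME/lib and that libiconv shows up as libiconv.so* in one of the usual library directories.

```shell
#!/bin/sh
# check_build_env (hypothetical helper)
# Prints one line per problem found and returns non-zero if there is any.
check_build_env() {
    problems=0

    if [ -z "${ORACLE_HOME:-}" ]; then
        echo "ORACLE_HOME is not set"
        problems=1
    else
        # Assumption: the OCI client library lives in $ORACLE_HOME/lib.
        case ":${LD_LIBRARY_PATH:-}:" in
            *":$ORACLE_HOME/lib:"*) ;;
            *) echo "LD_LIBRARY_PATH does not contain $ORACLE_HOME/lib"
               problems=1 ;;
        esac
    fi

    # Assumption: libiconv is installed as a shared library in one of
    # these directories; override the list with ICONV_LIB_DIRS if not.
    found=0
    for dir in ${ICONV_LIB_DIRS:-/usr/lib /usr/lib64 /usr/local/lib /usr/local/lib64}; do
        for f in "$dir"/libiconv.so*; do
            if [ -e "$f" ]; then
                found=1
            fi
        done
    done
    if [ "$found" -eq 0 ]; then
        echo "libiconv.so not found in the searched directories"
        problems=1
    fi

    return "$problems"
}

check_build_env || echo "Fix the items above before running make."
```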
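On my RHEL 6.2 machine, however, the build fails no matter what I try.
I edited trunk/c++/src/oracledumper/src/Makefile, because the compiler could not find jni.h.
The change is as follows (the // note is my annotation, not part of the Makefile):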
    [root@Hadoop src]# cat Makefile
    INCLUDE=-I../include    // append: -I${ORACLE_HOME}/jdk/include -I${ORACLE_HOME}/jdk/include/linux
    LIBS=-lclntsh -liconv -L../lib -L${ORACLE_HOME}/lib -L../../../../libs/
    CC=g++
    OBJS=liboraclewriter.so
    CFLAGS=-shared -fPIC -Wl,-rpath=/home/taobao/datax/libs
    CPP=common.cpp dumper.cpp oradumper.cpp strsplit.cpp com_taobao_datax_plugins_writer_oraclewriter_OracleWriterJni.cpp

    OBJS: $(CPP)
        $(CC) $(INCLUDE) -o $(OBJS) $(CPP) $(CFLAGS) $(LIBS)

    clean:
        rm -rf $(OBJS)
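Before hard-coding the two -I flags it may be worth confirming that the headers really are where the Makefile will look. A small sketch, assuming the JDK bundled under $ORACLE_HOME/jdk uses the usual Linux layout (jni.h in include/, jni_md.h in include/linux/); jni_include_flags is a made-up helper name.

```shell
#!/bin/sh
# jni_include_flags JDK_DIR (hypothetical helper)
# Prints the -I flags for the JNI headers if both files exist under JDK_DIR,
# otherwise reports the problem on stderr and returns 1.
jni_include_flags() {
    jdk=$1
    if [ -f "$jdk/include/jni.h" ] && [ -f "$jdk/include/linux/jni_md.h" ]; then
        printf '%s\n' "-I$jdk/include -I$jdk/include/linux"
    else
        echo "jni.h or jni_md.h not found under $jdk/include" >&2
        return 1
    fi
}

# Assumption: the JDK that ships with the Oracle installation is used.
jni_include_flags "${ORACLE_HOME:-/nonexistent}/jdk" || true
```

If the helper prints nothing, any other JDK that contains both headers can be passed instead of $ORACLE_HOME/jdk.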
Yet in the end I am still stuck here:
    [root@Hadoop src]# make
    g++ -I../include -I/home/oracle/database/product/10.2.0/db_1/jdk/include -I/home/oracle/database/product/10.2.0/db_1/jdk/include/linux -o liboraclewriter.so common.cpp dumper.cpp oradumper.cpp strsplit.cpp com_taobao_datax_plugins_writer_oraclewriter_OracleWriterJni.cpp -shared -fPIC -Wl,-rpath=/home/taobao/datax/libs -lclntsh -liconv -L../lib -L/home/oracle/database/product/10.2.0/db_1/lib -L../../../../libs/
    oradumper.cpp: In member function 'virtual void OraDumper::RunDump(const char*)':
    oradumper.cpp:305: error: invalid conversion from 'const char*' to 'char*'
    oradumper.cpp:312: warning: deprecated conversion from string constant to 'char*'
    oradumper.cpp:314: warning: deprecated conversion from string constant to 'char*'
    make: *** [OBJS] Error 1
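I am out of my depth here; C++ is not my strong suit.
For now I can only hope that the liboraclewriter.so built on RHEL 5.4 also works on RHEL 6.2. If not, I will have to contact the Taobao developers.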
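One cheap way to test that hope is to run ldd on the copied library on the RHEL 6.2 host and look for unresolved dependencies. The sketch below filters ldd output for "not found" entries; missing_libs is a hypothetical helper, and the .so path is the plugin location given in the DataX documentation. A clean result only shows that the dynamic linker can resolve the dependencies, not that the writer actually works.

```shell
#!/bin/sh
# missing_libs (hypothetical helper)
# Reads `ldd` output on stdin and prints the name of every shared library
# that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Path taken from the DataX documentation; adjust if yours differs.
so=/home/taobao/datax/plugins/liboraclewriter.so
if [ -f "$so" ]; then
    ldd "$so" | missing_libs
fi
```

An empty result means every dependency resolved. If libclntsh or libiconv is listed, LD_LIBRARY_PATH on the RHEL 6.2 host is the first thing I would check.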
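My conclusion: DataX's support for RHEL 6 is still lacking. I hope that once I get in touch with the developers this problem can be resolved and this very good piece of open source software improved.
Finally, my respect to Taobao all the same; I hope they act like a leader in every respect.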