Retrieved from: http://www.geedoo.info/installed-on-the-cloudera-hadoop-cdh-r-and-rhadoop-rhdfs-rmr2-rhbase-rhive.html
Preface: RHadoop is an open source project initiated by Revolution Analytics that combines the statistical language R with Hadoop. At present, the project includes three R packages: rmr, which supports writing MapReduce applications in R; rhdfs, which gives R access to HDFS; and rhbase, which gives R access to HBase.
1. System and required software version
Server OS: CentOS 6.3
R language version: R-2.15.3 (I first tried the latest R-3 release and ran into various incompatibility problems, so I chose the latest R-2 release instead)
Download address: http://ftp.ctex.org/mirrors/CRAN/src/base/R-2/R-2.15.3.tar.gz
Cloudera Hadoop CDH version: 4.4.0
JDK version: 1.6.0_31
CDH and the JDK were installed with cloudera-manager-installer.bin, the installer for the free edition of Cloudera Manager. For details, see CDH installation.
Download address: https://ccp.cloudera.com/display/SUPPORT/Cloudera+Manager+Free+Edition+Download
rJava (the R/Java bridge: R can call Java and, through the bundled JRI, Java can call R; it can be installed from CRAN) version: rJava_0.9-5
Download address: http://www.rforge.net/src/contrib/rJava_0.9-5.tar.gz
RHadoop version: the latest official release. Project address: https://github.com/RevolutionAnalytics
Download address: https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads
Documentation: https://github.com/RevolutionAnalytics/RHadoop/wiki
2. Dependency installation (R language package, rJava package)
Before installing RHadoop, you need to install R and then the rJava package on every host of the cluster. The specific installation steps are as follows:
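Because every step has to be repeated on each host, a small loop can save typing. The following is only a sketch: the host names node1, node2, node3 and the helper name on_each_host are hypothetical, and it assumes passwordless root ssh to every host. By default it only prints the ssh commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical host list; replace with the real cluster host names.
HOSTS="node1 node2 node3"

# Run a command on every host over ssh.
# By default (DRY_RUN unset or 1) the ssh invocations are only printed.
# Set DRY_RUN=0 to actually execute them.
on_each_host() {
    for host in $HOSTS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh root@$host $1"
        else
            ssh "root@$host" "$1"
        fi
    done
}

# Dry run: show what would be executed on each host.
on_each_host "R --version"
```

Once the printed commands look right, rerun with DRY_RUN=0 to execute them.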
1. Install the R language package
Before compiling R, you need to install the following programs through yum:
# yum install gcc-gfortran
Otherwise configure reports the error "configure: error: No F77 compiler found"
# yum install gcc gcc-c++
Otherwise configure reports the error "configure: error: C++ preprocessor "/lib/cpp" fails sanity check"
# yum install readline-devel
Otherwise configure reports the error "--with-readline=yes (default) and headers/libs are not available"
# yum install libXt-devel
Otherwise configure reports the error "configure: error: --with-x=yes (default) and X11 headers/libs are not available"
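The error-to-package pairs above can be kept as a small lookup for when configure fails on a new host. This is a minimal sketch that uses only the four pairs listed above; suggest_pkg is a hypothetical helper name, not part of R or yum.

```shell
#!/bin/sh
# Map a configure error message to the yum package(s) that fix it.
# Only the error/package pairs from the steps above are covered.
suggest_pkg() {
    case "$1" in
        *"No F77 compiler found"*) echo "gcc-gfortran" ;;
        *"fails sanity check"*)    echo "gcc gcc-c++" ;;
        *"with-readline=yes"*)     echo "readline-devel" ;;
        *"with-x=yes"*)            echo "libXt-devel" ;;
        *)                         echo "unknown" ;;
    esac
}

# Example: prints gcc-gfortran
suggest_pkg "configure: error: No F77 compiler found"
```

To avoid the errors altogether, all of the packages can also be installed in one command before running configure:
# yum install -y gcc-gfortran gcc gcc-c++ readline-devel libXt-devel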
Then download the source code and compile it:
# wget http://cran.rstudio.com/src/base/R-2/R-2.15.3.tar.gz
# tar -zxvf R-2.15.3.tar.gz
# cd R-2.15.3
# ./configure --prefix=/usr --disable-nls --enable-R-shlib
(The last two options, --disable-nls and --enable-R-shlib, are needed for RHive; if you do not install RHive, you can omit them.)
# make
# make install
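After make install, a quick check can confirm the result on each host. This is a minimal sketch, assuming R ended up on PATH: `R RHOME` prints R's home directory, and has_shlib is a hypothetical helper that looks for libR.so there, which is the shared library that --enable-R-shlib builds.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the R shared library exists.
# $1: R home directory, e.g. the output of `R RHOME`.
has_shlib() {
    [ -f "$1/lib/libR.so" ]
}

if command -v R >/dev/null 2>&1; then
    # First line of the version banner
    R --version | head -n 1
    if has_shlib "$(R RHOME)"; then
        echo "libR.so found"
    else
        echo "libR.so missing (was --enable-R-shlib passed to configure?)"
    fi
else
    echo "R not found on PATH"
fi
```

If libR.so is reported missing and RHive is going to be installed, rerun configure with --enable-R-shlib and rebuild.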