PredictionIO 0.12.1 Installation Guide

Overview of the PredictionIO 0.12.1 installation
Reference: http://predictionio.apache.org/install/
Environment:
Operating system:

Ubuntu 14.04

Software versions:
The software versions used during this installation test:
Required:
Java: 64-bit "1.8.0_171"
Hadoop: 2.7.6
Scala: 2.12.6
Spark: 2.1.1 (the spark-2.1.1-bin-hadoop2.7 build, which supports Hadoop 2.7)
Storage backend, choose one of the three (this test uses option 3):
1: PostgreSQL 9.1
2: MySQL 5.1
3: Apache HBase 1.2.6
Elasticsearch 5.5.2
Supported version ranges:
Scala 2.10.x, 2.11.x
Spark 1.6.x, 2.0.x, 2.1.x
Hadoop 2.4.x to 2.7.x
Elasticsearch 1.7.x, 5.x (use 5.x with PredictionIO 0.11.0 and later)

Install Java
Reference: http://www.runoob.com/java/java-environment-setup.html

Installation steps:
1: Move the jdk-8-64.tar.gz package to the /usr/local/java directory
2: Extract jdk-8-64.tar.gz into the current directory;
Command: "tar zxvf jdk-8-64.tar.gz";

Installation path:
/usr/local/java/jdk1.8.0_171

Configuration:
Add the following to /root/.bashrc:

export JAVA_HOME=/usr/local/java/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$PATH

Verify the installation:
Run "java -version"; if it prints the version information, the installation succeeded;
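
A quick way to apply and verify the configuration (the exact build strings in the output may differ):

source /root/.bashrc
java -version
# Expected output looks roughly like:
#   java version "1.8.0_171"
#   Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
#   Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)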

Install Scala
Reference: http://www.runoob.com/scala/scala-install.html

Installation steps:
1: Move the scala-2.12.6.tgz package to the /usr/local/scala directory
2: Extract scala-2.12.6.tgz into the current directory (a sketch of both steps follows)
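
A minimal sketch of the two steps above, assuming scala-2.12.6.tgz sits in the current directory:

mkdir -p /usr/local/scala
mv scala-2.12.6.tgz /usr/local/scala/
cd /usr/local/scala
tar zxvf scala-2.12.6.tgz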

Installation path:
/usr/local/scala/scala-2.12.6

Configuration:
Add the following to /root/.bashrc:

export SCALA_PATH=/usr/local/scala/scala-2.12.6
export PATH=${JAVA_HOME}/bin:$SCALA_PATH/bin:$PATH

Verify the installation:
Run "scalac -version"; if it prints the version information, the installation succeeded;

Install Hadoop
References: https://blog.csdn.net/wee_mita/article/details/52750112
https://www.cnblogs.com/xzjf/p/7231519.html
http://hadoop.apache.org/releases.html

Description:
This test installs Hadoop in single-node mode; it can be extended into a distributed cluster;

Installation steps:
1: Move the hadoop-2.7.6.tar.gz package to the /usr/local/hadoop directory
2: Extract hadoop-2.7.6.tar.gz into the current directory;

Installation path:
/usr/local/hadoop/hadoop-2.7.6

Configuration:
1: Add the following to /root/.bashrc:

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.6
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=${JAVA_HOME}/bin:$SCALA_PATH/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH

2: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml:

<configuration>
<!-- added: start -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://{$HOST_NAME}:9000</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hdfs</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
    </property>
    <property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
    </property>
<!-- added: end -->
</configuration>

3: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml:

<configuration>
<!-- added: start -->
    <property>
        <name>dfs.replication</name>             
        <value>1</value>                    
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/namenode</value>
    </property>
    <property>
         <name>fs.checkpoint.dir</name>
         <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/secondarynamenode</value>
    </property>
    <property>
         <name>fs.checkpoint.edits.dir</name>
         <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/secondarynamenode</value>
    </property>
    <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/datanode</value>
    </property>
    <property>
         <name>dfs.namenode.http-address</name>
         <value>{$HOST_NAME}:50070</value>
    </property>
    <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>{$HOST_NAME}:50090</value>
    </property>
    <property>
          <name>dfs.webhdfs.enabled</name>
          <value>true</value>
    </property>
<!-- added: end -->
</configuration>
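
Hadoop creates most of these directories on first start, but creating them up front avoids ownership and permission surprises; a sketch covering the local paths referenced in the two files above:

# Local directories used by hadoop.tmp.dir, dfs.namenode.name.dir,
# dfs.datanode.data.dir, and the secondarynamenode checkpoint dirs.
mkdir -p /usr/local/hadoop/hadoop-2.7.6/hdfs/tmp
mkdir -p /usr/local/hadoop/hadoop-2.7.6/hdfs/namenode
mkdir -p /usr/local/hadoop/hadoop-2.7.6/hdfs/datanode
mkdir -p /usr/local/hadoop/hadoop-2.7.6/hdfs/secondarynamenode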

4: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/mapred-site.xml (copy it from mapred-site.xml.template if it does not exist):

<configuration>
<!-- added: start -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>    
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>{$HOST_NAME}:10020</value>  
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>{$HOST_NAME}:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/mapreduce</value>                  
    </property>
<!-- added: end -->
</configuration>

5: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/yarn-site.xml:

<configuration>
<!-- added: start -->
    <property>
        <name>yarn.web-proxy.address</name>
        <value>yarn_proxy:YARN_PROXY_PORT</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>{$HOST_NAME}:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>{$HOST_NAME}:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>{$HOST_NAME}:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>{$HOST_NAME}:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>{$HOST_NAME}:8080</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
<!-- added: end -->
</configuration>

6: Open the /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/slaves file and add the slave hostnames, one per line:

{$HOST_NAME}

7: Open the ${HADOOP_HOME}/etc/hadoop/masters file and add the secondarynamenode hostname, one per line (a sketch of steps 6 and 7 follows);
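
A sketch of steps 6 and 7 for this single-node setup, where the local machine plays both roles:

# Use this host as the only slave and as the secondarynamenode.
hostname > /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/slaves
hostname > /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/masters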

8: Edit the following parameters in /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/hadoop-env.sh:

HADOOP_HEAPSIZE=500
HADOOP_NAMENODE_INIT_HEAPSIZE=500

9: Edit the following parameter in /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/mapred-env.sh:

HADOOP_JOB_HISTORYSERVER_HEAPSIZE=250

10: Edit the following parameters in /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/yarn-env.sh:

JAVA_HEAP_MAX=-Xmx500m
YARN_HEAPSIZE=500

Disable the firewall:

sudo ufw disable
service iptables stop / start
service iptables status

Before running Hadoop for the first time, be sure to format the NameNode:

cd /usr/local/hadoop/hadoop-2.7.6
bin/hdfs namenode -format 

Start the Hadoop services:
cd /usr/local/hadoop/hadoop-2.7.6/sbin
Start: "./start-all.sh"
Stop: "./stop-all.sh"

Verify that Hadoop started successfully:
Run "jps -l" and check that the Hadoop processes are present (on this single-node setup: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager).

Web UI address:
http://192.168.23.131:50070/
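
Optionally, a quick HDFS smoke test (hypothetical paths, only to confirm that reads and writes work):

hdfs dfs -mkdir -p /tmp/smoke-test
hdfs dfs -ls /tmp
hdfs dfs -rm -r /tmp/smoke-test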

Install PredictionIO 0.12.1
Package download address:
https://www.apache.org/dyn/closer.cgi/predictionio/0.12.1/apache-predictionio-0.12.1.tar.gz

Installation path:
/home/PredictionIo

Installation steps:
1: Move the PredictionIO-0.12.1.tar.gz package to the /home/PredictionIo directory;
2: Extract PredictionIO-0.12.1.tar.gz into the current directory
3: Enter the PredictionIo directory and run "./make-distribution.sh":
cd PredictionIo && ./make-distribution.sh    (this step takes a while; please be patient)
4: On success, the following directories and files will have been created:

PredictionIo/sbt/sbt
PredictionIo/conf/
PredictionIo/conf/pio-env.sh

5: Create a vendors folder under the installation directory /home/PredictionIo:

$ mkdir /home/PredictionIo/vendors

Configuration:
Add the following to /root/.bashrc:

export PIO_HOME=/home/PredictionIo
export PATH=${JAVA_HOME}/bin:$PIO_HOME/bin:$SCALA_PATH/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH

Install Spark
Installation path:
/home/PredictionIo/vendors/spark-2.1.1-bin-hadoop2.7

Installation steps:
1: Download the package to the /home/PredictionIo/vendors installation directory (use the hadoop2.7 build, matching the rest of this guide):

$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz

2: Extract the package into the installation directory:

$ cd /home/PredictionIo/vendors && tar zxvf spark-2.1.1-bin-hadoop2.7.tgz

Configuration:
1: Add the following to /root/.bashrc:

export SPARK_HOME=/home/PredictionIo/vendors/spark-2.1.1-bin-hadoop2.7
export PATH=${JAVA_HOME}/bin:$PIO_HOME/bin:$SCALA_PATH/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH

2: Edit the Spark configuration file ${SPARK_HOME}/conf/spark-env.sh (copy it from spark-env.sh.template if it does not exist) and add the following:

export HADOOP_CONF_DIR={$HADOOP_PATH}/etc/hadoop
export HADOOP_HOME={$HADOOP_PATH}
export JAVA_HOME={$JAVA_PATH}
export SCALA_HOME={$SCALA_PATH}
export SPARK_WORKER_MEMORY=3g
export SPARK_MASTER_HOST={$HOST_NAME}
export SPARK_MASTER_IP={$HOST_NAME}
export MASTER=spark://{$HOST_NAME}:7077
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4"
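
With spark-env.sh in place (substitute real values for the {$...} placeholders), the standalone master and worker can be brought up with Spark's own scripts; a minimal sketch:

# Start the standalone master plus the worker(s) listed in conf/slaves
# (defaults to localhost if that file is absent).
${SPARK_HOME}/sbin/start-all.sh
# Master and Worker should now appear in "jps -l", and the master UI
# listens on port 8080 (see the address below).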

Web UI address:
http://192.168.23.131:8080/

Install Elasticsearch
Installation path:
/home/PredictionIo/vendors/elasticsearch-5.5.2

Installation steps:
1: Download the package to the /home/PredictionIo/vendors installation directory:
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.2.tar.gz
2: Extract the package into the installation directory:
$ cd /home/PredictionIo/vendors && tar zxvf elasticsearch-5.5.2.tar.gz

Verify the Elasticsearch installation:
1: Note that Elasticsearch cannot be started as "root". Create a user "elastic" with password "elastic" and grant it administrator (sudo) privileges (a setup sketch follows the log below);
2: Switch to the "elastic" user;
3: Enter the bin directory of the Elasticsearch installation and run "./elasticsearch" to start the service; output like the following means the installation succeeded;

[2018-08-03T15:43:05,783][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [aggs-matrix-stats]
[2018-08-03T15:43:05,783][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [ingest-common]
[2018-08-03T15:43:05,784][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [lang-expression]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [lang-groovy]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [lang-mustache]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [lang-painless]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [parent-join]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [percolator]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [reindex]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [transport-netty3]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService     ] [i8eVyaH] loaded module [transport-netty4]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService     ] [i8eVyaH] no plugins loaded
[2018-08-03T15:43:11,281][INFO ][o.e.d.DiscoveryModule    ] [i8eVyaH] using discovery type [zen]
[2018-08-03T15:43:12,013][INFO ][o.e.n.Node               ] initialized
[2018-08-03T15:43:12,013][INFO ][o.e.n.Node               ] [i8eVyaH] starting ...
[2018-08-03T15:43:12,261][INFO ][o.e.t.TransportService   ] [i8eVyaH] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-08-03T15:43:12,273][WARN ][o.e.b.BootstrapChecks    ] [i8eVyaH] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2018-08-03T15:43:15,366][INFO ][o.e.c.s.ClusterService   ] [i8eVyaH] new_master {i8eVyaH}{i8eVyaHsQwKynitriABD1Q}{dz63krojSnivRerG3RROZQ}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2018-08-03T15:43:15,436][INFO ][o.e.h.n.Netty4HttpServerTransport] [i8eVyaH] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-08-03T15:43:15,436][INFO ][o.e.n.Node               ] [i8eVyaH] started
[2018-08-03T15:43:15,792][INFO ][o.e.g.GatewayService     ] [i8eVyaH] recovered [1] indices into cluster_state
[2018-08-03T15:43:16,354][INFO ][o.e.c.r.a.AllocationService] [i8eVyaH] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[pio_meta][4]] ...]).
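
A sketch for step 1 above, plus a fix for the "max file descriptors [4096] ... is too low" warning visible in the log (user name and password "elastic" as stated in the text):

# Create the elastic user and give it ownership of the ES directory.
sudo useradd -m elastic
echo "elastic:elastic" | sudo chpasswd
sudo usermod -aG sudo elastic        # sudo privileges on Ubuntu
sudo chown -R elastic:elastic /home/PredictionIo/vendors/elasticsearch-5.5.2

# Raise the file-descriptor limit flagged by the bootstrap check:
# append the two lines below to /etc/security/limits.conf, then
# log in again as elastic.
#   elastic soft nofile 65536
#   elastic hard nofile 65536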

Install HBase
Package download address:
http://www.apache.org/dyn/closer.cgi/hbase/1.2.6/hbase-1.2.6-bin.tar.gz

Installation path:
/home/PredictionIo/vendors/hbase-1.2.6

Installation steps:
1: Download the package to the /home/PredictionIo/vendors installation directory:

$ wget http://archive.apache.org/dist/hbase/1.2.6/hbase-1.2.6-bin.tar.gz

2: Extract the package into the installation directory:

$ cd /home/PredictionIo/vendors && tar zxvf hbase-1.2.6-bin.tar.gz

Configuration:
1: Edit the HBase configuration file conf/hbase-site.xml (under the HBase installation directory) and add the following:

<configuration>
<!-- added: start -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/PredictionIo/vendors/hbase-1.2.6/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/PredictionIo/vendors/hbase-1.2.6/zookeeper</value>
  </property>
<!-- added: end -->
</configuration>

2: Edit the HBase configuration file conf/hbase-env.sh and add the following:

export JAVA_HOME={$JAVA_PATH}
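
pio-start-all (used later) will start HBase itself, but it can also be started and checked manually; a sketch using HBase's own scripts:

# Start HBase in standalone mode (per hbase-site.xml above).
/home/PredictionIo/vendors/hbase-1.2.6/bin/start-hbase.sh
# HMaster should appear in "jps -l", and the shell should connect:
echo "status" | /home/PredictionIo/vendors/hbase-1.2.6/bin/hbase shell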

Web UI address:
http://192.168.23.131:16010/ (HBase 1.x; older HBase versions use http://192.168.23.131:60010/)

Configure PredictionIO
1: Enter the PredictionIO installation directory /home/PredictionIo
2: Edit the configuration in conf/pio-env.sh as follows:

SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.7
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=127.0.0.1
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.5.2
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=elastic
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=elastic

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
PIO_STORAGE_SOURCES_HBASE_HOSTS=127.0.0.1
PIO_STORAGE_SOURCES_HBASE_PORTS=7070 

PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

If you installed MySQL instead, configure pio-env.sh as follows:

MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.44-bin.jar
SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.7
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL

PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://{$MYSQL_URL}:3306/{$MYSQL_DBNAME}
PIO_STORAGE_SOURCES_MYSQL_USERNAME={$MYSQL_USERNAME}
PIO_STORAGE_SOURCES_MYSQL_PASSWORD={$MYSQL_PASSWORD}
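
The MySQL JDBC driver jar referenced by MYSQL_JDBC_DRIVER is not bundled with PredictionIO; a sketch to fetch it (Maven Central URL assumed):

cd $PIO_HOME/lib
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.44/mysql-connector-java-5.1.44.jar
# Note: this file name has no "-bin" suffix; either rename the jar or
# adjust MYSQL_JDBC_DRIVER above to match.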

Start the PredictionIO services
1: If you installed Elasticsearch and HBase:
Start: pio-start-all
Stop: pio-stop-all
2: If you installed MySQL:
Start: pio eventserver &
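
Either way, the event server listens on port 7070 by default and can be checked with curl (the response below is the one PredictionIO's documentation describes):

$ curl -i -X GET http://localhost:7070
# A healthy event server answers with a body like:
#   {"status":"alive"}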

Verify the PredictionIO installation:
Run "pio status"; if it prints the following, the installation succeeded;

[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.1 is installed at /home/PredictionIo
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /home/PredictionIo/vendors/spark-2.1.1-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [HBLEvents] The table pio_event:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table pio_event:events_0...
[INFO] [Management$] Your system is all ready to go.

With the Hadoop, HBase, Elasticsearch, Spark, and PredictionIO services all running:
Run "jps -l"; the output looks like the following:

4512 org.apache.predictionio.tools.console.Console
6705 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
9586 org.apache.spark.deploy.SparkSubmit
10948 org.elasticsearch.bootstrap.Elasticsearch
3780 org.apache.spark.deploy.worker.Worker
10870 org.apache.predictionio.tools.console.Console
9673 org.apache.spark.executor.CoarseGrainedExecutorBackend
6282 org.apache.hadoop.hdfs.server.namenode.NameNode
9515 org.apache.predictionio.tools.console.Console
3627 org.apache.spark.deploy.master.Master
6443 org.apache.hadoop.hdfs.server.datanode.DataNode
12894 sun.tools.jps.Jps

Reposted from blog.csdn.net/weixin_42082627/article/details/81391190