Spark/Hadoop/Zeppelin Upgrade(1)

1 Install JDK1.8 Manually
> wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u77-b03/jdk-8u77-linux-x64.tar.gz"

Unzip it and place it in the right place, then add its bin directory to the PATH.
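For example, a minimal sketch assuming the JDK is unpacked and linked to /opt/jdk (the same path used for JAVA_HOME later in this post):
export JAVA_HOME="/opt/jdk"
export PATH="$JAVA_HOME/bin:$PATH"
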
> java -version
java version "1.8.0_77"

2 MAVEN Installation
http://sillycat.iteye.com/blog/2193762

> wget http://apache.arvixe.com/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz

Unzip it and place it in the right place, then add its bin directory to the PATH.
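For example, a minimal sketch assuming Maven is unpacked and linked to /opt/maven (the Maven home shown below):
export M2_HOME="/opt/maven"
export PATH="$M2_HOME/bin:$PATH"
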
> mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)
Maven home: /opt/maven
Java version: 1.8.0_77, vendor: Oracle Corporation

3 Protoc Installation
> git clone https://github.com/google/protobuf.git

> sudo apt-get install unzip

> sudo apt-get install autoconf

> sudo apt-get install build-essential libtool

Run configure, make, and make install, then add the bin directory to the PATH.
> protoc --version
libprotoc 3.0.0

Error Message:
'libprotoc 3.0.0', expected version is '2.5.0'

Solution:
Switch to 2.5.0
> git checkout tags/v2.5.0

> ./autogen.sh

> ./configure --prefix=/home/carl/tool/protobuf-2.5.0
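
After configure, the usual make and make install finish the build; then point the PATH at the new prefix (a sketch, assuming the prefix used above):
> make

> make install

export PATH="/home/carl/tool/protobuf-2.5.0/bin:$PATH"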

> protoc --version
libprotoc 2.5.0

4 HADOOP Installation
> wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2-src.tar.gz

> mvn package -Pdist,native -DskipTests -Dtar

Error Message:
Cannot run program "cmake"

Solution:
> sudo apt-get install cmake

Error Message:
An Ant BuildException has occured: exec returned: 1

Solution:
Try to get more detail
> mvn package -Pdist,native -DskipTests -Dtar -e

> mvn package -Pdist,native -DskipTests -Dtar -X

> sudo apt-get install zlib1g-dev

> sudo apt-get install libssl-dev

But it still did not work.

So, switch to using the binary distribution instead.
> wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

http://sillycat.iteye.com/blog/2193762

http://sillycat.iteye.com/blog/2090186

Configure JAVA_HOME and the PATH:
export JAVA_HOME="/opt/jdk"

PATH="/opt/hadoop/bin:$PATH"

Format the namenode with this command:
> hdfs namenode -format

Set up passwordless SSH on ubuntu-master, ubuntu-dev1, and ubuntu-dev2
> ssh-keygen -t rsa

> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
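To reach the slaves without a password, the master's public key also needs to be on ubuntu-dev1 and ubuntu-dev2; something like this should do it (a sketch, assuming the same user account exists on every node):
> ssh-copy-id ubuntu-dev1

> ssh-copy-id ubuntu-dev2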

Find the configuration file /opt/hadoop/etc/hadoop/hadoop-env.sh and set JAVA_HOME there:
export JAVA_HOME="/opt/jdk"

Follow the document below and make the configurations (a sketch of the key properties follows).
http://sillycat.iteye.com/blog/2090186
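A minimal sketch of the key properties, assuming the hostnames and port used in this post; core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu-master:9000</value>
  </property>
</configuration>

and hdfs-site.xml (a replication factor of 2 is an assumption for the two datanodes):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>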

Command to start DFS
> sbin/start-dfs.sh

Error Message:
java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
        at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:875)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1125)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:428)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2370)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2257)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2304)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2481)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2505)

Solution:
Configure the same XML files on the master as well.
Change the slaves file to point to ubuntu-dev1 and ubuntu-dev2.
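The slaves file (/opt/hadoop/etc/hadoop/slaves) then simply lists the two datanodes:
ubuntu-dev1
ubuntu-dev2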

Error Message:
2016-03-28 13:31:14,371 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/carl/tool/hadoop-2.7.2/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)

Solution:
Make sure we have the DFS directories:
> mkdir -p /opt/hadoop/dfs/data

> mkdir -p /opt/hadoop/dfs/name
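If the directories are configured explicitly, the standard Hadoop 2.x properties for them in hdfs-site.xml are (a sketch, assuming /opt/hadoop is the install directory used above):
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///opt/hadoop/dfs/name</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///opt/hadoop/dfs/data</value>
</property>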

Check if the DFS is running
> jps
2038 SecondaryNameNode
1816 NameNode
2169 Jps

Visit the console page:
http://ubuntu-master:50070/dfshealth.html#tab-overview

Error Message:
2016-03-28 14:20:16,180 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-master/192.168.56.104:9000
2016-03-28 14:20:22,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ubuntu-master/192.168.56.104:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

> telnet ubuntu-master 9000
Trying 192.168.56.104...
telnet: Unable to connect to remote host: Connection refused

Solution:
I can telnet to that port on ubuntu-master itself, but not from ubuntu-dev1 and ubuntu-dev2. I guess it is a firewall problem.
> sudo ufw disable
Firewall stopped and disabled on system startup

Then I also deleted the IPv6-related entries in /etc/hosts.
> cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 ubuntu-dev2.ec2.internal

192.168.56.104   ubuntu-master
192.168.56.105   ubuntu-dev1
192.168.56.106   ubuntu-dev2
192.168.56.107   ubuntu-build

Start YARN cluster
> sbin/start-yarn.sh

http://ubuntu-master:8088/cluster
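For the NodeManagers on the slaves to find the ResourceManager, yarn-site.xml typically needs at least these properties (a sketch, assuming the ResourceManager runs on ubuntu-master):
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>ubuntu-master</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>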

5 Spark Installation
http://sillycat.iteye.com/blog/2103457

Download the latest Spark version:
> wget http://mirror.nexcess.net/apache/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz

Unzip it and place it in the right place.
http://spark.apache.org/docs/latest/running-on-yarn.html

> cat conf/spark-env.sh

HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
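
Since this is the "without hadoop" build, spark-env.sh also needs to know where the Hadoop jars live; a sketch, pointing at the Hadoop installed above:
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)

A quick smoke test on YARN (SparkPi ships with the Spark examples; the exact jar name under lib/ may differ):
> bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client lib/spark-examples*.jar 10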

Alternatively, we can build Spark from source:
http://spark.apache.org/docs/latest/building-spark.html

> build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.2 -Phive -DskipTests clean package

[WARNING] The requested profile "hadoop-2.7" could not be activated because it does not exist.

That means the hadoop-2.7 profile does not exist in Spark 1.6.1, so we need to build against Hadoop 2.6.4 with the hadoop-2.6 profile instead.
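So a working build command looks more like this (a sketch, using the hadoop-2.6 profile that does exist in Spark 1.6.1):
> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.4 -Phive -DskipTests clean package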

6 Install NodeJS
http://sillycat.iteye.com/blog/2284695

> wget https://nodejs.org/dist/v4.4.0/node-v4.4.0.tar.gz

> sudo ln -s /home/carl/tool/node-v4.4.0 /opt/node-v4.4.0
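A sketch of building it from source and putting it on the PATH, assuming the same prefix-and-symlink pattern used earlier in this post:
> ./configure --prefix=/home/carl/tool/node-v4.4.0

> make && make install

export PATH="/opt/node-v4.4.0/bin:$PATH"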

7 Zeppelin Installation
http://sillycat.iteye.com/blog/2216604

http://sillycat.iteye.com/blog/2223622

http://sillycat.iteye.com/blog/2242559

Check git version
> git --version
git version 1.9.1

Java Version
> java -version
java version "1.8.0_77"

Check nodeJS version
> node --version && npm --version
v4.4.0
2.14.20

Install dependencies
> sudo apt-get install libfontconfig

Check MAVEN
> mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)

Add MAVEN parameters
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"

> mvn clean package -DskipTests -Pspark-1.6 -Dspark.version=1.6.1 -Phadoop-2.6 -Dhadoop.version=2.6.
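Once the build succeeds, Zeppelin can be started with its daemon script and reached on the default port 8080 (a sketch; the port is configurable in conf/zeppelin-site.xml):
> bin/zeppelin-daemon.sh start

http://ubuntu-master:8080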

References:
http://sillycat.iteye.com/blog/2244147

http://sillycat.iteye.com/blog/2193762

Zeppelin README:
https://github.com/apache/incubator-zeppelin/blob/master/README.md
