Spark 2017 BigData Update(1)ENV on Spark 2.2.1 with Zeppelin on Local

Java Version
>java -version
java version "1.8.0_121"

Maven Version
>mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)

Protoc Version
>protoc --version
libprotoc 2.5.0

Spark currently ships against Hadoop 2.7, so I plan to use that version. Download the Hadoop 2.7.5 source:
http://mirrors.ocf.berkeley.edu/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5-src.tar.gz

Prepare CMake ENV on Mac
https://cmake.org/install/
>wget https://cmake.org/files/v3.10/cmake-3.10.1.tar.gz
Unzip it and go to the working directory:
>./bootstrap
>make
>sudo make install
>cmake --version
cmake version 3.10.1

Unzip the Hadoop source and try to build it:
>mvn package -Pdist,native -DskipTests -Dtar

I still cannot build it on my Mac, but that is fine; I will use the binary instead.
Download the binary
>wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Unzip the file and move it to the working directory:
>sudo ln -s /Users/carl/tool/hadoop-2.7.5 /opt/hadoop-2.7.5
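The spark-env.sh later in this post points at /opt/hadoop, so presumably there is also a version-free link plus PATH entries in place; a minimal sketch matching my layout:
>sudo ln -s /opt/hadoop-2.7.5 /opt/hadoop
>cat ~/.profile
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin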
Prepare the configuration files:
>cat etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
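On Mac the daemons sometimes fail to pick up JAVA_HOME, so it may help to set it explicitly in etc/hadoop/hadoop-env.sh using the macOS java_home helper:
>cat etc/hadoop/hadoop-env.sh
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)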

>cat etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
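By default HDFS keeps its data under /tmp (hadoop.tmp.dir), which the OS may clean on reboot. Optionally, pin the directories by adding two more properties inside the same <configuration> block; the paths here are just an example for my layout:
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///Users/carl/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///Users/carl/hadoop/dfs/data</value>
    </property>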

Format the file system
>hdfs namenode -format

Generate the key to access localhost
>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
>cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

I still have an issue saying connection refused:
>ssh localhost
ssh: connect to host localhost port 22: Connection refused
Solution:
https://bluishcoder.co.nz/articles/mac-ssh.html
Open System Preferences -> Sharing -> enable Remote Login.
The DSA key still does not work for passwordless login on Mac OS, though.

Start HDFS; I still need to type the password several times during the process:
>sbin/start-dfs.sh
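The password prompts are likely because newer Mac OS versions of OpenSSH disable DSA keys by default; generating an RSA key instead should make the login passwordless:
>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
>chmod 600 ~/.ssh/authorized_keys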

Visit the webpage
http://localhost:50070/dfshealth.html#tab-overview
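A quick smoke test against HDFS, assuming the user name carl from the paths above:
>hdfs dfs -mkdir -p /user/carl
>hdfs dfs -put etc/hadoop/core-site.xml /user/carl/
>hdfs dfs -ls /user/carl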

YARN
>sbin/start-yarn.sh
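jps should now list both the HDFS and the YARN daemons, roughly like this (PIDs omitted):
>jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps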

Visit the page
http://localhost:8088/cluster

Install Spark
>wget http://apache.spinellicreations.com/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
Unzip it and place it in the working directory:
>sudo ln -s /Users/carl/tool/spark-2.2.1 /opt/spark-2.2.1
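Since $SPARK_HOME below resolves to /opt/spark, presumably there is a version-free link and an export as well; a sketch matching my layout:
>sudo ln -s /opt/spark-2.2.1 /opt/spark
Then in ~/.profile:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin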

Prepare the configuration file:
>cat conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

>echo $SPARK_HOME
/opt/spark

Start the Spark Shell
>MASTER=yarn-client bin/spark-shell
The yarn-client master value is deprecated since Spark 2.0, so I will use this instead:
>MASTER=yarn bin/spark-shell

It got stuck there for a while, maybe because of some stuck applications, so let me kill them.
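To find the application IDs first:
>bin/yarn application -list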
>bin/yarn application -kill application_1514320285035_0001
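Once the shell does come up, a tiny sanity check that jobs actually run on the cluster:
scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0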

Install Zeppelin
https://zeppelin.apache.org/docs/0.7.3/install/install.html#installation
Download binary
>wget http://apache.mirrors.tds.net/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Place the file in the working directory:
>sudo ln -s /Users/carl/tool/zeppelin-0.7.3 /opt/zeppelin-0.7.3
Prepare conf
>cat conf/zeppelin-env.sh
export SPARK_HOME="/opt/spark"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
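Optionally, if port 8080 conflicts with something else on the machine, the same file can move Zeppelin to another port (8180 here is just an example):
export ZEPPELIN_PORT=8180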

Start the notebook
>bin/zeppelin-daemon.sh start
Stop the notebook
>bin/zeppelin-daemon.sh stop

Visit the webpage
http://localhost:8080/#/

You can see the tasks here as well:
http://localhost:4040/stages/

spark.master is 'local', that is why it runs on the local machine and not on remote YARN; we can easily change that on the settings page.
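A sketch of that change, assuming Zeppelin 0.7.3's Interpreter page: menu -> Interpreter -> spark -> edit, set
master = yarn-client
(this release still uses the pre-2.0 style value), then save and restart the interpreter.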

References:
http://sillycat.iteye.com/blog/2286997
http://sillycat.iteye.com/blog/2288141
http://sillycat.iteye.com/blog/2405873

https://spark.apache.org/docs/latest/
https://zeppelin.apache.org/docs/0.7.3/install/install.html#installation
