Java Version
>java -version
java version "1.8.0_121"
Maven Version
>mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)
Protoc Version
>protoc --version
libprotoc 2.5.0
Spark currently ships built against Hadoop 2.7, so I plan to match that version. Install Hadoop 2.7.5
http://mirrors.ocf.berkeley.edu/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5-src.tar.gz
Prepare the CMake environment on macOS
https://cmake.org/install/
>wget https://cmake.org/files/v3.10/cmake-3.10.1.tar.gz
Extract the archive and change into the source directory
>./bootstrap
>make
>sudo make install
>cmake --version
cmake version 3.10.1
Extract the Hadoop source and try to build it
>mvn package -Pdist,native -DskipTests -Dtar
I still cannot build it on my Mac, but that is fine; I will just use the binary instead.
Download the binary
>wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Extract the file and move it to the working directory
>sudo ln -s /Users/carl/tool/hadoop-2.7.5 /opt/hadoop-2.7.5
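The symlink above can then be wired into the shell; a minimal sketch, assuming the /opt/hadoop-2.7.5 link from the previous step (add the exports to ~/.bash_profile to make them permanent):

```shell
# Point HADOOP_HOME at the symlink created above (adjust if your path differs)
export HADOOP_HOME=/opt/hadoop-2.7.5
# Put the hadoop/hdfs commands and the start/stop scripts on the PATH
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
echo "HADOOP_HOME=$HADOOP_HOME"
```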
Prepare the configuration files
>cat etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
>cat etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
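Before formatting the NameNode it is worth sanity-checking the two files; a throwaway sketch that recreates them under a scratch directory and greps for the expected values (the /tmp path is a stand-in, not the real etc/hadoop):

```shell
# Recreate the two minimal config files in a scratch dir and verify the values
mkdir -p /tmp/hadoop-conf-check
cat > /tmp/hadoop-conf-check/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
cat > /tmp/hadoop-conf-check/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
# Both greps should succeed before running "hdfs namenode -format"
grep -q 'hdfs://localhost:9000' /tmp/hadoop-conf-check/core-site.xml && echo "core-site OK"
grep -q '<value>1</value>' /tmp/hadoop-conf-check/hdfs-site.xml && echo "hdfs-site OK"
```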
Format the file system
>hdfs namenode -format
Generate the key to access localhost
>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
>cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
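Passwordless login also depends on file permissions: sshd silently ignores an authorized_keys file that is group- or world-writable. A quick rehearsal of the rules against a scratch directory (a stand-in for ~/.ssh; swap in the real path to fix yours):

```shell
# Rehearse the ssh permission rules against a scratch dir (stand-in for ~/.ssh)
SCRATCH=$(mktemp -d)
touch "$SCRATCH/authorized_keys"
chmod 700 "$SCRATCH"                  # the ~/.ssh directory must be 700
chmod 600 "$SCRATCH/authorized_keys"  # authorized_keys must be 600
# Print the resulting modes for a visual check
ls -ld "$SCRATCH" "$SCRATCH/authorized_keys"
```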
I still have an issue saying connection refused
>ssh localhost
ssh: connect to host localhost port 22: Connection refused
Solution:
https://bluishcoder.co.nz/articles/mac-ssh.html
Open System Preferences -> Sharing and enable Remote Login.
Key-based login still does not work on this Mac, so I have to type the password during the process.
Start HDFS
>sbin/start-dfs.sh
Visit the webpage
http://localhost:50070/dfshealth.html#tab-overview
Start YARN
>sbin/start-yarn.sh
Visit the page
http://localhost:8088/cluster
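With both daemons up, a quick way to confirm they are listening on their default ports (9000 for the NameNode RPC, 50070 for the HDFS web UI, 8088 for the YARN web UI) is a small TCP probe; a sketch, assuming bash's /dev/tcp pseudo-device:

```shell
# Probe a TCP port; relies on bash's /dev/tcp pseudo-device
port_open() { (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; }

# Default ports: 9000 = NameNode RPC, 50070 = HDFS UI, 8088 = YARN UI
for port in 9000 50070 8088; do
  if port_open localhost "$port"; then
    echo "port $port open"
  else
    echo "port $port closed"
  fi
done
```

All three should report open once start-dfs.sh and start-yarn.sh have finished.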
Install Spark
>wget http://apache.spinellicreations.com/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
Extract it and place it in the working directory
>sudo ln -s /Users/carl/tool/spark-2.2.1 /opt/spark-2.2.1
Prepare Configuration File
>cat conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
>echo $SPARK_HOME
/opt/spark
Start the Spark Shell
>MASTER=yarn-client bin/spark-shell
The yarn-client master string was deprecated in Spark 2.0, so I will use this instead
>MASTER=yarn bin/spark-shell
It got stuck there for a while, maybe because of some stale applications, so let me kill them
>bin/yarn application -kill application_1514320285035_0001
Install Zeppelin
https://zeppelin.apache.org/docs/0.7.3/install/install.html#installation
Download binary
>wget http://apache.mirrors.tds.net/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Place the file in the working directory
>sudo ln -s /Users/carl/tool/zeppelin-0.7.3 /opt/zeppelin-0.7.3
Prepare conf
>cat conf/zeppelin-env.sh
export SPARK_HOME="/opt/spark"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
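If something else already occupies port 8080, Zeppelin's web UI port can be moved in the same env file; a sketch of a slightly fuller conf/zeppelin-env.sh (the ZEPPELIN_PORT line is my addition, not part of the original setup):

```shell
# conf/zeppelin-env.sh (sketch; paths match the symlinks used above)
export SPARK_HOME="/opt/spark"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
# Assumption: only needed if another service already uses port 8080
export ZEPPELIN_PORT=8080
```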
Start the notebook
>bin/zeppelin-daemon.sh start
Stop the notebook
>bin/zeppelin-daemon.sh stop
Visit the webpage
http://localhost:8080/#/
You can see the running tasks here as well
http://localhost:4040/stages/
By default, spark.master in the Zeppelin interpreter settings is 'local[*]'.
References:
http://sillycat.iteye.com/blog/2286997
http://sillycat.iteye.com/blog/2288141
http://sillycat.iteye.com/blog/2405873
https://spark.apache.org/docs/latest/
https://zeppelin.apache.org/docs/0.7.3/install/install.html#installation