Building a Hadoop Big Data Platform by Hand: Spark

Spark is a fast, general-purpose computing engine designed for large-scale data processing. It has the advantages of Hadoop MapReduce, but unlike MapReduce it can keep a job's intermediate results in memory, so there is no need to read from and write to HDFS between stages. This makes Spark far better suited to iterative workloads such as data mining and machine learning algorithms. Since Spark depends on Scala, we install the two together.

 

1. Unzip the file

tar -zxvf /opt/spark-1.6.0-cdh5.8.0.tar.gz

tar -zxvf /opt/scala-2.10.4.tgz 
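These tar commands extract into the current working directory, so run them from /opt, or add -C /opt so the archives land where the later paths expect them. A quick sanity check that both directories exist:

ls -d /opt/spark-1.6.0-cdh5.8.0 /opt/scala-2.10.4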

2. Configure environment variables

# vim /etc/profile

Add at the end of the file:

export SPARK_HOME=/opt/spark-1.6.0-cdh5.8.0

export SCALA_HOME=/opt/scala-2.10.4     

 

export PATH=.:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH   # add the Scala and Spark binaries to the PATH
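Reload the profile so the variables take effect in the current shell, then confirm Scala is on the PATH (it should report version 2.10.4):

source /etc/profile

scala -version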

3.  Configure spark-env.sh

     The spark-env.sh file sets the environment, dependencies, and master/worker resource configuration that Spark uses at runtime.

    cp conf/spark-env.sh.template conf/spark-env.sh   # copy spark-env.sh.template as spark-env.sh

 The configuration is as follows:

 

HADOOP_CONF_DIR=/opt/hadoop-2.6.0-cdh5.8.0/etc/hadoop

SPARK_LOCAL_IP=slave1   # the hostname of the machine Spark is currently running on; change per node

SPARK_MASTER_IP=master   # hostname/IP of the master node

SPARK_CLASSPATH=$CLASSPATH:`find /opt/hadoop-2.6.0-cdh5.8.0 -name '*.jar' | tr '\n' ':'`   # quote *.jar so the shell does not expand the glob

SPARK_LOCAL_DIRS=/opt/spark/

HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.8.0
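The backtick expression in SPARK_CLASSPATH builds a colon-separated list of every jar under the Hadoop directory so Spark can load the HDFS client classes. As a sanity check (not a required step), you can preview the first part of what it produces:

find /opt/hadoop-2.6.0-cdh5.8.0 -name '*.jar' | tr '\n' ':' | cut -c1-200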

 

4. Configure /opt/spark-1.6.0-cdh5.8.0/conf/slaves

The slaves file lists the hosts that should run a Worker, one hostname per line; including master means the master node also runs a Worker:

master

slave1

slave2

5. Copy the entire Spark directory to slave1 and slave2

scp -r /opt/spark-1.6.0-cdh5.8.0 hadoop@slave1:/opt/

 

scp -r /opt/spark-1.6.0-cdh5.8.0 hadoop@slave2:/opt/
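Scala and the /etc/profile changes from step 2 are needed on the worker nodes as well. Assuming the same directory layout on every node:

scp -r /opt/scala-2.10.4 hadoop@slave1:/opt/

scp -r /opt/scala-2.10.4 hadoop@slave2:/opt/

Then repeat the /etc/profile edits (and source /etc/profile) on each node.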

 

On slave1 and slave2, edit conf/spark-env.sh and set SPARK_LOCAL_IP to that machine's own hostname, as shown below.
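For example, in /opt/spark-1.6.0-cdh5.8.0/conf/spark-env.sh on slave1:

SPARK_LOCAL_IP=slave1

and on slave2:

SPARK_LOCAL_IP=slave2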

 

6. Verify
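A minimal check, assuming the hostnames and paths used above. On the master, start the standalone cluster, confirm the daemons are up, and run the bundled SparkPi example against the cluster:

/opt/spark-1.6.0-cdh5.8.0/sbin/start-all.sh   # starts the Master here and a Worker on every host listed in conf/slaves

jps   # master should show a Master (and a Worker) process; slave1 and slave2 should each show a Worker

MASTER=spark://master:7077 /opt/spark-1.6.0-cdh5.8.0/bin/run-example SparkPi 10   # should print a line like "Pi is roughly 3.14"

The Master web UI at http://master:8080 should list all three workers.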

