Spark 2.4.5 Installation Record

Reference: https://data-flair.training/blogs/install-apache-spark-multi-node-cluster/

Download address for Spark:

http://spark.apache.org/downloads.html

Prepare three nodes

192.168.1.1 [hostname] master
192.168.1.2 [hostname] slave1
192.168.1.3 [hostname] slave2

Append the above configuration to /etc/hosts on all three machines. Since my three machines have different domain names, we set the [hostname] field explicitly; for example, on the master node:

192.168.1.1 xxx.localdomain master
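
To confirm the entries took effect, you can try resolving each name from every node (a quick sanity check, assuming the example IPs above):

ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2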

To check the current host name:

$ hostname

If Spark later fails to start with an "unknown hostname" error, it usually means the host name has not been set. In that case,

$ hostname -i

will report the same error.
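
On systems using systemd, one way to set the host name (a sketch; the name must match the /etc/hosts entry) is:

$ sudo hostnamectl set-hostname master    # use slave1 / slave2 on the other nodes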

 

Installation steps:

First, set up passwordless SSH login

If SSH is not already installed, install it first:

sudo apt install openssh-server

Run the following on all three machines:

ssh-keygen -t rsa

Press Enter at each prompt to accept the default settings (key file path and file name).

Copy the ~/.ssh/id_rsa.pub files from slave1 and slave2 to the master node:

scp ~/.ssh/id_rsa.pub xxx@master:~/.ssh/id_rsa.pub.slave1
scp ~/.ssh/id_rsa.pub xxx@master:~/.ssh/id_rsa.pub.slave2

Note: xxx stands for the user name. Ideally all three machines use the same user name; if necessary, create the user with:

adduser xxx    # create a new user xxx
passwd xxx     # set a password for xxx

 

Then run on the master:

cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys xxx@slave1:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys xxx@slave2:~/.ssh/authorized_keys

Verify passwordless login from the master:

ssh slave1
ssh slave2

slave1 and slave2 should likewise be able to log in to the other two nodes without a password.
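
For example, on slave1 the equivalent check would be (the same pattern applies on slave2):

ssh master
ssh slave2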

Note: the .ssh directory must have permissions 700 and authorized_keys must have permissions 600 (other permission values may not work). Fix the permissions with:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

 

Second, install JDK, Scala, and Spark

Details omitted; installing Spark only requires decompressing the file downloaded above. Remember to configure the environment variables:

export JAVA_HOME=...
export SCALA_HOME=...
export SPARK_HOME=...
export PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
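
After adding these (for example to ~/.bashrc) and reloading the shell, a quick sanity check, assuming the binaries are now on the PATH:

java -version
scala -version
spark-submit --version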

On the master node, enter the conf directory under SPARK_HOME:

cd conf
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves

Edit the slaves file:

# localhost
slave1
slave2

Edit the spark-env.sh file:

export JAVA_HOME=...
export SPARK_WORKER_CORES=8
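
Other worker resources can be capped in the same file if needed, for example (SPARK_WORKER_MEMORY is optional and the value here is only illustrative):

export SPARK_WORKER_MEMORY=8g    # total memory a worker may hand out to executors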

On slave1 and slave2, perform the same operations.

Note: ideally the Spark directory, and therefore the SPARK_HOME environment variable, is identical on all three nodes.
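
If the paths do match, an alternative to editing the files again on each slave is to copy the two configured files over from the master (a sketch, assuming the same user name and SPARK_HOME on every node):

scp $SPARK_HOME/conf/spark-env.sh $SPARK_HOME/conf/slaves xxx@slave1:$SPARK_HOME/conf/
scp $SPARK_HOME/conf/spark-env.sh $SPARK_HOME/conf/slaves xxx@slave2:$SPARK_HOME/conf/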

Third, start the cluster

Run on the master node:

sbin/start-all.sh

To shut down the cluster, run:

sbin/stop-all.sh

After starting, you can run jps on the master or on slave1/slave2 to see the Java processes. The web interface is at:

http://MASTER-IP:8080/
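
To confirm the cluster actually accepts jobs, you can submit the bundled SparkPi example from the master (a sketch; the examples jar name depends on the exact build, here assumed to be the prebuilt Spark 2.4.5 / Scala 2.11 package):

$SPARK_HOME/bin/spark-submit \
  --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 100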

If the worker nodes fail to connect to the master, an error like the following appears:

Caused by: java.io.IOException: Connecting to :7077 timed out (120000 ms)
...
org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run
...

In that case, add the following to $SPARK_HOME/conf/spark-env.sh on all three machines:

export SPARK_MASTER_HOST=<master ip>

Then restart the cluster:

sbin/start-all.sh

 
