Building a Spark environment on Ubuntu

Note: 1. Build environment: 64-bit Ubuntu (Linux). There are Windows versions too, but I have not tried them, so I will not comment on those.

      2. The JDK, Scala, and other path variables are usually configured in /etc/profile. When I configured the environment in /etc/profile myself I ran into a problem: for example, after configuring Spark and sourcing the profile, Spark started fine, but the next time I wanted to start Spark it would fail and I had to source the profile again. So I now write all the environment configuration into the ~/.bashrc file instead; it only needs to be sourced once and everything keeps working.

    3. I'm just a beginner, so if anything here is wrong, please tell me and I will fix it. And if you have a better way, please take the trouble to teach me. Thank you!

One, set up SSH passwordless login:

(Ubuntu usually comes with the SSH client, so normally only the server needs to be installed. Check whether the server and client are installed with: dpkg -l | grep ssh. To install the client: sudo apt-get install openssh-client)

1, Install the server with the command: sudo apt-get install openssh-server. You may be asked for your password; type it and press Enter to continue. It may also ask whether to continue; type Y and press Enter. After the installation succeeds, test that the SSH connection works normally with the command ssh localhost; the connection will ask for the user's password.

2, Steps to set up passwordless login:

(1) Generate a key pair with the command: ssh-keygen. During execution it will ask for a passphrase for the key; enter nothing and just press Enter. The key files are generated in the ~/.ssh directory (id_rsa is the private key, id_rsa.pub the public key).

(2) Append the generated public key to the authorized_keys file with the command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

(3) Set the permissions on ~/.ssh and authorized_keys with the commands:

chmod 700 ~/.ssh

chmod 600 ~/.ssh/authorized_keys

(4) After the setup is done, verify that it succeeded with the command: ssh localhost. If it logs in without asking for the user's password, the setup was successful.
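For convenience, the whole passwordless-login setup can also be run as the following sequence of shell commands. This is a minimal sketch assuming the default key path ~/.ssh/id_rsa; the -N "" option skips the passphrase prompt, which is the non-interactive equivalent of pressing Enter at the prompts described above.

sudo apt-get install openssh-server                 # install the SSH server
mkdir -p ~/.ssh                                     # make sure the .ssh directory exists
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa            # generate a key pair with no passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     # authorize your own public key
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost                                       # should now log in without asking for a password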

Two, JDK installation and configuration

1, Extract the JDK archive with the command: tar -zvxf jdk-8u181-linux-x64.tar.gz -C /opt

2, Configure the environment variables by adding the following to the /etc/profile file:

export JAVA_HOME=/opt/jdk1.8.0_181          # set JAVA_HOME

export CLASSPATH=/opt/jdk1.8.0_181/lib      # set the classpath

export PATH=$PATH:$JAVA_HOME/bin            # add the bin directory to PATH so the java commands can be run directly from the command line

3, Re-execute the just-modified /etc/profile so that the configuration takes effect immediately, with the command: source /etc/profile

4, After the JDK is installed and configured, enter the command java -version on the command line to check that the JDK is installed correctly.
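Following the note at the top about using ~/.bashrc instead of /etc/profile, here is a minimal sketch of appending the same variables there and then verifying the JDK. The path /opt/jdk1.8.0_181 matches the archive extracted above; adjust it if your version differs.

echo 'export JAVA_HOME=/opt/jdk1.8.0_181' >> ~/.bashrc
echo 'export CLASSPATH=$JAVA_HOME/lib' >> ~/.bashrc
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc
source ~/.bashrc      # reload the shell configuration
java -version         # should print something like: java version "1.8.0_181"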

Three, Scala installation and configuration

1, Extract the Scala archive with the command: tar -zvxf scala-2.11.12.tgz -C /opt

2, Configure the environment variables by adding the following to the /etc/profile file:

export SCALA_HOME=/opt/scala-2.11.12        # set SCALA_HOME

export PATH=$PATH:$SCALA_HOME/bin           # add the bin directory to PATH

3, Re-execute the just-modified /etc/profile so that the configuration takes effect immediately, with the command: source /etc/profile

4, After Scala is installed and configured, enter the command scala on the command line to check that Scala is installed correctly.
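A quick way to confirm the Scala installation without entering the REPL is to check the version from the shell. This assumes the scala-2.11.12 archive used above; the exact version string may differ on your machine.

scala -version        # expected output similar to: Scala code runner version 2.11.12 ...
which scala           # should point to /opt/scala-2.11.12/bin/scala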

Four, Spark installation and configuration

1, Extract the Spark archive with the command: tar -zvxf spark-2.3.3-bin-hadoop2.7.tgz -C /opt

2, Configure the environment variables by adding the following to the /etc/profile file:

export SPARK_HOME=/opt/spark-2.3.3-bin-hadoop2.7        # set SPARK_HOME

export PATH=$PATH:$SPARK_HOME/bin                       # add the bin directory to PATH

3, Re-execute the just-modified /etc/profile so that the configuration takes effect immediately, with the command: source /etc/profile
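Before touching the configuration files, you can check that SPARK_HOME and PATH were picked up correctly. A small sketch; spark-submit is one of the scripts in Spark's bin directory and its --version flag just prints the version banner.

echo $SPARK_HOME          # should print /opt/spark-2.3.3-bin-hadoop2.7
spark-submit --version    # should report Spark version 2.3.3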

4, Modify the Spark configuration files

(1) Copy the template files. In Spark's conf directory, copy the three files spark-env.sh.template, log4j.properties.template, and slaves.template to spark-env.sh, log4j.properties, and slaves in the same directory (the conf folder). Note that the .template suffix must be removed, because Spark reads these configuration files at startup and otherwise cannot find them. The commands are as follows (a sketch of typical spark-env.sh entries is shown after them):

cd /opt/spark-2.3.3-bin-hadoop2.7/conf                 # enter the configuration folder

sudo cp spark-env.sh.template spark-env.sh             # Spark environment settings

sudo cp log4j.properties.template log4j.properties     # Spark logging settings

sudo cp slaves.template slaves                         # Spark cluster nodes
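The copied spark-env.sh can be left empty for a single-machine test, but it is common to point it at the JDK and Scala installed above. Below is a minimal sketch of entries you might append to spark-env.sh; the paths follow this guide, and the worker memory and core values are illustrative assumptions, not required settings.

export JAVA_HOME=/opt/jdk1.8.0_181            # JDK used by the Spark daemons
export SCALA_HOME=/opt/scala-2.11.12          # Scala installation
export SPARK_MASTER_HOST=localhost            # bind the master to localhost
export SPARK_WORKER_MEMORY=1g                 # memory per worker (illustrative)
export SPARK_WORKER_CORES=1                   # cores per worker (illustrative)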

5, Give the spark-2.3.3-bin-hadoop2.7 folder read, write, and execute permissions with the commands:

cd /opt   

sudo chmod 777 spark-2.3.3-bin-hadoop2.7/
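chmod 777 works, but it opens the directory to every user on the machine. If you prefer, a less permissive alternative that serves the same purpose here is to make your own user the owner of the directory:

cd /opt
sudo chown -R $USER:$USER spark-2.3.3-bin-hadoop2.7/    # give ownership to the current user instead of opening permissions to everyone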

6, Spark installation and configuration is now complete. Go into the sbin directory under the Spark directory and run ./start-all.sh to start the cluster, which also serves as a test of the Spark installation.
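A sketch of starting the standalone cluster and checking that the daemons are up, assuming SPARK_HOME is set as above. jps ships with the JDK and lists the running Java processes.

cd $SPARK_HOME/sbin
./start-all.sh        # starts the Master and Worker daemons
jps                   # the list should include a Master and a Worker process
# ./stop-all.sh       # stops the cluster when you are done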
7, Enter spark-shell on the command line to check that Spark is installed correctly; after a successful start you will see the Spark banner and a scala> prompt. (spark-shell uses the Scala language by default; if you want to write in Python, look at step 9 first.)
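As a quick smoke test you can also run one of the examples bundled with Spark instead of typing code into the shell; run-example is a script in Spark's bin directory. The --master URL below assumes the master started by start-all.sh is listening on localhost at the default port 7077.

run-example SparkPi 10                          # computes an approximation of pi using 10 partitions
spark-shell --master spark://localhost:7077     # attach the shell to the cluster started above instead of running in local mode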

 

8, After the Spark cluster has started, open a browser and go to the address localhost:8080 to see the cluster's web UI. If the UI does not appear, go back and check whether there is a problem with the earlier environment configuration and whether Spark's start-all.sh was actually run.

9, pyspark can be started directly from the command line with the command pyspark; after a successful start you will see the Spark banner and a >>> prompt.
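pyspark needs a Python interpreter on the machine, and if several are installed you can choose which one through the PYSPARK_PYTHON variable. A minimal sketch; python3 is an assumed interpreter name, adjust it to whatever is installed on your system.

export PYSPARK_PYTHON=python3    # interpreter used by PySpark (assumed to be installed)
pyspark                          # on success you get the Spark banner and a >>> prompt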

 

 

Building the environment this way only lets you write Spark code in command-line mode, so when I have some free time I will write another post teaching you how to set up Spark in PyCharm.

 


Original post: www.cnblogs.com/chjie/p/10833873.html