Big data cluster building experience

Basic environment preparation

  • 1 Turn off the firewall of each server

systemctl status firewalld.service   # view the firewall status
systemctl stop firewalld.service     # stop the firewall
systemctl disable firewalld.service  # keep the firewall from starting at boot

  • 2 Configure the hosts file

Edit the hosts file:
vi /etc/hosts
The configuration on the master node is as follows:
172.19.241.* master
172.19.241.* slave1
172.19.241.* slave2
172.19.241.* slave3
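
The mappings can also be staged in a local file and reviewed before touching /etc/hosts. A minimal sketch, with placeholder addresses standing in for the masked IPs above:

```shell
# Stage the cluster host mappings in a local file first. The 172.19.241.x
# addresses are placeholders -- substitute the real IPs of your servers.
cat > hosts.cluster <<'EOF'
172.19.241.1 master
172.19.241.2 slave1
172.19.241.3 slave2
172.19.241.4 slave3
EOF
cat hosts.cluster
# Once reviewed, append it as root:
# cat hosts.cluster >> /etc/hosts
```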

  • 3 Set up password-free login

Choose one server as the master node and generate a key pair on it:
ssh-keygen -t rsa
Then copy the public key to each slave node:
ssh-copy-id slave1
The password is required only this first time; once it is set up, the master node can log in to every slave node without a password. Repeat the ssh-copy-id for slave2 and slave3.
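
The whole step can be condensed into one short script. This is a sketch, assuming slave1-slave3 resolve via /etc/hosts; the loop only prints the ssh-copy-id commands so they can be reviewed before running:

```shell
# Generate the key pair once if it does not exist; -N '' means no passphrase.
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa" -q
# Print the copy commands for every slave; drop the `echo` to run them
# (each one asks for that node's password a single time).
for host in slave1 slave2 slave3; do
    echo "ssh-copy-id $host"
done
```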

Master node installation

The following operations are all done on the master node

Install JDK

  • 1 JDK download
    https://www.oracle.com/technetwork/java/javase/downloads
  • 2 Upload the downloaded JDK to the master node
  • 3 Unzip

Create a java folder under /usr/local:
mkdir /usr/local/java
Then extract the JDK into it:
tar -zxvf jdk-8u231-linux-x64.tar.gz -C /usr/local/java

  • 4 Configure JAVA_HOME

vi /etc/bashrc
add the following at the end of the file:

export JAVA_HOME=/usr/local/java/jdk1.8.0_231
export JRE_HOME=${JAVA_HOME}/jre
export PATH=${JAVA_HOME}/bin:$PATH
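
To make the edit reviewable, the three lines can also be staged in a snippet file and appended in one step. Note the quoted 'EOF', which keeps ${JAVA_HOME} from being expanded while the snippet is written:

```shell
# Write the exports to a local snippet first, then append it to /etc/bashrc.
cat > java_env.sh <<'EOF'
export JAVA_HOME=/usr/local/java/jdk1.8.0_231
export JRE_HOME=${JAVA_HOME}/jre
export PATH=${JAVA_HOME}/bin:$PATH
EOF
cat java_env.sh
# Once reviewed, append as root:
# cat java_env.sh >> /etc/bashrc
```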
  • 5 Verify
    source /etc/bashrc
    then run java -version

Install Hadoop

  • 1 Download
    https://hadoop.apache.org/releases.html
  • 2 Upload and unzip

mkdir /usr/local/hadoop
tar -zxvf hadoop-2.10.0.tar.gz -C /usr/local/hadoop

  • 3 Configure environment variables

cat >> /etc/profile <<'EOF'
#Hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.10.0
export PATH=$PATH:$HADOOP_HOME/bin
EOF

Quoting the EOF delimiter keeps $PATH and $HADOOP_HOME from being expanded while the file is written.

  • 4 Check

source /etc/profile
hadoop version

Hadoop configuration file

The main configuration files required are core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, masters, slaves

  • 1 core configuration

vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/core-site.xml

Modify its content as:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
  • 2 hdfs configuration

vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>
  • 3 mapred configuration

mapred-site.xml does not exist by default, so copy the template first:
cp /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/mapred-site.xml
Then edit it:
vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN; Hadoop 1.x JobTracker settings such as mapred.job.tracker are not needed.
  • 4 yarn configuration

vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
</configuration>
  • 5 masters configuration
    Create a new masters file

vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/masters

master
  • 6 slaves configuration

vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/slaves

slave1
slave2
slave3

Slave node configuration

  • 1 Distribute jdk to each slave node

scp jdk-8u231-linux-x64.tar.gz slave1:/usr/local

Then unzip it into /usr/local/java on each slave node, exactly as on the master. Repeat the scp for slave2 and slave3.

  • 2 Distribute Hadoop to each slave node.
    First, package the configured Hadoop into a package

tar -zcvf hadoop.tar.gz -C /usr/local hadoop

Then copy the archive to each slave node (shown for slave1; repeat for slave2 and slave3):

scp hadoop.tar.gz slave1:/usr/local

Unzip the package on each slave node:

tar -zxvf hadoop.tar.gz -C /usr/local

  • 3 Distribute several configuration files to each slave node

Distribute hosts file
scp /etc/hosts slave1:/etc/
distribute profile file
scp /etc/profile slave1:/etc/
distribute bashrc file
scp /etc/bashrc slave1:/etc/
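
Steps 1-3 above can also be driven by one loop over the slave hostnames. A sketch that generates the scp commands into a script for review (it assumes passwordless ssh from earlier and both tarballs in the current directory); run `sh dist.sh` after checking it:

```shell
# Generate one distribution script covering every slave node.
for host in slave1 slave2 slave3; do
    for f in /etc/hosts /etc/profile /etc/bashrc; do
        echo "scp $f $host:/etc/"
    done
    echo "scp hadoop.tar.gz $host:/usr/local"
    echo "scp jdk-8u231-linux-x64.tar.gz $host:/usr/local"
done > dist.sh
cat dist.sh
```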

Then, on each slave node, check that the configuration takes effect:

source /etc/profile
source /etc/bashrc
java -version
hadoop version

If both commands print version information, the configuration is complete; the next step is startup.

Hadoop startup

Start the cluster by operating on the master node:

  • 1 Format the NameNode
    This only needs to be done before the first startup; do not repeat it later, since reformatting wipes the HDFS metadata.

hadoop namenode -format

  • 2 Start

cd /usr/local/hadoop/hadoop-2.10.0
sbin/start-all.sh

  • 3 Check
    Use the jps command to check whether startup succeeded. The
    master node should show the NameNode, SecondaryNameNode, and
    ResourceManager processes; each slave node should show the
    DataNode and NodeManager processes.
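
The jps check can also be run from the master in one loop over ssh. A sketch; BatchMode and the connect timeout keep an unreachable node from hanging the loop:

```shell
# Run jps on every node from the master; unreachable nodes are reported
# instead of aborting the loop.
for host in master slave1 slave2 slave3; do
    echo "== $host =="
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" jps 2>/dev/null \
        || echo "(unreachable)"
done
```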

  • 4 Web interfaces
    HDFS: visit http://master:50070/
    YARN: visit http://master:8088/


Origin blog.csdn.net/weixin_42541360/article/details/109673976