Building a Simple Three-Node Hadoop Cluster: A Summary

A tutorial on setting up a three-node Hadoop cluster.

First, install VMware.

Second, create a Linux virtual machine node; CentOS 7.6 64-bit is used here.

Third, clone the virtual machine twice to create two more nodes.

Fourth, turn off the firewall on all three nodes.

Command: systemctl stop firewalld

After stopping it, check the firewall status to confirm it shut down: systemctl status firewalld. To keep it off across reboots, also run: systemctl disable firewalld

Fifth, turn off SELinux.

vi /etc/selinux/config

       SELINUX=disabled

Sixth, configure the network settings.

vi /etc/sysconfig/network-scripts/ifcfg-ens33

 

BOOTPROTO=static

ONBOOT=yes

IPADDR=192.168.XX.XX

NETMASK=255.255.255.0

GATEWAY=192.168.XX.2

DNS1=8.8.8.8

After configuring, restart the network service:

service network restart
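The settings above can be sketched as a single heredoc. This is a sketch only: it writes to a temporary file rather than the real config path, the IP address 192.168.52.100 is an assumption borrowed from the node01 address that appears in the logs later, and the TYPE/NAME/DEVICE keys are commonly-set extras not shown in the tutorial. Adjust everything for your own network.

```shell
# Sketch: static-IP settings for ens33, written to a temp file here.
# On a real node the target is /etc/sysconfig/network-scripts/ifcfg-ens33.
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
TYPE=Ethernet
NAME=ens33
DEVICE=ens33
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.52.100
NETMASK=255.255.255.0
GATEWAY=192.168.52.2
DNS1=8.8.8.8
EOF
grep -q '^BOOTPROTO=static' "$CFG" && echo "ifcfg sketch written to $CFG"
```

After copying such settings into the real ifcfg-ens33 file, `service network restart` applies them.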

Seventh, change the hostname of each of the three virtual machines (run the matching command on its own node):

hostnamectl set-hostname node01

hostnamectl set-hostname node02

hostnamectl set-hostname node03

Then edit the hosts file on each machine to map hostnames to IP addresses:

vi /etc/hosts

Restart the virtual machines for the changes to take effect.
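The mapping entries might look like the following. The 192.168.52.x addresses are an assumption, consistent with the node01 address shown in the logs later; substitute your own. The sketch appends to a temp file rather than the real /etc/hosts:

```shell
# Sketch: hostname/IP mappings for the three nodes.
# On a real node, append these lines to /etc/hosts on all three machines.
HOSTS=$(mktemp)
cat >> "$HOSTS" <<'EOF'
192.168.52.100 node01
192.168.52.101 node02
192.168.52.102 node03
EOF
grep node0 "$HOSTS"
```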

Eighth, configure time synchronization on all three machines; Aliyun's NTP server is used here. Install ntpdate and add a cron entry that syncs every minute:

yum -y install ntpdate

crontab -e

*/1 * * * * /usr/sbin/ntpdate time1.aliyun.com

Ninth, add a dedicated hadoop user and grant it sudo privileges.

useradd hadoop

passwd hadoop

Enter and confirm a password when prompted.

visudo

Add the following line to the file:

hadoop ALL=(ALL)       ALL

Tenth, create dedicated directories for the Hadoop installation.

mkdir -p  /hadoop/soft

mkdir -p  /hadoop/install

Change the owner of these directories to the hadoop user:

chown -R hadoop:hadoop /hadoop

Eleventh, install the JDK on all three machines.

Switch to the hadoop user and extract the JDK archive; version 1.8 is used here.

cd /hadoop/soft/

Configure the hadoop user's environment variables:

cd /home/hadoop

vi .bash_profile

Once configured, reload the configuration file: source .bash_profile

Verify the configuration: java -version
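The tutorial doesn't show the extraction command or the variables themselves. A plausible sketch follows, assuming the install path /hadoop/install/jdk1.8.0_141 that appears later in hadoop-env.sh; the archive name in the comment is likewise an assumption for JDK 1.8.0_141:

```shell
# Extraction (archive name is an assumption):
#   tar -zxvf /hadoop/soft/jdk-8u141-linux-x64.tar.gz -C /hadoop/install/
# Environment variables to append; a temp file stands in for
# /home/hadoop/.bash_profile here.
PROFILE=$(mktemp)
cat >> "$PROFILE" <<'EOF'
export JAVA_HOME=/hadoop/install/jdk1.8.0_141
export PATH=$JAVA_HOME/bin:$PATH
EOF
# Reload and check (on a real node: source .bash_profile; java -version)
. "$PROFILE"
echo "JAVA_HOME=$JAVA_HOME"
```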

Twelfth, configure passwordless SSH login for the hadoop user.

Run the following command on all three machines:

ssh-keygen -t rsa

This generates a public/private key pair.

Copy each machine's public key to node01:

ssh-copy-id node01

Then copy the combined authorized_keys file from node01 to the other two nodes:

cd /home/hadoop/.ssh/

scp authorized_keys node02:$PWD

scp authorized_keys node03:$PWD

Verify the configuration

From each of the three virtual machines, run ssh to connect to the other machines. If a connection fails, delete the files in the hadoop user's .ssh directory on each node and repeat this step.
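The key-generation step can be tried in isolation. This sketch creates a throwaway key pair in a temp directory (on the real nodes you would accept the defaults so the keys land under /home/hadoop/.ssh):

```shell
# Generate an RSA key pair non-interactively into a temp dir.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$KEYDIR/id_rsa" -q
ls "$KEYDIR"
# On the real nodes, the distribution steps are then:
#   ssh-copy-id node01                          # from every machine
#   scp ~/.ssh/authorized_keys node02:~/.ssh/   # from node01
#   scp ~/.ssh/authorized_keys node03:~/.ssh/   # from node01
```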

Thirteenth, extract and install Hadoop; the CDH release of Hadoop is used here.

Configure the environment variables, again as the hadoop user.

Once configured, reload the environment variables: source .bash_profile

Verify the configuration: java -version

hadoop version

If the version information prints normally, the configuration is verified.
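As with the JDK, the variables themselves are not shown in the tutorial. A sketch, assuming the install path /hadoop/install/hadoop-2.6.0-cdh5.14.2 used throughout this document:

```shell
# Environment variables to append; a temp file stands in for
# /home/hadoop/.bash_profile here.
PROFILE=$(mktemp)
cat >> "$PROFILE" <<'EOF'
export HADOOP_HOME=/hadoop/install/hadoop-2.6.0-cdh5.14.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
# Reload and check (on a real node: source .bash_profile; hadoop version)
. "$PROFILE"
echo "HADOOP_HOME=$HADOOP_HOME"
```

Note that both bin (client commands like hdfs and hadoop) and sbin (cluster scripts like start-all.sh) go on the PATH.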

Fourteenth, edit the Hadoop configuration files. Using a remote connection tool to log in to the virtual machines is recommended.

1. Configure hadoop-env.sh

cd /hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vi hadoop-env.sh

In this file, just set the JDK installation directory:

export JAVA_HOME=/hadoop/install/jdk1.8.0_141

2. Configure core-site.xml

cd /hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop

vi core-site.xml

 

<configuration>

       <property>

              <name>fs.defaultFS</name>

              <value>hdfs://node01:8020</value>

       </property>

       <property>

              <name>hadoop.tmp.dir</name>

            <value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas</value>

       </property>

       <!-- Buffer size; in practice, tune it based on actual server performance -->

       <property>

              <name>io.file.buffer.size</name>

              <value>4096</value>

       </property>

<property>

     <name>fs.trash.interval</name>

     <value>10080</value>

     <description>Number of minutes after which the checkpoint gets deleted. If zero, the trash feature is disabled.
     This option may be configured both on the server and the client. If trash is disabled on the server side,
     the client-side configuration is checked. If trash is enabled on the server side, the value configured
     on the server is used and the client configuration is ignored.</description>

</property>

 

<property>

     <name>fs.trash.checkpoint.interval</name>

     <value>0</value>

     <description>Number of minutes between trash checkpoints. Should be smaller than or equal to fs.trash.interval.
     If zero, the value is set to the value of fs.trash.interval. Every time the checkpointer runs,
     it creates a new checkpoint out of current and removes checkpoints created more than
     fs.trash.interval minutes ago.</description>

</property>

</configuration>
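After editing, a quick grep/sed pipeline can sanity-check a value. This sketch runs against a minimal temp copy of the file; on a real node, point it at the core-site.xml path above:

```shell
# Write a minimal copy of the config to a temp file for illustration.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:8020</value>
  </property>
</configuration>
EOF
# Print the value element that follows the fs.defaultFS name element.
grep -A1 '<name>fs.defaultFS</name>' "$CONF" | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
```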

3. Configure hdfs-site.xml

vi hdfs-site.xml

<configuration>

       <!-- Path where the NameNode stores metadata. In production, first determine the mounted disk directories and list multiple directories, comma-separated -->

       <!-- For dynamically adding and decommissioning cluster nodes (left commented out here):

       <property>

              <name>dfs.hosts</name>

            <value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/accept_host</value>

       </property>

      

       <property>

              <name>dfs.hosts.exclude</name>

              <value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/deny_host</value>

       </property>

        -->

        

        <property>

                     <name>dfs.namenode.secondary.http-address</name>

                     <value>node01:50090</value>

       </property>

 

       <property>

              <name>dfs.namenode.http-address</name>

              <value>node01:50070</value>

       </property>

       <property>

              <name>dfs.namenode.name.dir</name>

              <value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas</value>

       </property>

       <!-- Path where the DataNode stores data blocks. In production, first determine the mounted disk directories and list multiple directories, comma-separated -->

       <property>

              <name>dfs.datanode.data.dir</name>

              <value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas</value>

       </property>

      

       <property>

              <name>dfs.namenode.edits.dir</name>

              <value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits</value>

       </property>

       <property>

              <name>dfs.namenode.checkpoint.dir</name>

              <value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name</value>

       </property>

       <property>

              <name>dfs.namenode.checkpoint.edits.dir</name>

              <value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits</value>

       </property>

       <property>

              <name>dfs.replication</name>

              <value>3</value>

       </property>

       <property>

              <name>dfs.permissions</name>

              <value>false</value>

       </property>

       <property>

              <name>dfs.blocksize</name>

              <value>134217728</value>

       </property>

</configuration>

4. Configure mapred-site.xml

vi mapred-site.xml

 

<!-- Specify that the MapReduce runtime environment is YARN -->

<configuration>

   <property>

              <name>mapreduce.framework.name</name>

              <value>yarn</value>

       </property>

 

       <property>

              <name>mapreduce.job.ubertask.enable</name>

              <value>true</value>

       </property>

      

       <property>

              <name>mapreduce.jobhistory.address</name>

              <value>node01:10020</value>

       </property>

 

       <property>

              <name>mapreduce.jobhistory.webapp.address</name>

              <value>node01:19888</value>

       </property>

</configuration>

5. Configure yarn-site.xml

vi yarn-site.xml

<configuration>

       <property>

              <name>yarn.resourcemanager.hostname</name>

              <value>node01</value>

       </property>

       <property>

              <name>yarn.nodemanager.aux-services</name>

              <value>mapreduce_shuffle</value>

       </property>

 

      

       <property>

              <name>yarn.log-aggregation-enable</name>

              <value>true</value>

       </property>

 

 

       <property>

               <name>yarn.log.server.url</name>

               <value>http://node01:19888/jobhistory/logs</value>

       </property>

 

       <!-- How long aggregated logs are kept before deletion -->

       <property>

        <name>yarn.log-aggregation.retain-seconds</name>

        <value>2592000</value><!-- 30 days -->

       </property>

       <!-- Time in seconds to retain user logs. Applies only if log aggregation is disabled -->

       <property>

        <name>yarn.nodemanager.log.retain-seconds</name>

        <value>604800</value><!-- 7 days -->

       </property>

       <!-- Compression type used for aggregated log files -->

       <property>

        <name>yarn.nodemanager.log-aggregation.compression-type</name>

        <value>gz</value>

       </property>

       <!-- NodeManager local file storage directory -->

       <property>

        <name>yarn.nodemanager.local-dirs</name>

        <value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/yarn/local</value>

       </property>

       <!-- Maximum number of completed applications the ResourceManager keeps -->

       <property>

        <name>yarn.resourcemanager.max-completed-applications</name>

        <value>1000</value>

       </property>

 

</configuration>

6. Create the data storage directories (if you create them as root, as below, make sure they end up owned by the hadoop user):

[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas

[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas

[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas

[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits

[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name

[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits

 

Fifteenth, format HDFS (run this on the NameNode (master node) before starting the cluster for the first time):

hdfs namenode -format

The following is part of the log, for reference:

19/08/23 04:32:34 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   user = hadoop

STARTUP_MSG:   host = node01.kaikeba.com/192.168.52.100

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.6.0-cdh5.14.2

STARTUP_MSG:   classpath = /hadoop/install/hadoop-2.6.0-...

# The log indicates a successful format:

19/08/23 04:32:35 INFO common.Storage: Storage directory /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas has been successfully formatted.

19/08/23 04:32:35 INFO common.Storage: Storage directory /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits has been successfully formatted.

19/08/23 04:32:35 INFO namenode.FSImageFormatProtobuf: Saving image file /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas/current/fsimage.ckpt_0000000000000000000 using no compression

19/08/23 04:32:35 INFO namenode.FSImageFormatProtobuf: Image file /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.

19/08/23 04:32:35 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

19/08/23 04:32:35 INFO util.ExitUtil: Exiting with status 0

19/08/23 04:32:35 INFO namenode.NameNode: SHUTDOWN_MSG:

# Omitted part of the log

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at node01.kaikeba.com/192.168.52.100

************************************************************/

Sixteenth, start the cluster.

start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

19/08/23 05:18:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [node01]

node01: starting namenode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-namenode-node01.kaikeba.com.out

node01: starting datanode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-datanode-node01.kaikeba.com.out

node03: starting datanode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-datanode-node03.kaikeba.com.out

node02: starting datanode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-datanode-node02.kaikeba.com.out

Starting secondary namenodes [node01]

node01: starting secondarynamenode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-secondarynamenode-node01.kaikeba.com.out

19/08/23 05:18:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-resourcemanager-node01.kaikeba.com.out

node03: starting nodemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-nodemanager-node03.kaikeba.com.out

node02: starting nodemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-nodemanager-node02.kaikeba.com.out

node01: starting nodemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-nodemanager-node01.kaikeba.com.out

[hadoop@node01 ~]$

Enter http://192.168.52.100:50070/dfshealth.html#tab-overview in the browser address bar to view the NameNode web UI.

Seventeenth, run a MapReduce program.

1. Browse the HDFS file system with the hdfs dfs -ls command:

hdfs dfs -ls /

Since the cluster was just set up, no directories are listed.

2. Create a test directory

hdfs dfs -mkdir /test

Then list the root directory again; the newly created directory appears:

hdfs dfs -ls /

19/08/23 05:22:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 2 items

drwxr-xr-x   - hadoop supergroup          0 2019-08-23 05:21 /test/

3. Use the touch command to create a local Linux file named words:

touch words

vi words

sadfasdfasdfas2rzxcvzr3r23

sadfasdfhszcxvhh8

4. Upload the local words file to the /test directory in HDFS:

hdfs dfs -put words /test

Check that the file uploaded successfully:

hdfs dfs -ls -R /test

Run the following command to count the words in the /test/words file and write the result to /test/output. The output directory must not already exist, or the job will fail with an error.

hadoop jar /hadoop/install/hadoop-2.6.0-cdh5.14.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /test/words /test/output
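What the wordcount example computes is equivalent to this local pipeline on the words file, which can be handy for cross-checking the job's output (typically a part-r-00000 file under /test/output). The sketch recreates the sample input locally so it is self-contained:

```shell
# Recreate the sample words file and count words locally, producing
# one "<word><TAB><count>" line per distinct word, like wordcount does.
printf 'sadfasdfasdfas2rzxcvzr3r23\nsadfasdfhszcxvhh8\n' > words
tr -s ' ' '\n' < words | sort | uniq -c | awk '{print $2 "\t" $1}'
```

On the cluster, the job's result can then be viewed with: hdfs dfs -cat /test/output/part-r-00000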

Eighteenth, stop the cluster.

stop-all.sh

 


Origin www.cnblogs.com/zf-mylover/p/11622020.html