Tutorial: building a three-node Hadoop cluster
First, install VMware to host the virtual machines
Second, create a Linux virtual machine node; CentOS 7.6 64-bit is used here
Third, clone this machine twice to create the other two nodes
Fourth, turn off the firewall on all three nodes
Command: systemctl stop firewalld
To keep it off across reboots, also disable it: systemctl disable firewalld
After stopping it, check the firewall status to confirm it is off: systemctl status firewalld
Fifth, disable SELinux (the change takes effect after a reboot)
vi /etc/selinux/config
SELINUX=disabled
Sixth, configure the network settings
vi /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.XX.XX
NETMASK=255.255.255.0
GATEWAY=192.168.XX.2
DNS1=8.8.8.8
After configuring, restart the network service:
service network restart
Seventh, change the hostname on each of the three virtual machines:
hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03
Then edit the hosts file to map each hostname to its IP address:
vi /etc/hosts
Reboot the virtual machines for the changes to take effect.
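For example, /etc/hosts might map the three hostnames like this. node01's address matches the NameNode address that appears in the startup logs later in this tutorial; the addresses for node02 and node03 are placeholders, so substitute your own:

```
192.168.52.100 node01
192.168.52.110 node02
192.168.52.120 node03
```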
Eighth, configure time synchronization on all three machines; Aliyun's NTP server is used here
yum -y install ntpdate
crontab -e
Add the following entry to sync the clock every minute:
*/1 * * * * /usr/sbin/ntpdate time1.aliyun.com
Ninth, add a dedicated hadoop user and give it sudo privileges
useradd hadoop
passwd hadoop    (set a password when prompted)
visudo
Add the following line to the sudoers configuration:
hadoop ALL=(ALL) ALL
Tenth, create dedicated directories for the Hadoop installation
mkdir -p /hadoop/soft
mkdir -p /hadoop/install
Change the owner of the folders to the hadoop user:
chown -R hadoop:hadoop /hadoop
Eleventh, install the JDK on all three machines
Switch to the hadoop user and extract the JDK archive; version 1.8 is used here
cd /hadoop/soft/
Configure the hadoop user's environment variables:
cd /home/hadoop
vi .bash_profile
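A minimal sketch of the lines to append to .bash_profile; the JDK path is assumed to be the same one referenced later in hadoop-env.sh:

```shell
# Make the JDK the default Java for the hadoop user.
# The path matches the JAVA_HOME set later in hadoop-env.sh.
export JAVA_HOME=/hadoop/install/jdk1.8.0_141
export PATH=$JAVA_HOME/bin:$PATH
```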
Once configured, reload the configuration file: source .bash_profile
Verify the configuration: java -version
Twelfth, configure passwordless SSH login for the hadoop user
Execute the following command on all three machines:
ssh-keygen -t rsa
This generates a public and a private key.
Copy each machine's public key to node01:
ssh-copy-id node01
Then copy the combined authorized_keys file from node01 to the other two nodes:
cd /home/hadoop/.ssh/
scp authorized_keys node02:$PWD
scp authorized_keys node03:$PWD
Verify the configuration:
From each of the three virtual machines, run ssh to connect to the other machines. If a connection fails, delete the files in the .ssh folder of the hadoop user's home directory on every node and repeat this step.
Thirteenth, extract and install Hadoop; the CDH release is used here
Configure the environment variables, again for the hadoop user:
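A minimal sketch of the additions to .bash_profile, assuming Hadoop was extracted to the install path used throughout this tutorial:

```shell
# Put the Hadoop binaries and admin scripts on the PATH.
export HADOOP_HOME=/hadoop/install/hadoop-2.6.0-cdh5.14.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```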
Once configured, reload the environment variables: source .bash_profile
Verify the configuration:
hadoop version
If the version information prints normally, the configuration is correct.
Fourteenth, edit the Hadoop configuration files. Logging in to the virtual machines with a remote connection tool is recommended for editing.
1. Configure hadoop-env.sh
cd /hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vi hadoop-env.sh
Only the JDK installation directory needs to be set in this file:
export JAVA_HOME=/hadoop/install/jdk1.8.0_141
2. Configure core-site.xml
cd /hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas</value>
</property>
<!-- Buffer size; adjust according to the actual server performance -->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
<description>Number of minutes after which a checkpoint gets deleted. If zero, the trash feature is disabled.
This option may be configured on both the server and the client. If trash is disabled server-side, the client-side configuration is checked.
If trash is enabled server-side, the value configured on the server is used and the client configuration is ignored.</description>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
<description>Number of minutes between trash checkpoints. Should be smaller than or equal to fs.trash.interval.
If zero, the value is set to the value of fs.trash.interval. Each time the checkpointer runs,
it creates a new checkpoint out of the current trash and removes checkpoints older than fs.trash.interval.</description>
</property>
</configuration>
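fs.trash.interval above is expressed in minutes; 10080 corresponds to keeping deleted files for 7 days:

```shell
# 7 days expressed in minutes, the unit fs.trash.interval uses
echo $((7*24*60))   # prints 10080
```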
3. Configure hdfs-site.xml
vi hdfs-site.xml
<configuration>
<!-- Path where the NameNode stores metadata. In production, first determine the mounted disks and split this across multiple directories. -->
<!-- Dynamic decommissioning/commissioning of cluster nodes (left commented out here)
<property>
<name>dfs.hosts</name>
<value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/accept_host</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/etc/hadoop/deny_host</value>
</property>
-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>node01:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas</value>
</property>
<!-- Paths where the DataNode stores data blocks. In production, first determine the mounted disks and split across multiple directories. -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas</value>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>file:///hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
</configuration>
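The dfs.blocksize value above is in bytes; 134217728 is the standard 128 MB HDFS block size:

```shell
# 128 MB expressed in bytes, the unit dfs.blocksize uses
echo $((128*1024*1024))   # prints 134217728
```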
4. Configure mapred-site.xml
vi mapred-site.xml
<!-- Specify that MapReduce jobs run on YARN -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node01:19888</value>
</property>
</configuration>
5. Configure yarn-site.xml
vi yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://node01:19888/jobhistory/logs</value>
</property>
<!-- How long aggregated logs are retained before deletion -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>2592000</value><!-- 30 days -->
</property>
<!-- How long to keep user logs, in seconds. Applies only when log aggregation is disabled -->
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>604800</value><!-- 7 days -->
</property>
<!-- Compression type used for aggregated log files -->
<property>
<name>yarn.nodemanager.log-aggregation.compression-type</name>
<value>gz</value>
</property>
<!-- NodeManager local file storage directory -->
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/yarn/local</value>
</property>
<!-- Maximum number of completed applications the ResourceManager keeps -->
<property>
<name>yarn.resourcemanager.max-completed-applications</name>
<value>1000</value>
</property>
</configuration>
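The two retention values above are in seconds: 2592000 is 30 days and 604800 is 7 days:

```shell
# Retention periods expressed in seconds
echo $((30*24*3600))   # 30 days -> prints 2592000
echo $((7*24*3600))    # 7 days  -> prints 604800
```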
6. Create the data storage directories. (They are created as root here; afterwards run chown -R hadoop:hadoop /hadoop again so the hadoop user can write to them.)
[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/tempDatas
[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas
[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/datanodeDatas
[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits
[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/snn/name
[root@node01 ~]# mkdir -p /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/snn/edits
Fifteenth, format HDFS (run this once on the NameNode (master node) before starting the cluster for the first time)
hdfs namenode -format
The following is part of the log output, for reference:
19/08/23 04:32:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hadoop
STARTUP_MSG: host = node01.kaikeba.com/192.168.52.100
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0-cdh5.14.2
STARTUP_MSG: classpath = /hadoop/install/hadoop-2.6.0-... (omitted)
# ... part of the log omitted ...
19/08/23 04:32:35 INFO common.Storage: Storage directory /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas has been successfully formatted.
19/08/23 04:32:35 INFO common.Storage: Storage directory /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/dfs/nn/edits has been successfully formatted.
19/08/23 04:32:35 INFO namenode.FSImageFormatProtobuf: Saving image file /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas/current/fsimage.ckpt_0000000000000000000 using no compression
19/08/23 04:32:35 INFO namenode.FSImageFormatProtobuf: Image file /hadoop/install/hadoop-2.6.0-cdh5.14.2/hadoopDatas/namenodeDatas/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
19/08/23 04:32:35 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/08/23 04:32:35 INFO util.ExitUtil: Exiting with status 0
19/08/23 04:32:35 INFO namenode.NameNode: SHUTDOWN_MSG:
# Omitted part of the log
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node01.kaikeba.com/192.168.52.100
************************************************************/
Sixteenth, start the cluster (as the hadoop user on node01):
start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
19/08/23 05:18:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [node01]
node01: starting namenode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-namenode-node01.kaikeba.com.out
node01: starting datanode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-datanode-node01.kaikeba.com.out
node03: starting datanode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-datanode-node03.kaikeba.com.out
node02: starting datanode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-datanode-node02.kaikeba.com.out
Starting secondary namenodes [node01]
node01: starting secondarynamenode, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/hadoop-hadoop-secondarynamenode-node01.kaikeba.com.out
19/08/23 05:18:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-resourcemanager-node01.kaikeba.com.out
node03: starting nodemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-nodemanager-node03.kaikeba.com.out
node02: starting nodemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-nodemanager-node02.kaikeba.com.out
node01: starting nodemanager, logging to /hadoop/install/hadoop-2.6.0-cdh5.14.2/logs/yarn-hadoop-nodemanager-node01.kaikeba.com.out
[hadoop@node01 ~]$
Enter http://192.168.52.100:50070/dfshealth.html#tab-overview in the browser address bar to view the NameNode web interface.
Seventeenth, run a MapReduce program
1. Browse the HDFS file system with the ls command:
hdfs dfs -ls /
Since the cluster was just set up, no directories are listed yet.
2. Create a test directory
hdfs dfs -mkdir /test
Then browse the root directory again; the newly created directory appears:
hdfs dfs -ls /
19/08/23 05:22:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2019-08-23 05:21 /test/
3. Use the touch command to create a local file named words:
touch words
vi words
sadfasdfasdfas2rzxcvzr3r23
sadfasdfhszcxvhh8
4. Upload the local words file to the /test directory in HDFS:
hdfs dfs -put words /test
Check whether the file uploaded successfully:
hdfs dfs -ls /test
Run the following command to count the words in the /test/words file and write the result to /test/output. The output directory must not already exist, otherwise the job will fail with an error.
hadoop jar /hadoop/install/hadoop-2.6.0-cdh5.14.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /test/words /test/output
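Wordcount emits each distinct word with its occurrence count. As a rough local sanity check for the two sample lines above (an emulation with standard shell tools, not the actual MapReduce job):

```shell
# Recreate the sample input locally and tally it the way wordcount
# would: one (count, word) pair per distinct word.
printf 'sadfasdfasdfas2rzxcvzr3r23\nsadfasdfhszcxvhh8\n' > /tmp/words
sort /tmp/words | uniq -c
```

On the cluster, the real job's result can be inspected once it finishes with hdfs dfs -cat /test/output/part-r-00000 (the usual part-file name for a single-reducer example job).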
Eighteenth, stop the cluster:
stop-all.sh