Key steps to build a Hadoop cluster -- taking three nodes as an example

Build a three-node Hadoop cluster:

Requirements:

Hostname    Remark                   IP address         Function (daemons)
hadoop01    Master (master node)     192.168.211.134    NameNode, DataNode, ResourceManager, NodeManager
hadoop02    Slave (secondary node)   192.168.211.129    DataNode, NodeManager, SecondaryNameNode
hadoop03    Slave (secondary node)   192.168.211.140    DataNode, NodeManager

All machines need the following configured:

1. JDK    2. Passwordless SSH login    3. Hadoop

 

Run the following on all three machines at the same time:
Step 1:
Add a user and set its password:
useradd hadoop
passwd hadoop

Step 2:
Grant sudo permissions to the user:
su root
visudo
Below the following line in the file
root    ALL=(ALL)       ALL
add a line for the hadoop user:
hadoop  ALL=(ALL)       ALL
If the hadoop user should not have to enter a password when using sudo, enter the following line instead:
hadoop  ALL=(ALL)       NOPASSWD:ALL

Step 3:
Synchronize the time: run
sudo date -s "00:00:00"
on the three hosts at the same time.

Step 4:
Set the hostname:
sudo vi /etc/sysconfig/network
On each host set the HOSTNAME entry accordingly:
HOSTNAME=hadoop01
HOSTNAME=hadoop02
HOSTNAME=hadoop03
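
To apply the new name for the current session without rebooting (an optional extra check, assuming CentOS 6; not part of the original steps), run the matching command on each host and then log out and back in:
sudo hostname hadoop01    # on hadoop01; use hadoop02 / hadoop03 on the other hosts
hostname                  # should print the new hostname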

Step 5:
Configure intranet domain name mapping (write the mapping of three hosts under each host) sudo vi /etc/hosts
hadoop01
hadoop02
hadoop03
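
A quick way to confirm the mapping works (an optional check, not in the original write-up):
ping -c 1 hadoop02    # should resolve to 192.168.211.129 and get a reply
ping -c 1 hadoop03    # should resolve to 192.168.211.140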

Step 6:
Configure the firewall (stop and disable iptables):
service iptables start       # start the firewall
service iptables status      # check the firewall status
service iptables stop        # stop the firewall
chkconfig iptables --list    # list the firewall's boot-time settings
chkconfig iptables off       # disable the firewall at boot
service iptables save        # save the current rules

Do the following on one host:

Step 7:
Install the JDK.
Upload the JDK and Hadoop packages to the hadoop user's home directory, then decompress the JDK package to /home/hadoop/jdk1.8.0_101:
tar -zxvf jdk-8u101-linux-x64.tar.gz
Modify the configuration file: sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/jdk1.8.0_101
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$PATH:$JAVA_HOME/bin
Refresh the configuration file: source /etc/profile
which java    # shows the JDK currently used by the system: /home/hadoop/jdk1.8.0_101/bin/java
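
A quick check that the JDK is picked up correctly (optional, assuming the paths above):
java -version       # should report java version "1.8.0_101"
echo $JAVA_HOME     # should print /home/hadoop/jdk1.8.0_101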

Step 8:
Unzip the Hadoop package:
tar -zxvf hadoop-2.6.1.tar.gz
Modify the configuration file (now covering both Java and Hadoop): sudo vi /etc/profile

export JAVA_HOME=/home/hadoop/jdk1.8.0_101
export HADOOP_HOME=/home/hadoop/hadoop-2.6.1
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Refresh the configuration file: source /etc/profile
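
A quick check that Hadoop is on the PATH (optional, assuming the variables above have been sourced):
hadoop version      # should print Hadoop 2.6.1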

Step 9: Set up passwordless SSH login (used for communication between the namenode and the datanodes).

(In the home directory (~), type ssh and press the Tab key to list the ssh-related commands, including ssh-keygen.
cd /home/hadoop/.ssh enters the .ssh folder,
and ls shows the known_hosts file inside it.)

Generate the key pair:
In the home directory (~):
cd ~
ssh-keygen -t rsa
then press Enter at every prompt (three times in total).

Then in the /home/hadoop/.ssh/ directory:
cd /home/hadoop/.ssh/
ls
shows that two files, id_rsa and id_rsa.pub, have been added.

In the /home/hadoop/.ssh/ directory, type ssh- and press the Tab key; the ssh-copy-id command is listed.

Under /home/hadoop/.ssh/ execute:
ssh-copy-id hadoop@hadoop01    # enter the password and press Enter
ssh-copy-id hadoop@hadoop02    # enter the password and press Enter
ssh-copy-id hadoop@hadoop03    # enter the password and press Enter

In the home directory (~), scp ./myfile hadoop@hadoop02:/home/hadoop/ sends a file remotely;
in the home directory (~), ssh hadoop@hadoop03 logs in to the third host.

Delete the doc folder under hadoop-2.6.1/share (it is not used).

Step 10:
Send the JDK installation package to the other nodes:
scp -r jdk1.8.0_101 hadoop@hadoop02:/home/hadoop/
scp -r jdk1.8.0_101 hadoop@hadoop03:/home/hadoop/
The environment variables can also be sent over:
sudo scp /etc/profile root@hadoop02:/etc/
sudo scp /etc/profile root@hadoop03:/etc/
Then refresh the environment variables on each node: source /etc/profile

Step 11:
Configure the JDK for the Hadoop framework (needed to start the cluster):
vi /home/hadoop/hadoop-2.6.1/etc/hadoop/hadoop-env.sh
Change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/home/hadoop/jdk1.8.0_101


Step 12:
Add the hadoopdata folder under the /home/hadoop/hadoop-2.6.1/ path
cd /home/hadoop/hadoop-2.6.1/
mkdir hadoopdata

The manually created hadoopdata folder is used to store data and metadata.
Note: 1. It should be placed in /home/hadoop/hadoop-2.6.1/, not in the root directory; the hadoop user has no permission to operate there.
2. It must be created before the environment variables and the configured hadoop folder are sent to the other nodes.


Step 13: Modify the configuration files
(all of them live in the Hadoop configuration directory, etc/hadoop).
Modify the core-site.xml file:
<property>
<name>fs.defaultFS</name>
<!--Configure the address of the hdfs system-->
<value>hdfs://hadoop01:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.6.1/hadoopdata/tmp</value>
</property>
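
For reference, each of these *-site.xml files wraps its property list in a single <configuration> element; a minimal skeleton (assuming the stock files shipped with Hadoop) looks like the following, with the <property> blocks shown in this guide placed inside it:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- <property> ... </property> blocks go here -->
</configuration>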

Modify the hdfs-site.xml file
<property>
<name>dfs.replication</name>
<!--Number of copies 3-->
<value>3</value>
</property>
<property>
<!--hadoop 2.x default data block size is 128M-->
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<!--The location where the NameNode node stores metadata-->
<value>file:///home/hadoop/hadoop-2.6.1/hadoopdata/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<!--The location where the DataNode node stores data blocks-->
<value>file:///home/hadoop/hadoop-2.6.1/hadoopdata/dfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///home/hadoop/hadoop-2.6.1/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///home/hadoop/hadoop-2.6.1/hadoopdata/checkpoint/dfs/cname</value>
</property>
<property>
<name>dfs.http.address</name>
<value>hadoop01:50070</value>
</property>
<property>
<!--Web address of the secondary namenode (host 2 acts as an assistant to host 1)-->
<name>dfs.secondary.http.address</name>
<value>hadoop02:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

Modify the mapred-site.xml file. The commands are as follows:
# mv mapred-site.xml.template mapred-site.xml
# vi mapred-site.xml

<property>
<!--Configure the use of yarn resource scheduler when executing the calculation model-->
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<!--Configure the address of the history service of the MapReduce framework-->
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>

Modify the yarn-site.xml file
<property>
<!--Configure the address of the resourcemanager service-->
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>

Step 14: Configure the datanode / nodemanager nodes
# vi slaves
Write the hostnames of all machines that should act as slaves in it.
Note: each hostname occupies one line.
hadoop01
hadoop02
hadoop03

Step 15: Create the master file (it stores the list of secondarynamenodes; the file does not exist by default and needs to be created manually).
Note: it should be created in the hadoop configuration directory (etc/hadoop):
# vi ./master
hadoop02

Step 16:
Send the configured Hadoop folder to slave1 and slave2 (hadoop02 and hadoop03). Send from the home directory (~), i.e. a remote copy:
# scp -r hadoop-2.6.1 hadoop@hadoop02:/home/hadoop/
# scp -r hadoop-2.6.1 hadoop@hadoop03:/home/hadoop/

Step 17:
Adjust the environment variables for the Hadoop framework:
export HADOOP_HOME=/home/hadoop/hadoop-2.6.1    (paste directly)
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
(Configure bin because the execution scripts are in bin; simply append :$HADOOP_HOME/bin:$HADOOP_HOME/sbin after PATH=$PATH:$JAVA_HOME/bin.)
Effect: after this configuration, the scripts in the bin and sbin directories can be used from any path.
# source /etc/profile

Step 18:
Send the hadoop environment variables to the other nodes:
scp /etc/profile root@hadoop02:/etc/
scp /etc/profile root@hadoop03:/etc/
Then run source /etc/profile on each node.

Step 19:
Quick test: in the home directory (~), type st and press the Tab key; if commands starting with st (such as start-dfs.sh) are listed, the configuration is successful.

Step 20:
Format the NameNode before starting the Hadoop cluster (the name and data directories under /home/hadoop/hadoop-2.6.1/hadoopdata are only created after formatting).
Formatting only needs to be done once, on the NameNode node:
# hadoop namenode -format

Step 21: Start the hdfs cluster and the yarn cluster.
In the home directory (~), execute start-dfs.sh to start the hdfs cluster first. When it finishes, the started namenode and datanode processes are displayed,
and the data and name folders are automatically created under the hadoopdata/dfs folder.

In the home directory (~), execute start-yarn.sh to start the yarn cluster; the resourcemanager and nodemanager processes are displayed after startup
(the log shows that the resourcemanager is started first).
Step 22:
Verify with jps: it lists the daemons started on each node.
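
For reference, a rough sketch of what jps might show on each node according to the role table above (the process IDs here are only placeholders):

hadoop01:
2481 NameNode
2603 DataNode
2862 ResourceManager
2975 NodeManager
3120 Jps

hadoop02:
2210 DataNode
2318 SecondaryNameNode
2402 NodeManager
2533 Jps

hadoop03:
2198 DataNode
2301 NodeManager
2433 Jps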

Step 23:
The cluster has started successfully; test it by uploading a file:
In the home directory (~), vi aaa, write something into it, then save and exit.
In the home directory, execute: hadoop fs -put ./aaa /
(Note: ./aaa is the file in the current path, and the trailing / is the root path of the hdfs system.)
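
To confirm the upload (an optional check using standard HDFS commands):
hadoop fs -ls /     # /aaa should appear in the listing
The file should also be visible in the HDFS web UI at http://hadoop01:50070 (the address set in hdfs-site.xml above).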

 

Building a cluster mainly comes down to the configuration files. Configuration file description link:

Taking five nodes as an example: link:
