Big data cluster building experience
Basic environment preparation
- 1 Turn off the firewall of each server
systemctl status firewalld.service   # check the firewall status
systemctl stop firewalld.service     # stop the firewall
systemctl disable firewalld.service  # keep the firewall from starting on boot
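These commands must be run on every server in the cluster. Once the password-free login from step 3 is in place, a small sketch (assuming the master/slave1..slave3 hostnames used throughout this guide) can drive all the slaves from the master:
# stop and disable firewalld on every slave; run after step 3 is done
for host in slave1 slave2 slave3; do
  ssh "$host" 'systemctl stop firewalld.service && systemctl disable firewalld.service'
done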
- 2 Configure the hosts file
Edit the contents of the hosts file
vi /etc/hosts
The configuration on the master node is as follows:
172.19.241.* master
172.19.241.* slave1
172.19.241.* slave2
172.19.241.* slave3
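A quick sanity check that the entries resolve (run from the master, hostnames as above):
# each node should answer from its 172.19.241.* address
for host in slave1 slave2 slave3; do
  ping -c 1 "$host"
done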
- 3 Set up password-free login
Choose one server as the master node and generate a key pair on it:
ssh-keygen -t rsa
Then copy the public key to each slave node:
ssh-copy-id slave1
A password is required the first time. After that, the master node can log in to each slave node without entering a password.
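With several slaves, the copy step is easier as a loop; a minimal sketch, assuming the slave1..slave3 hostnames from the hosts file:
# run once on the master after ssh-keygen
for host in slave1 slave2 slave3; do
  ssh-copy-id "$host"   # asks for that node's password once
done
# verify: each hostname should print without a password prompt
for host in slave1 slave2 slave3; do
  ssh "$host" hostname
done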
Master node installation
The following operations are all done on the master node
Install JDK
- 1 JDK download
https://www.oracle.com/technetwork/java/javase/downloads
- 2 Upload the downloaded JDK to the master node
- 3 Extract
Create a java directory under /usr/local:
mkdir /usr/local/java
Then extract the JDK into it:
tar -zxvf jdk-8u231-linux-x64.tar.gz -C /usr/local/java
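A quick check that the extracted directory name matches what JAVA_HOME will point to in the next step:
ls /usr/local/java   # should show jdk1.8.0_231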
- 4 Configure JAVA_HOME
vi /etc/bashrc
add the following at the end of the file:
export JAVA_HOME=/usr/local/java/jdk1.8.0_231
export JRE_HOME=${JAVA_HOME}/jre
export PATH=${JAVA_HOME}/bin:$PATH
- 5 Verify
source /etc/bashrc
java -version
Install Hadoop
- 1 Download
https://hadoop.apache.org/releases.html
- 2 Upload and extract
mkdir /usr/local/hadoop
tar -zxvf hadoop-2.10.0.tar.gz -C /usr/local/hadoop
- 3 Configure environment variables
Quote the heredoc delimiter so that $PATH and $HADOOP_HOME are appended literally rather than expanded while writing the file:
cat >> /etc/profile <<'EOF'
#Hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.10.0
export PATH=$PATH:$HADOOP_HOME/bin
EOF
- 4 Check
source /etc/profile
hadoop version
Hadoop configuration files
The main configuration files to edit are core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, masters, and slaves.
- 1 core configuration
vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/core-site.xml
Modify its contents as follows:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
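After saving, the value can be read back to catch typos early (assuming the Hadoop environment variables from the previous section are loaded):
hdfs getconf -confKey fs.defaultFS   # should print hdfs://master:9000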
- 2 hdfs configuration
vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>
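The NameNode format step normally creates the name directory, but it does no harm to create both paths up front on each node so the DataNodes find a writable location:
mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data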
- 3 mapred configuration
Copy the template:
cp /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/mapred-site.xml
Then edit it:
vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
  </property>
</configuration>
- 4 yarn configuration
vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>
- 5 masters configuration
Create a new masters file
vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/masters
master
- 6 slaves configuration
vi /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/slaves
slave1
slave2
slave3
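If you prefer not to open an editor, the same heredoc style used for /etc/profile works here too:
cat > /usr/local/hadoop/hadoop-2.10.0/etc/hadoop/slaves <<'EOF'
slave1
slave2
slave3
EOF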
Slave node configuration
- 1 Distribute the JDK to each slave node
scp jdk-8u231-linux-x64.tar.gz slave1:/usr/local
Then extract it on the slave node into /usr/local/java, as was done on the master.
- 2 Distribute Hadoop to each slave node.
First, pack the configured Hadoop directory into an archive (use -C so the archive contains a relative hadoop/ path):
tar -zcvf hadoop.tar.gz -C /usr/local hadoop
Then copy the archive to each slave node:
scp hadoop.tar.gz slave1:/usr/local
On the slave node, extract it:
tar -zxvf hadoop.tar.gz -C /usr/local
- 3 Distribute several configuration files to each slave node
Distribute the hosts file
scp /etc/hosts slave1:/etc/
Distribute the profile file
scp /etc/profile slave1:/etc/
Distribute the bashrc file
scp /etc/bashrc slave1:/etc/
Then check on each node whether the configuration has taken effect:
source /etc/profile
source /etc/bashrc
java -version
hadoop version
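The distribution and check steps above repeat once per slave node; a combined sketch from the master, assuming the slave1..slave3 hostnames, the paths used in this guide, and that both archives sit in the current directory:
# copy the JDK, Hadoop archive, and config files to every slave, extract, then verify
for host in slave1 slave2 slave3; do
  scp jdk-8u231-linux-x64.tar.gz hadoop.tar.gz "$host":/usr/local
  scp /etc/hosts /etc/profile /etc/bashrc "$host":/etc/
  ssh "$host" 'mkdir -p /usr/local/java && tar -zxvf /usr/local/jdk-8u231-linux-x64.tar.gz -C /usr/local/java && tar -zxvf /usr/local/hadoop.tar.gz -C /usr/local'
  ssh "$host" 'source /etc/profile && source /etc/bashrc && java -version && hadoop version'
done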
If these all run without problems, the configuration is complete; what follows is startup.
Hadoop startup
The cluster is started by operating on the master node:
- 1 Format the NameNode
This only needs to be done before the first startup; it is not needed afterwards.
hdfs namenode -format
- 2 Start
cd /usr/local/hadoop/hadoop-2.10.0
sbin/start-all.sh
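In Hadoop 2.x, start-all.sh is deprecated and simply chains the two role-specific scripts, so an equivalent and slightly more controlled start is:
sbin/start-dfs.sh    # NameNode, SecondaryNameNode, DataNodes
sbin/start-yarn.sh   # ResourceManager, NodeManagers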
- 3 Check
Use the jps command to check whether startup succeeded.
The master node should show the NameNode, SecondaryNameNode, and ResourceManager processes.
Each slave node should show the DataNode and NodeManager processes.
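To check every node in one pass from the master, a small sketch (hostnames as before; if jps is not found over ssh, call it via its full path under $JAVA_HOME/bin):
jps
for host in slave1 slave2 slave3; do
  echo "--- $host ---"
  ssh "$host" jps
done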
- 4 Web UIs
HDFS: visit http://master:50070/
YARN: visit http://master:8088/
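HDFS can also report cluster health from the command line; all three DataNodes should be listed as live:
hdfs dfsadmin -report   # lists live datanodes and their capacity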