Purpose
Based on VMware Workstation 10.0 + CentOS 7 + Hadoop 3.2.0, build a Hadoop cluster environment on virtual machines, comprising four nodes in total: one master node and three slave nodes.
Procedure
Step One: Create the virtual machine and install the system
Install VMware Workstation 10 in advance and download the CentOS 7 image file to the host machine. The concrete steps will not be repeated here; a few points deserve attention during system installation:
- Select the minimal installation
- The network is off by default during installation; enable it in the installer's network settings
- No need to install VMware Tools
- Remember the root password and the account name and password you create
If the network was not enabled during installation, you can enable it manually after the system is installed:
# Edit the NIC configuration file
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Add or modify ONBOOT="yes" in the configuration file; as the name implies, this brings the NIC up at boot. Then restart the network service:
service network restart
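The ONBOOT change can also be scripted instead of edited by hand. Below is a minimal sketch; to stay self-contained it works on a temporary sample copy of the file, while on the real system the target would be /etc/sysconfig/network-scripts/ifcfg-ens33.

```shell
# Sketch: enable ONBOOT idempotently. A temp file with sample contents
# stands in for /etc/sysconfig/network-scripts/ifcfg-ens33.
cfg=$(mktemp)
printf 'TYPE="Ethernet"\nNAME="ens33"\nONBOOT="no"\n' > "$cfg"

if grep -q '^ONBOOT=' "$cfg"; then
    # The key already exists: rewrite it in place
    sed -i 's/^ONBOOT=.*/ONBOOT="yes"/' "$cfg"
else
    # The key is missing: append it
    echo 'ONBOOT="yes"' >> "$cfg"
fi
cat "$cfg"
```

Because the script checks for the key before appending, it can be re-run safely.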
Step Two: Log in to the virtual machines from the host with SecureCRT
Switching back and forth between the virtual machines and the host inside VMware is cumbersome, so here we log in to the virtual machines over SSH with SecureCRT, which makes the work much more comfortable.
Configuration is very simple: install SecureCRT on the host, create a new Session, keep the default SSH2 protocol, fill in Hostname with the IP of the virtual machine you want to connect to, then enter the account and password to log in to the virtual machine.
Step Three: Basic configuration
- Create a user hadoop_user for operating Hadoop

# Switch to the root user
su
# Create the user
useradd hadoop_user
# Set a password for hadoop_user
passwd hadoop_user

- Install the JDK
2.1 Copy and extract the JDK
I originally wanted to download the JDK with wget, but unfortunately Oracle now requires signing in to download the JDK, so downloading it directly inside the virtual machine had to be abandoned.
Instead, download the JDK on the host, then send the compressed JDK package to the virtual machine via SecureFX.
# Switch to the root user
su
# Create a directory to hold the JDK
mkdir /opt/software/jdk/
# Extract the JDK into the directory just created
tar zxvf /home/zhq/jdk-8u211-linux-x64.tar.gz -C /opt/software/jdk/
# Add the JDK's executable directory to the environment variables
vi /etc/profile
Add the following lines at the end of the profile file, then save and exit:

export JAVA_HOME=/opt/software/jdk/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin
# Make the environment variable configuration take effect
source /etc/profile
# Change the owner of the JDK directory to hadoop_user for later use
chown -R hadoop_user:hadoop_user /opt/software/jdk/
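If you want to sanity-check the profile edit before touching /etc/profile, the same two export lines can be tried against a throwaway file and sourced in a subshell. A sketch (the JDK path is the one used in this guide; no real JDK needs to be installed for this check):

```shell
# Sketch: append the export lines to a throwaway profile file and verify,
# in a subshell, that sourcing it sets JAVA_HOME and extends PATH.
profile=$(mktemp)
cat >> "$profile" <<'EOF'
export JAVA_HOME=/opt/software/jdk/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin
EOF

# Source in a subshell so the current shell stays clean
java_home=$(. "$profile"; echo "$JAVA_HOME")
path_tail=$(. "$profile"; echo "${PATH##*:}")
echo "$java_home"
echo "$path_tail"
```

The last PATH component should be the JDK's bin directory; on the real machine you would then confirm with `java -version`.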
- Install net-tools
Since we chose the minimal installation, some common tools are missing from the system; install them manually here.
su
yum install -y net-tools
- Configure the hostname and hosts
Set the hostname to master:
hostnamectl set-hostname master
Configure hosts (the three hosts slave1, slave2, and slave3 do not exist yet; here we estimate their IPs according to VMware's rule of assigning virtual IP addresses incrementally. If the IPs found in Step Five turn out to differ, come back and adjust these entries accordingly).
# Edit the hosts file
su
vi /etc/hosts
192.168.212.132 master
192.168.212.133 slave1
192.168.212.134 slave2
192.168.212.135 slave3
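Since these entries may need re-doing after Step Five, appending them idempotently is convenient. A sketch (a temporary file stands in for /etc/hosts so the snippet can run unprivileged):

```shell
# Sketch: add the cluster entries to a hosts file only if the hostname
# is not already present, so the snippet can be re-run safely.
hosts_file=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts_file"

for entry in \
    '192.168.212.132 master' \
    '192.168.212.133 slave1' \
    '192.168.212.134 slave2' \
    '192.168.212.135 slave3'
do
    # Match on the hostname, so a changed IP shows up as a conflict to fix
    # by hand rather than a silent duplicate
    name=${entry#* }
    grep -qw "$name" "$hosts_file" || echo "$entry" >> "$hosts_file"
done
cat "$hosts_file"
```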
Step Four: Hadoop configuration
- Install Hadoop
1.1 Copy and extract Hadoop
As with the JDK, use SecureFX to send the Hadoop compressed package to the master virtual machine. After the transfer completes, run the following in the virtual machine:

# Switch to the root user
su
# Create a directory to hold Hadoop
mkdir /opt/software/hadoop/
# Extract Hadoop into the directory just created
tar zxvf /home/zhq/hadoop-3.2.0.tar.gz -C /opt/software/hadoop/
# Add Hadoop's executable directory to the environment variables
vi /etc/profile
Add the following lines at the end of the profile file, then save and exit:

export HADOOP_HOME=/opt/software/hadoop/hadoop-3.2.0
export PATH=$PATH:$HADOOP_HOME/bin
# Make the environment variable configuration take effect
source /etc/profile
# Change the owner of the Hadoop directory to hadoop_user for later use
chown -R hadoop_user:hadoop_user /opt/software/hadoop/
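A quick way to catch typos in the two profile edits is to check that JAVA_HOME and HADOOP_HOME actually point at directories containing a bin/ subdirectory. The sketch below creates fake directories under a temp root so it is self-contained; on the cluster, the two variables would come from /etc/profile instead.

```shell
# Sketch: sanity-check that the env vars point at real install directories.
# Fake layouts under a temp root stand in for the real installs.
root=$(mktemp -d)
JAVA_HOME="$root/jdk1.8.0_211"
HADOOP_HOME="$root/hadoop-3.2.0"
mkdir -p "$JAVA_HOME/bin" "$HADOOP_HOME/bin"

status=ok
for d in "$JAVA_HOME/bin" "$HADOOP_HOME/bin"; do
    [ -d "$d" ] || { echo "missing: $d"; status=bad; }
done
echo "$status"
```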
- Configure the Hadoop environment parameters
Note: master acts as the NameNode, and some of the items below only matter on the DataNodes, so strictly speaking they need not all be configured on master. But to avoid modifying each cluster node one by one after cloning, the redundant items are written here as well; the redundant parts are tentatively marked in italics.
2.1 Configure JAVA_HOME in hadoop-env.sh and yarn-env.sh
Add the following line at the end of both files:

export JAVA_HOME=/opt/software/jdk/jdk1.8.0_211
2.2 Edit core-site.xml and add the following configuration item, which specifies that the cluster uses HDFS as its file system, with master as the NameNode.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
2.3 Edit hdfs-site.xml and add the following configuration items, which set the HDFS replication factor to 2, the NameNode's storage path to /hdfs_storage/name/, and the DataNode's local storage path to /hdfs_storage/data/ (note: remember to create these directories manually in advance).
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/hdfs_storage/name/</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hdfs_storage/data/</value>
</property>
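Creating those storage directories in advance is easy to forget, so it is worth scripting alongside the config. A sketch (a temporary root stands in for / so the snippet can run unprivileged; the chown is left as a comment since it needs root and the hadoop_user account):

```shell
# Sketch: create the HDFS storage directories named in hdfs-site.xml.
# A temp root stands in for / so this runs without root.
root=$(mktemp -d)
mkdir -p "$root/hdfs_storage/name" "$root/hdfs_storage/data"
ls "$root/hdfs_storage"
# On the real nodes, also hand ownership to hadoop_user (requires root):
# chown -R hadoop_user:hadoop_user /hdfs_storage/
```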
2.4 Edit mapred-site.xml and add the following configuration items, which specify YARN as the framework that executes MapReduce jobs, master:10020 as the job history address, and master:19888 as the job history webapp address.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
2.5 Edit yarn-site.xml and add the following configuration items, which set the ResourceManager address to master:8032, the ResourceManager scheduler address to master:8030, the ResourceManager resource-tracker address to master:8031, the ResourceManager admin address to master:8033, the ResourceManager webapp address to master:8088, and mapreduce_shuffle as the auxiliary shuffle service used by MapReduce programs.
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
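These port numbers come back in Step Eight's firewall rules, so it is handy to be able to pull them out of the config mechanically. A sketch that extracts the ports from a yarn-site.xml fragment with sed (the fragment is embedded in a temp file here to keep the snippet self-contained):

```shell
# Sketch: extract the port numbers from master:PORT values in a
# yarn-site.xml fragment, for cross-checking firewall rules later.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<property><name>yarn.resourcemanager.scheduler.address</name><value>master:8030</value></property>
<property><name>yarn.resourcemanager.resource-tracker.address</name><value>master:8031</value></property>
<property><name>yarn.resourcemanager.address</name><value>master:8032</value></property>
<property><name>yarn.resourcemanager.admin.address</name><value>master:8033</value></property>
<property><name>yarn.resourcemanager.webapp.address</name><value>master:8088</value></property>
EOF
ports=$(sed -n 's/.*<value>master:\([0-9]*\)<\/value>.*/\1/p' "$xml")
echo "$ports"
```

On the real machine you would point the sed command at $HADOOP_HOME/etc/hadoop/yarn-site.xml instead.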
Step Five: Clone the remaining hosts in the cluster
Here we use VMware's virtual machine clone function to create slave1, slave2, and slave3 from master.
Step Six: Configure passwordless SSH login
- Generate an RSA key pair
Log in to master as hadoop_user, create the .ssh directory under /home/hadoop_user/, and create a key pair:
cd /home/hadoop_user/
# Create the .ssh directory
mkdir .ssh
cd .ssh
# Create an RSA key pair for SSH
ssh-keygen -t rsa

After pressing Enter through all the prompts, a key pair is created in the .ssh directory: the files id_rsa and id_rsa.pub.
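When scripting the cluster setup, ssh-keygen can also be run non-interactively, with an empty passphrase and an explicit output file, so no prompts appear at all. A sketch (a temp directory stands in for /home/hadoop_user/.ssh/):

```shell
# Sketch: generate the RSA key pair without any prompts.
# -N '' : empty passphrase, -f : output file, -q : quiet
sshdir=$(mktemp -d)     # stands in for /home/hadoop_user/.ssh/
chmod 700 "$sshdir"
ssh-keygen -q -t rsa -N '' -f "$sshdir/id_rsa"
ls "$sshdir"
```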
- Copy the public key to slave1 and add it to the trusted public keys
Run the following on master:
scp id_rsa.pub hadoop_user@slave1:/home/hadoop_user/
# Enter the password of the hadoop_user user on slave1; the copy then succeeds
Run the following on slave1:
cd /home/hadoop_user/
mkdir .ssh
# Add master's public key to the trusted list
cat id_rsa.pub >> .ssh/authorized_keys
# Adjust file permissions
chmod 700 .ssh
chmod 600 .ssh/authorized_keys
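The 700/600 permissions are not optional: with its default strict checks, sshd silently ignores authorized_keys files that are group- or world-accessible, which is a classic reason passwordless login "mysteriously" fails. A sketch that applies and verifies them (a temp directory stands in for /home/hadoop_user/.ssh/; GNU stat's %a format is assumed):

```shell
# Sketch: apply and verify the permissions sshd requires.
sshdir=$(mktemp -d)                 # stands in for /home/hadoop_user/.ssh/
touch "$sshdir/authorized_keys"
chmod 700 "$sshdir"
chmod 600 "$sshdir/authorized_keys"

# GNU stat prints the octal mode with %a
dir_mode=$(stat -c %a "$sshdir")
key_mode=$(stat -c %a "$sshdir/authorized_keys")
echo "$dir_mode $key_mode"
```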
- Test whether passwordless login works
On master, run ssh slave1. If you log in to slave1 without being asked for a password, the setup succeeded; otherwise check for missing steps.

- Repeat steps 2 and 3 to enable passwordless login to the remaining hosts
Step Seven: Verify the environment setup
- Format HDFS
HDFS must be formatted once before first use; only then will the daemons start successfully. Format it as follows:
bin / hdfs namenode -format
If no error message appears, the format succeeded.
- Start HDFS
Execute the following command to start HDFS:
sbin/start-dfs.sh
The corresponding stop command is:
sbin/stop-dfs.sh
If the start command runs without errors, you can then run the jps command to view the Java processes: master should show the NameNode and SecondaryNameNode processes, and each slave node should show a DataNode process.
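That jps check can be automated with grep. Since jps itself only exists where the JDK is installed, the sketch below runs against a captured sample of what master's jps output should look like; on the real machine you would replace the sample variable with `$(jps)`.

```shell
# Sketch: verify the expected HDFS daemons appear in jps output.
# sample_jps stands in for the output of the real `jps` on master.
sample_jps='12001 NameNode
12210 SecondaryNameNode
12544 Jps'

missing=""
for proc in NameNode SecondaryNameNode; do
    echo "$sample_jps" | grep -qw "$proc" || missing="$missing $proc"
done
echo "missing:${missing:-none}"
```

The same loop with DataNode (or, after Step Seven's YARN section, ResourceManager and NodeManager) covers the other nodes.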
- Start YARN
Execute the following command to start YARN:

sbin/start-yarn.sh
The corresponding stop command is:
sbin/stop-yarn.sh
If the start command runs without errors, you can then run the jps command to view the Java processes: master should show the ResourceManager process, and each slave node should show a NodeManager process.
Step Eight: Firewall configuration
Several port numbers were used in the configuration above. CentOS's firewall is on by default, so we need to configure it to open those ports; otherwise the corresponding web pages will not be reachable. The commands are as follows:
firewall-cmd --zone=public --add-port=8030/tcp --permanent
firewall-cmd --zone=public --add-port=8031/tcp --permanent
firewall-cmd --zone=public --add-port=8032/tcp --permanent
firewall-cmd --zone=public --add-port=8033/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=10020/tcp --permanent
firewall-cmd --zone=public --add-port=19888/tcp --permanent
firewall-cmd --reload
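The seven near-identical firewall-cmd calls collapse naturally into a loop. Since firewall-cmd needs root and a running firewalld, the sketch below only builds and prints the commands it would run; drop the echo on the real master to actually apply them.

```shell
# Sketch: generate the firewall commands for every cluster port in one loop.
# The commands are echoed, not executed, so this runs without root/firewalld.
cmds=$(
  for port in 8030 8031 8032 8033 8088 10020 19888; do
      echo "firewall-cmd --zone=public --add-port=${port}/tcp --permanent"
  done
  echo "firewall-cmd --reload"
)
echo "$cmds"
```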
Once that is done, visit master:8088 in a browser; if the cluster information page appears, everything is OK.
Summary
After the eight steps above, a fully distributed Hadoop cluster is up and running! Finally, here is the official Hadoop cluster setup guide: https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html ; if your English is up to it, learning from the official site is recommended.