1. Environment
Reprinted content — please visit the original source: http://eksliang.iteye.com/blog/2223784
Prepare three virtual machines, each running 64-bit CentOS.
- 192.168.177.131 mast1.com mast1
- 192.168.177.132 mast2.com mast2
- 192.168.177.133 mast3.com mast3
Here mast1 acts as the NameNode; mast2 and mast3 act as DataNodes.
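So that these hostnames resolve on every machine, each node's /etc/hosts would typically carry entries like the following — a sketch derived from the IP/hostname table above (adjust if you use DNS instead):

```
192.168.177.131 mast1.com mast1
192.168.177.132 mast2.com mast2
192.168.177.133 mast3.com mast3
```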
2. Preparations before installation
- Install the JDK
- Create a hadoop user on each machine and configure passwordless SSH public-key login between them

These steps are omitted here; for configuring passwordless SSH public-key login, see: http://eksliang.iteye.com/blog/2187265
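As a minimal sketch of the SSH part (the referenced post has the full walkthrough): generate a passphraseless key pair for the hadoop user, then push the public key to each node. The temporary directory below is only so the sketch is side-effect free; in practice you would use the default ~/.ssh/id_rsa.

```shell
# Generate a passphraseless RSA key pair for the hadoop user.
# (Written to a temp dir here; normally you'd accept the ~/.ssh default.)
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -q -f "$KEYDIR/id_rsa"
ls "$KEYDIR"

# Then push the public key to each node (prompts for the hadoop password once):
#   ssh-copy-id -i "$KEYDIR/id_rsa.pub" hadoop@mast2
#   ssh-copy-id -i "$KEYDIR/id_rsa.pub" hadoop@mast3
```

After this, `ssh hadoop@mast2` from mast1 should log in without a password prompt.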
3. Start deployment
3.1. Download Hadoop 2.5.2
Download address : http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.5.2/
3.2. Configure hadoop-2.5.2/etc/hadoop
Configure mast1 first; once done, copy the finished configuration to mast2 and mast3.
3.2.1. core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mast1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
- fs.defaultFS: the default file system URI, i.e. the NameNode's RPC address
- io.file.buffer.size: the buffer size (in bytes) used when reading and writing files
3.2.2. hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>mast1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>mast1:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/workspace/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/workspace/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
- dfs.namenode.secondary.http-address: HTTP address of the SecondaryNameNode service
- dfs.replication: number of block replicas; 2 here, matching the two DataNodes
- dfs.webhdfs.enabled: enables the WebHDFS REST API on the NameNode and DataNodes (e.g. http://mast1:50070/webhdfs/v1/?op=LISTSTATUS lists the HDFS root directory)
3.2.3. mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>mast1:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>mast1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>mast1:19888</value>
  </property>
</configuration>
- mapreduce.jobtracker.http.address: JobTracker address (an MRv1 property; unused when the framework is yarn)
- mapreduce.jobhistory.address: IPC port of the MapReduce JobHistory server
- mapreduce.jobhistory.webapp.address: HTTP port of the MapReduce JobHistory server's web UI
3.2.4. yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>mast1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>mast1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>mast1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>mast1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>mast1:8088</value>
  </property>
</configuration>
3.2.5. slaves: the file listing the DataNode hosts
mast2
mast3
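Note the format: one hostname per line. The slaves file lives in $HADOOP_HOME/etc/hadoop; as a sketch (writing to the current directory here rather than the real config path):

```shell
# Write the slaves file, one DataNode hostname per line.
# (In practice, run this inside $HADOOP_HOME/etc/hadoop.)
cat > slaves <<'EOF'
mast2
mast3
EOF
cat slaves
```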
3.2.6. Modify JAVA_HOME
Set JAVA_HOME explicitly in both hadoop-env.sh and yarn-env.sh:
# export JAVA_HOME=${JAVA_HOME}    # the original line, commented out
export JAVA_HOME=/usr/local/java/jdk1.7.0_67
Even though JAVA_HOME is already set as a shell environment variable, Hadoop may still complain at startup that Java cannot be found, so there is no way around specifying the absolute path here.
3.2.7. Configure Hadoop's environment variables; my configuration is below for reference
[hadoop@Mast1 hadoop]$ vim ~/.bash_profile

export HADOOP_HOME="/home/hadoop/hadoop-2.5.2"
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Reminder: from Hadoop 2.5.0 onward, the two environment variables HADOOP_COMMON_LIB_NATIVE_DIR and HADOOP_OPTS must be set, otherwise a minor native-library error is reported when the cluster starts.
3.3. Copy the configuration to mast2 and mast3
Reminder: perform the copy as the hadoop user.
scp -r ~/.bash_profile hadoop@mast2:/home/hadoop/
scp -r ~/.bash_profile hadoop@mast3:/home/hadoop/
scp -r $HADOOP_HOME/etc/hadoop hadoop@mast2:/home/hadoop/hadoop-2.5.2/etc/
scp -r $HADOOP_HOME/etc/hadoop hadoop@mast3:/home/hadoop/hadoop-2.5.2/etc/
3.4. Format the file system
bin/hdfs namenode -format
3.5. Start and stop HDFS (the distributed file system) and YARN (the resource manager)
# Start the HDFS distributed file system
[hadoop@Mast1 hadoop-2.5.2]$ sbin/start-dfs.sh

# Stop the HDFS distributed file system
[hadoop@Mast1 hadoop-2.5.2]$ sbin/stop-dfs.sh

# Start the YARN resource manager
[hadoop@Mast1 hadoop-2.5.2]$ sbin/start-yarn.sh

# Stop the YARN resource manager
[hadoop@Mast1 hadoop-2.5.2]$ sbin/stop-yarn.sh
3.6. Verify the started processes with jps
# On mast1 (NameNode), running jps should show NameNode and ResourceManager
[hadoop@Mast1 hadoop-2.5.2]$ jps
3428 NameNode
4057 ResourceManager
4307 Jps

# On mast2 or mast3 (DataNode), running jps should show DataNode and NodeManager
[hadoop@Mast2 ~]$ jps
2726 DataNode
3154 Jps
3012 NodeManager
3.7. Verify in the browser
http://mast1:50070/ (NameNode web UI)
http://mast1:8088/ (ResourceManager web UI)
http://mast2:50075/ (DataNode web UI)
Remarks:
- The official Hadoop 2.5.2 documentation ships with the download package under hadoop-2.5.2/share/doc/hadoop; there you can view all the default configurations (core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml) and the various supported operations
- A well-written Chinese blog post on Hadoop's configuration parameters: http://segmentfault.com/a/1190000000709725#articleHeader2