Table of Contents
Step One: Download the Hadoop package
Step Two: Upload the Apache Hadoop package and extract it
Step Three: Modify the configuration files
Step Four: Configure Hadoop environment variables
Step Five: Start the cluster
Step Six: View the web interfaces on three ports
A summary of our company's big data cluster setup: we bought three Alibaba Cloud servers to build the big data cluster.
Step One: Download the Hadoop package
Download link: http://archive.apache.org/dist/hadoop/core/ (download version 2.7.5)
Step Two: Upload the Apache Hadoop package and extract it
Extraction commands:
cd /export/softwares
tar -zxvf hadoop-2.7.5.tar.gz -C ../servers/
Step Three: Modify the configuration files
1. Modify core-site.xml
Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://node01:8020</value>
<!-- The HDFS port, used for remote connections -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop-2.7.5/hadoopDatas/tempDatas</value>
<!-- hadoop.tmp.dir: temporary files. A server-side parameter; changing it requires a restart -->
</property>
<!-- Buffer size; in practice, tune it according to server performance -->
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
<!-- Size of the read/write buffer used in sequence files, in bytes -->
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
<!-- Enable the HDFS trash mechanism; deleted data can be recovered from the trash. Unit: minutes -->
</property>
</configuration>
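As a quick sanity check on the value above: fs.trash.interval is expressed in minutes, so 10080 means deleted files stay recoverable in the trash for exactly one week. A minimal shell sketch of the arithmetic:

```shell
# fs.trash.interval is in minutes: 10080 minutes = 7 days
minutes=10080
days=$(( minutes / 60 / 24 ))
echo "trash retention: ${days} days"
```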
2. Modify hdfs-site.xml
Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node01:50090</value>
<!-- SecondaryNameNode HTTP address and port -->
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>node01:50070</value>
<!-- NameNode HTTP address and port -->
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas,file:///export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas2</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas,file:///export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas2</value>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>file:///export/servers/hadoop-2.7.5/hadoopDatas/nn/edits</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///export/servers/hadoop-2.7.5/hadoopDatas/snn/name</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>file:///export/servers/hadoop-2.7.5/hadoopDatas/dfs/snn/edits</value>
<!-- It is recommended not to use the SNN feature; this setting can be ignored -->
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<!-- Number of replicas per data block. It can be set when a file is created, specified by the client, or changed on the command line. Different files can have different replica counts; this default applies when none is specified. -->
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<!-- Whether to enable permission checking in HDFS -->
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
</configuration>
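The dfs.blocksize value above is given in bytes; 134217728 is simply the common 128 MB HDFS block size written out. A one-line check:

```shell
# dfs.blocksize is in bytes: 128 MB = 128 * 1024 * 1024
blocksize=$(( 128 * 1024 * 1024 ))
echo "dfs.blocksize: ${blocksize} bytes"   # prints 134217728
```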
3. Modify hadoop-env.sh
Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_161
4. Modify mapred-site.xml
Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>server02:10020</value>
<!-- MapReduce JobHistory server IPC host:port -->
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>server02:19888</value>
<!-- MapReduce JobHistory server web UI host:port -->
</property>
</configuration>
5. Modify yarn-site.xml
Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>server02</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>20480</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>127.0.0.1:8050</value>
<!-- ResourceManager web UI port; optional -->
</property>
</configuration>
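Two of the YARN values above are easy to sanity-check: with 20480 MB of NodeManager memory and a 2048 MB minimum allocation, each node can host at most 10 minimum-size containers, and the log retention of 604800 seconds is exactly 7 days. A small sketch:

```shell
# NodeManager memory / minimum container allocation = max minimum-size containers per node
node_mem_mb=20480
min_alloc_mb=2048
echo "max minimum-size containers per node: $(( node_mem_mb / min_alloc_mb ))"

# yarn.log-aggregation.retain-seconds: 604800 seconds = 7 days
echo "log retention: $(( 604800 / 86400 )) days"
```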
6. Modify mapred-env.sh
Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim mapred-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_161
7. Modify slaves
Modify the slaves file, then send the installation package to the other machines and restart the cluster.
Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/etc/hadoop
vim slaves
server01
server02
server03
8. Distribute hadoop-2.7.5
Execute the following commands on the first machine:
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/namenodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/datanodeDatas2
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/snn/name
mkdir -p /export/servers/hadoop-2.7.5/hadoopDatas/dfs/snn/edits
cd /export/servers/
scp -r hadoop-2.7.5 server01:$PWD
scp -r hadoop-2.7.5 server03:$PWD
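The eight mkdir commands above can be condensed with Bash brace expansion, and the two scp commands can be generalized with a loop. A sketch, demonstrated under /tmp so it runs without the real cluster (the host names server01 and server03 are the ones from this guide; the scp commands are only printed, not executed):

```shell
# Demonstrate under /tmp; on the real cluster the base would be
# /export/servers/hadoop-2.7.5/hadoopDatas
base=/tmp/hadoop-2.7.5/hadoopDatas

# Bash brace expansion creates all eight data directories in one command
mkdir -p "${base}"/{tempDatas,namenodeDatas,namenodeDatas2,datanodeDatas,datanodeDatas2,nn/edits,snn/name,dfs/snn/edits}
ls "${base}"

# Dry run of the distribution step: print the scp commands instead of running them
for host in server01 server03; do
  echo scp -r hadoop-2.7.5 "${host}:/export/servers/"
done
```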
Step Four: Configure hadoop environment variables
Hadoop environment variables must be configured on all three machines.
Execute the following command on all three machines:
vim /etc/profile
export HADOOP_HOME=/export/servers/hadoop-2.7.5
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
After editing, apply the configuration so it takes effect:
source /etc/profile
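The PATH line above prepends both the bin directory (hdfs, hadoop, yarn) and the sbin directory (start-dfs.sh and the other daemon scripts), so all of them can be invoked without full paths. A sketch showing what ends up at the front of PATH; the HADOOP_HOME value is the one used throughout this guide:

```shell
export HADOOP_HOME=/export/servers/hadoop-2.7.5
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# The two Hadoop directories are now searched first
echo "$PATH" | tr ':' '\n' | head -n 2
```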
Step Five: Start the cluster
To start the Hadoop cluster, both the HDFS and YARN modules need to be started. Note: the first time you start HDFS, it must be formatted. This is essentially some cleanup and preparation work, because at this point HDFS does not yet physically exist.
hdfs namenode -format or hadoop namenode -format
Ready to start. Execute the following commands on the first machine:
cd /export/servers/hadoop-2.7.5/
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
Step Six: View the web interfaces on three ports
http://node01:50070/explorer.html#/ (view HDFS)
http://node01:8088/cluster (view the YARN cluster)
http://node01:19888/jobhistory (view the history of completed jobs)
hdfs://node01:8020 (HDFS file system URI)
http://node01:50090 (SecondaryNameNode address and port)
http://node01:50070 (NameNode address and port)
127.0.0.1:8050 (YARN ResourceManager web UI port)