1. Hadoop Cluster Setup (self-study notes, CentOS 6.5)
1.1. Configure passwordless SSH access between virtual machines
ssh-keygen -t rsa #generate an RSA key pair
ssh-copy-id [email protected] #copy this machine's public key to the VM at the given IP
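With several nodes, the two commands above can be repeated in a loop. A minimal sketch, assuming the six IPs used in the hosts mapping below and the root account:

```shell
# run once on each machine that needs passwordless access to the others
ssh-keygen -t rsa                # accept the defaults; skip if a key already exists
for ip in 192.168.32.{11..16}; do
    ssh-copy-id "root@$ip"       # appends the public key to the remote authorized_keys
done
```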
1.2. Configure hostname access between virtual machines (map hostnames to IPs)
vi /etc/hosts #map hostnames to IP addresses
# for example
192.168.32.11 vm1
192.168.32.12 vm2
192.168.32.13 vm3
192.168.32.14 vm4
192.168.32.15 vm5
192.168.32.16 vm6
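Once /etc/hosts carries these entries on every machine, a quick way to confirm the mappings work is to ping each hostname. A sketch, assuming the six hostnames above:

```shell
# verify each hostname resolves and answers; run from any node
for h in vm1 vm2 vm3 vm4 vm5 vm6; do
    ping -c 1 -W 1 "$h" > /dev/null && echo "$h reachable" || echo "$h FAILED"
done
```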
1.3. Install Hadoop
Upload the Hadoop package from the local machine
rz #upload a file; Xshell opens a dialog to select the file to upload
Extract the Hadoop package to a target folder, here /local/ (the folder is up to you)
tar -zxvf xxxxx -C /local/
The extracted folder usually carries a version suffix; you can rename it to suit your preference
cd /local/ #enter the directory where Hadoop was extracted
mv hadoopx.x.x hadoop #rename the folder to drop the version suffix
Configure the Hadoop environment variables
vi /etc/profile
# append at the end of the file
export HADOOP_HOME=/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
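The profile changes only take effect in new login shells; reload it and confirm the hadoop command is found. A quick check, assuming the paths above:

```shell
source /etc/profile   # apply the new variables to the current shell
echo $HADOOP_HOME     # should print /local/hadoop
hadoop version        # should print the Hadoop version if PATH is set correctly
```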
(1) core-site.xml
Change the contents to:
<configuration>
<property>
<name>fs.defaultFS</name>
<!-- this machine's hostname:port -->
<value>hdfs://vm1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/local/hadoop/dfs/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
(2) hdfs-site.xml
Change the contents to:
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>vm1:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>vm2:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///local/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
(3) yarn-site.xml
Change the contents to:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>vm1</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>vm1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>vm1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>vm1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>vm1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>vm1:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>6</value>
<description>Ratio of virtual memory to physical memory allowed for each task</description>
</property>
</configuration>
(4) mapred-site.xml
By default this file ships with a .template suffix, so rename it first:
mv mapred-site.xml.template mapred-site.xml
Then change the contents to:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>vm1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>vm1:19888</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/local/hadoop/etc/*,
/local/hadoop/etc/hadoop/*,
/local/hadoop/lib/*,
/local/hadoop/share/hadoop/common/*,
/local/hadoop/share/hadoop/common/lib/*,
/local/hadoop/share/hadoop/mapreduce/*,
/local/hadoop/share/hadoop/mapreduce/lib/*,
/local/hadoop/share/hadoop/hdfs/*,
/local/hadoop/share/hadoop/hdfs/lib/*,
/local/hadoop/share/hadoop/yarn/*,
/local/hadoop/share/hadoop/yarn/lib/*
</value>
</property>
<property>
<name>mapred.remote.os</name>
<value>Linux</value>
</property>
</configuration>
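One file this walkthrough does not show is etc/hadoop/slaves, which start-dfs.sh and start-yarn.sh read to decide where to launch DataNodes and NodeManagers. A sketch, assuming vm2–vm6 are the worker nodes (the file is named workers in Hadoop 3.x):

```shell
# list one worker hostname per line (adjust to your actual worker nodes)
cat > /local/hadoop/etc/hadoop/slaves <<'EOF'
vm2
vm3
vm4
vm5
vm6
EOF
```

It is also worth setting JAVA_HOME explicitly in etc/hadoop/hadoop-env.sh, since the start scripts do not always inherit it over SSH.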
Copy the hadoop folder to the other nodes (use the same path on every machine, so none of them needs to be configured separately)
scp -r hadoop/ vm3:/local/
scp -r hadoop/ vm2:/local/
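Besides the hadoop folder, /etc/hosts and /etc/profile were also edited on the first machine and must match on the others. A sketch assuming the same hostnames (note this overwrites the remote files):

```shell
# the hosts mapping and environment variables must be identical on every node
for host in vm2 vm3; do
    scp /etc/hosts   "root@$host:/etc/hosts"
    scp /etc/profile "root@$host:/etc/profile"
done
```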
(5) Format the NameNode on the first machine
Enter the Hadoop root directory
bin/hdfs namenode -format
service iptables stop #stop the firewall for the current session (chkconfig alone only takes effect on the next boot)
chkconfig --level 2345 iptables off #disable the firewall at boot; for convenience it is simply switched off outright
sbin/start-dfs.sh #start HDFS
sbin/start-yarn.sh #start YARN
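Once both scripts have run, jps on each node lists the daemons that came up. A rough expectation for the layout configured above (this assumes vm1 runs the NameNode and ResourceManager, vm2 the SecondaryNameNode):

```shell
# run on each node to list the running Hadoop daemons
jps
# vm1 should show NameNode and ResourceManager;
# vm2 should show SecondaryNameNode (per hdfs-site.xml) plus DataNode and NodeManager;
# the remaining worker nodes should show DataNode and NodeManager
```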
Now open a browser and visit the address configured on the first machine: http://192.168.32.11:50070/
The configuration is complete. (I'm a beginner; corrections are welcome.)