Fully Distributed Hadoop Installation on CentOS 7

  • Host system: macOS 10.15.4
  • Virtualization software: Parallels Desktop 14
  • Operating system on each Hadoop node: CentOS 7
  • JDK version: jdk1.8.0_162
  • Hadoop version: hadoop-2.7.7

Step 1: Installation files

Prepare the tools: the virtualization software, three CentOS virtual machines, the JDK archive, and the Hadoop archive.

First create a folder named Hadoop under /opt/ on CentOS,
then upload the Hadoop and JDK archives into it.
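
A minimal sketch of creating the directory, assuming you work as the non-root user caizhengjie used elsewhere in this post:

sudo mkdir -p /opt/Hadoop
sudo chown caizhengjie:caizhengjie /opt/Hadoop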

Upload them with scp:

scp <absolute-path-to-local-file> [email protected]:/opt/Hadoop
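
For example, assuming the archives sit in the Mac's ~/Downloads folder (a hypothetical location) and 10.211.55.59 is node1, as in the hosts table below:

scp ~/Downloads/jdk-8u162-linux-x64.tar.gz [email protected]:/opt/Hadoop
scp ~/Downloads/hadoop-2.7.7.tar.gz [email protected]:/opt/Hadoop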

Unpack the archives:

tar -zxvf jdk-8u162-linux-x64.tar.gz
tar -zxvf hadoop-2.7.7.tar.gz

Create symbolic links:

ln -s hadoop-2.7.7 hadoop
ln -s jdk1.8.0_162 jdk

Step 2: Passwordless SSH configuration

(1) Install vim

If vim is not already installed on CentOS, install it:

yum install vim -y

(2) Configure /etc/hosts

Boot the virtual machines. The hosts file lives in the etc directory under the root filesystem; configure it on all three machines.
Note: the host entries below must be adapted to your own hostnames and IP addresses; the configuration is identical on all three hosts. Run:

sudo vim /etc/hosts

Append the following to the end of the file on each machine:

10.211.55.59 node1
10.211.55.60 node2
10.211.55.61 node3
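
The hostnames themselves must match these entries as well; on CentOS 7 they can be set with hostnamectl (run the matching command on each machine):

sudo hostnamectl set-hostname node1
sudo hostnamectl set-hostname node2
sudo hostnamectl set-hostname node3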

(3) Disable the firewall

The firewall must be disabled on every server.

  • Check the firewall status
firewall-cmd --state
  • Stop the firewall
systemctl stop firewalld.service
  • Disable the firewall at boot
systemctl disable firewalld.service
  • Disable SELinux
sudo vim /etc/selinux/config

Comment out SELINUX=enforcing and add the following:

SELINUX=disabled

Alternatively, simply change enforcing to disabled.
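
The config-file change only takes effect after a reboot; to relax SELinux immediately for the current session, you can additionally run:

sudo setenforce 0

This switches SELinux to permissive mode until the next reboot.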

(4) Set up passwordless login

On each server, generate a key pair for passwordless access:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

Append the public key to the authorized_keys file:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Distribute the public keys for passwordless login.
node1 distributes to node1, node2, node3:

ssh-copy-id -i ~/.ssh/id_dsa.pub node1
ssh-copy-id -i ~/.ssh/id_dsa.pub node2
ssh-copy-id -i ~/.ssh/id_dsa.pub node3

node2 distributes to node1, node2, node3:

ssh-copy-id -i ~/.ssh/id_dsa.pub node1
ssh-copy-id -i ~/.ssh/id_dsa.pub node2
ssh-copy-id -i ~/.ssh/id_dsa.pub node3

node3 distributes to node1, node2, node3:

ssh-copy-id -i ~/.ssh/id_dsa.pub node1
ssh-copy-id -i ~/.ssh/id_dsa.pub node2
ssh-copy-id -i ~/.ssh/id_dsa.pub node3

Passwordless login is now configured and can be tested.
Verify it by running the following commands from node1:

ssh node1
ssh node2
ssh node3
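
Each command should open a shell without asking for a password (type exit to return). As a compact check, run from each node in turn:

for h in node1 node2 node3; do ssh $h hostname; done

If the three hostnames print without any password prompt, the setup is correct.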

(5) Install the NTP time synchronization service

Install it on all three virtual machines; this requires root privileges:

sudo -i
  • Install ntp
yum install -y ntp
  • Enable the NTP service at boot
chkconfig ntpd on
  • Check whether the ntp process is running
ps aux | grep ntp

Output:

root     12650  0.0  0.0 112728   968 pts/0    S+   11:56   0:00 grep --color=auto ntp
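
Note that this output shows only the grep process itself, which means ntpd has not actually been started yet; on CentOS 7 it can be started and checked with:

systemctl start ntpd
systemctl status ntpd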

Step 3: Configure environment variables

Configure the JDK and Hadoop environment variables:

vim ~/.bashrc

Add the following at the end of the file:

export JAVA_HOME=/opt/Hadoop/jdk1.8.0_162
export CLASSPATH=${JAVA_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export HADOOP_HOME=/opt/Hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Adjust the paths above to match your own setup.
Save and exit, then apply the settings:

source ~/.bashrc

Verify with java -version and whereis hdfs.
If the Java version number is printed, the environment variables are configured correctly.
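
With the paths used above, the checks should produce output along these lines (abridged; the exact strings may differ):

java -version
# java version "1.8.0_162" ...
whereis hdfs
# hdfs: /opt/Hadoop/hadoop-2.7.7/bin/hdfs ...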

Step 4: Edit the Hadoop configuration files

Do the configuration on node1.
Enter the hadoop configuration directory:

cd /opt/Hadoop/hadoop-2.7.7/etc/hadoop

(1) Configure hadoop-env.sh

Edit the hadoop-env.sh file:

vim hadoop-env.sh

Find the export JAVA_HOME line and change it to:

export JAVA_HOME=/opt/Hadoop/jdk1.8.0_162

(2) Configure core-site.xml

vim core-site.xml

Edit core-site.xml as follows:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://node1:8020</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/Hadoop/hadoop-2.7.7/tmp</value>
        </property>
        <property>
                <name>hadoop.http.staticuser.user</name>
                <value>caizhengjie</value>
        </property>
</configuration>
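
hadoop.tmp.dir points at a directory that does not exist yet. Hadoop normally creates it on its own, but creating it up front on each node is a harmless precaution:

mkdir -p /opt/Hadoop/hadoop-2.7.7/tmp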

(3) Configure hdfs-site.xml

vim hdfs-site.xml

Edit hdfs-site.xml as follows:

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
</configuration>

(4) Configure mapred-site.xml

Copy mapred-site.xml.template to a new file named mapred-site.xml:

cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

Edit mapred-site.xml as follows:

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

If you run into problems when testing MapReduce, see this article:
https://blog.csdn.net/weixin_45366499/article/details/103752447

(5) Configure yarn-site.xml

vim yarn-site.xml

Edit yarn-site.xml as follows:

<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>node1</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

(6) Edit the slaves file

This file assigns the DataNodes; to run DataNodes on all three machines, list all three hostnames in the slaves file.
Delete the existing contents and add the following:

node1
node2
node3

or:

node2
node3

A pitfall here: when you restart the services, the startup can look like it has hung. Do not kill it with Ctrl+C; it is actually prompting for node1's password. The prompt is printed in small text and easy to overlook, so it looks like an error, but it is only asking for a password. This trap kept me stuck for three days!

Step 5: Distribute the configuration to the node2 and node3 virtual machines

  • Option 1: If in Step 1 the files were uploaded only to node1, you can copy the entire Hadoop directory to node2 and node3. After the copy, you still need to configure the environment variables on node2 and node3 (one way to do that is sketched after this list).
scp -r Hadoop caizhengjie@node2:/opt/
scp -r Hadoop caizhengjie@node3:/opt/
  • Option 2: If the files are already uploaded and the environment variables configured on node2 and node3, you only need to distribute the hadoop folder under hadoop/etc to the other two machines:
scp -r hadoop caizhengjie@node2:/opt/Hadoop/hadoop-2.7.7/etc/
scp -r hadoop caizhengjie@node3:/opt/Hadoop/hadoop-2.7.7/etc/
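
For Option 1, a minimal sketch of replicating the environment variables from node1, assuming identical install paths on every node:

scp ~/.bashrc caizhengjie@node2:~/
scp ~/.bashrc caizhengjie@node3:~/

Afterwards, log in again (or run source ~/.bashrc) on node2 and node3 so the variables take effect.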

Step 6: Run and test Hadoop

Before running Hadoop for the first time, format HDFS on node1:

hdfs namenode -format
  • Start HDFS: start-dfs.sh
  • Start YARN: start-yarn.sh
  • Start everything: start-all.sh
  • Stop all Hadoop services: stop-all.sh

Check the Hadoop processes with jps.
node1 should show:

[caizhengjie@node1 ~]$ jps
2166 NameNode
2523 ResourceManager
2363 SecondaryNameNode
2782 Jps

node2 should show:

[caizhengjie@node2 ~]$ jps
2370 Jps
2133 DataNode
2247 NodeManager

node3 should show:

[caizhengjie@node3 ~]$ jps
2144 DataNode
2257 NodeManager
2403 Jps

Access the web UIs:

(screenshots of the web pages from the original post)
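
With Hadoop 2.x default ports, the pages in the screenshots should be reachable at:

http://node1:50070   (HDFS NameNode web UI)
http://node1:8088    (YARN ResourceManager web UI)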

Test the MapReduce example program that ships with Hadoop.
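
The original post does not show the command that produced the log below. Judging from the input path hdfs://node1:8020/input/123.txt and the output path /output in the log, a typical invocation of the bundled wordcount example would be (123.txt is taken from the log; adjust to your own test file):

hdfs dfs -mkdir -p /input
hdfs dfs -put 123.txt /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000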

20/06/03 16:42:32 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
20/06/03 16:42:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
20/06/03 16:42:32 INFO input.FileInputFormat: Total input paths to process : 1
20/06/03 16:42:32 INFO mapreduce.JobSubmitter: number of splits:1
20/06/03 16:42:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local602249199_0001
20/06/03 16:42:33 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
20/06/03 16:42:33 INFO mapreduce.Job: Running job: job_local602249199_0001
20/06/03 16:42:33 INFO mapred.LocalJobRunner: OutputCommitter set in config null
20/06/03 16:42:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
20/06/03 16:42:33 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
20/06/03 16:42:33 INFO mapred.LocalJobRunner: Waiting for map tasks
20/06/03 16:42:33 INFO mapred.LocalJobRunner: Starting task: attempt_local602249199_0001_m_000000_0
20/06/03 16:42:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
20/06/03 16:42:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
20/06/03 16:42:33 INFO mapred.MapTask: Processing split: hdfs://node1:8020/input/123.txt:0+466
20/06/03 16:42:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
20/06/03 16:42:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
20/06/03 16:42:33 INFO mapred.MapTask: soft limit at 83886080
20/06/03 16:42:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
20/06/03 16:42:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
20/06/03 16:42:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
20/06/03 16:42:33 INFO mapred.LocalJobRunner: 
20/06/03 16:42:33 INFO mapred.MapTask: Starting flush of map output
20/06/03 16:42:33 INFO mapred.MapTask: Spilling map output
20/06/03 16:42:33 INFO mapred.MapTask: bufstart = 0; bufend = 852; bufvoid = 104857600
20/06/03 16:42:33 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214008(104856032); length = 389/6553600
20/06/03 16:42:33 INFO mapred.MapTask: Finished spill 0
20/06/03 16:42:33 INFO mapred.Task: Task:attempt_local602249199_0001_m_000000_0 is done. And is in the process of committing
20/06/03 16:42:33 INFO mapred.LocalJobRunner: map
20/06/03 16:42:33 INFO mapred.Task: Task 'attempt_local602249199_0001_m_000000_0' done.
20/06/03 16:42:33 INFO mapred.Task: Final Counters for attempt_local602249199_0001_m_000000_0: Counters: 23
	File System Counters
		FILE: Number of bytes read=296198
		FILE: Number of bytes written=615022
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=466
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=5
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
	Map-Reduce Framework
		Map input records=1
		Map output records=98
		Map output bytes=852
		Map output materialized bytes=846
		Input split bytes=96
		Combine input records=98
		Combine output records=77
		Spilled Records=77
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=327
		Total committed heap usage (bytes)=372768768
	File Input Format Counters 
		Bytes Read=466
20/06/03 16:42:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local602249199_0001_m_000000_0
20/06/03 16:42:33 INFO mapred.LocalJobRunner: map task executor complete.
20/06/03 16:42:33 INFO mapred.LocalJobRunner: Waiting for reduce tasks
20/06/03 16:42:33 INFO mapred.LocalJobRunner: Starting task: attempt_local602249199_0001_r_000000_0
20/06/03 16:42:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
20/06/03 16:42:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
20/06/03 16:42:33 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@483200b9
20/06/03 16:42:33 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
20/06/03 16:42:33 INFO reduce.EventFetcher: attempt_local602249199_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
20/06/03 16:42:33 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local602249199_0001_m_000000_0 decomp: 842 len: 846 to MEMORY
20/06/03 16:42:33 INFO reduce.InMemoryMapOutput: Read 842 bytes from map-output for attempt_local602249199_0001_m_000000_0
20/06/03 16:42:33 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 842, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->842
20/06/03 16:42:33 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
	at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/06/03 16:42:33 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
20/06/03 16:42:33 INFO mapred.LocalJobRunner: 1 / 1 copied.
20/06/03 16:42:33 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
20/06/03 16:42:33 INFO mapred.Merger: Merging 1 sorted segments
20/06/03 16:42:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 838 bytes
20/06/03 16:42:33 INFO reduce.MergeManagerImpl: Merged 1 segments, 842 bytes to disk to satisfy reduce memory limit
20/06/03 16:42:33 INFO reduce.MergeManagerImpl: Merging 1 files, 846 bytes from disk
20/06/03 16:42:33 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
20/06/03 16:42:33 INFO mapred.Merger: Merging 1 sorted segments
20/06/03 16:42:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 838 bytes
20/06/03 16:42:33 INFO mapred.LocalJobRunner: 1 / 1 copied.
20/06/03 16:42:33 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
20/06/03 16:42:33 INFO mapred.Task: Task:attempt_local602249199_0001_r_000000_0 is done. And is in the process of committing
20/06/03 16:42:33 INFO mapred.LocalJobRunner: 1 / 1 copied.
20/06/03 16:42:33 INFO mapred.Task: Task attempt_local602249199_0001_r_000000_0 is allowed to commit now
20/06/03 16:42:33 INFO output.FileOutputCommitter: Saved output of task 'attempt_local602249199_0001_r_000000_0' to hdfs://node1:8020/output/_temporary/0/task_local602249199_0001_r_000000
20/06/03 16:42:33 INFO mapred.LocalJobRunner: reduce > reduce
20/06/03 16:42:33 INFO mapred.Task: Task 'attempt_local602249199_0001_r_000000_0' done.
20/06/03 16:42:33 INFO mapred.Task: Final Counters for attempt_local602249199_0001_r_000000_0: Counters: 29
	File System Counters
		FILE: Number of bytes read=297922
		FILE: Number of bytes written=615868
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=466
		HDFS: Number of bytes written=532
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=77
		Reduce shuffle bytes=846
		Reduce input records=77
		Reduce output records=77
		Spilled Records=77
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=372768768
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=532
20/06/03 16:42:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local602249199_0001_r_000000_0
20/06/03 16:42:33 INFO mapred.LocalJobRunner: reduce task executor complete.
20/06/03 16:42:34 INFO mapreduce.Job: Job job_local602249199_0001 running in uber mode : false
20/06/03 16:42:34 INFO mapreduce.Job:  map 100% reduce 100%
20/06/03 16:42:34 INFO mapreduce.Job: Job job_local602249199_0001 completed successfully
20/06/03 16:42:34 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=594120
		FILE: Number of bytes written=1230890
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=932
		HDFS: Number of bytes written=532
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=1
		Map output records=98
		Map output bytes=852
		Map output materialized bytes=846
		Input split bytes=96
		Combine input records=98
		Combine output records=77
		Reduce input groups=77
		Reduce shuffle bytes=846
		Reduce input records=77
		Reduce output records=77
		Spilled Records=154
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=327
		Total committed heap usage (bytes)=745537536
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=466
	File Output Format Counters 
		Bytes Written=532


The above is for reference and learning only; if anything infringes your rights, please contact me and I will remove it!
If this article helped you, the thumbs-up at the bottom left is the greatest encouragement to the blogger.
Your encouragement is the blogger's greatest motivation!

Reposted from blog.csdn.net/weixin_45366499/article/details/106529506