A First Hadoop Program

1. Installation

Environment: Ubuntu

Steps:

(1) Install the JDK

Hadoop is implemented in Java, so first install the Java Development Kit (JDK).
Check that the JDK is available:

$ javac
$ java -version

Once it is installed, add the JDK's bin directory to the PATH.

Find the JDK installation path with:

ls -lrt /etc/alternatives/java

Result:

/etc/alternatives/java -> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

The installation path is /usr/lib/jvm/java-8-openjdk-amd64.
Add it to the environment as follows:

$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
$ export PATH=$JAVA_HOME/bin:${PATH}
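These exports only last for the current shell session. A common way to make them permanent is to append them to ~/.bashrc (a sketch, assuming the same OpenJDK 8 path found above):

```shell
# Persist the JDK environment across shell sessions
# (the path assumes the OpenJDK 8 location found above).
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
EOF
source ~/.bashrc   # reload the file for the current shell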

(2) Install Hadoop
Version 3.1.1 is used as an example.

wget http://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz

Extract it:

tar -xf hadoop-3.1.1.tar.gz

Add the PATH variables.
In hadoop-env.sh under the hadoop-3.1.1/etc/hadoop directory, uncomment the JAVA_HOME line and set it to the JDK installation path, then add the path Hadoop was extracted to:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/data/hadoop-3.1.1
export PATH=$PATH:/data/hadoop-3.1.1/bin

(3) Create a Hadoop user

~$ su                 # switch to root; enter the root password
useradd -m hadoop -s /bin/bash
passwd hadoop         # set a password for the new user
adduser hadoop sudo   # grant the hadoop user administrator privileges

2. Examples

In the Hadoop extraction directory, create an input folder and count the word frequencies of README.txt.

mkdir input
cp README.txt input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount input output

The results are written to the file part-r-00000 in the output folder.
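What the WordCount job computes can be mimicked on a small scale with plain Unix tools (a standalone sketch using sample text rather than README.txt):

```shell
# Count word frequencies the way WordCount does: split the text
# into words, group identical words, and count each group
# (sample input here, not the real README.txt).
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

Each output line is a count followed by a word, analogous to one line of part-r-00000.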

Estimating pi

Create 4 map tasks, each drawing 10000 random samples.

/data/hadoop-3.1.1$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 4 10000

Output:

Number of Maps  = 4
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
2020-04-13 13:45:20,541 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2020-04-13 13:45:20,598 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-04-13 13:45:20,598 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2020-04-13 13:45:20,725 INFO input.FileInputFormat: Total input files to process : 4
2020-04-13 13:45:20,733 INFO mapreduce.JobSubmitter: number of splits:4
2020-04-13 13:45:21,004 INFO mapreduce.Job: Running job: job_local749566201_0001

2020-04-13 13:45:21,773 INFO mapred.LocalJobRunner: Finishing task: attempt_local749566201_0001_r_000000_0
2020-04-13 13:45:21,774 INFO mapred.LocalJobRunner: reduce task executor complete.
2020-04-13 13:45:22,009 INFO mapreduce.Job: Job job_local749566201_0001 running in uber mode : false
2020-04-13 13:45:22,010 INFO mapreduce.Job:  map 100% reduce 100%
2020-04-13 13:45:22,012 INFO mapreduce.Job: Job job_local749566201_0001 completed successfully
2020-04-13 13:45:22,026 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=1592812
		FILE: Number of bytes written=4076848
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=4
		Map output records=8
		Map output bytes=72
		Map output materialized bytes=112
		Input split bytes=604
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=112
		Reduce input records=8
		Reduce output records=0
		Spilled Records=16
		Shuffled Maps =4
		Failed Shuffles=0
		Merged Map outputs=4
		GC time elapsed (ms)=13
		Total committed heap usage (bytes)=7809269760

Job Finished in 1.668 seconds
Estimated value of Pi is 3.14140000000000000000
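The pi example is a Monte Carlo estimate: each sample is a random point in the unit square, and the fraction that lands inside the quarter circle of radius 1 approaches pi/4. The same idea can be sketched in a single awk process (a standalone illustration, not the Hadoop implementation):

```shell
# Estimate pi by sampling 100000 random points in the unit square
# and counting those that fall inside the quarter circle.
awk 'BEGIN {
  srand(42)                 # fixed seed, for repeatability
  n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1) inside++
  }
  printf "%.4f\n", 4 * inside / n   # converges to pi as n grows
}'
```

The Hadoop job does exactly this, but spreads the samples across map tasks and sums the inside/outside counts in the reduce step.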


Reposted from blog.csdn.net/rosefun96/article/details/105486541