1. Installation
Environment: Ubuntu
Steps:
(1) Install the JDK
Hadoop is implemented in Java, so first install the Java Development Kit (JDK).
Check whether the JDK is already available:
$ javac
$ java -version
Once it is installed, add JDK/bin to the PATH.
Find the JDK installation path with:
$ ls -lrt /etc/alternatives/java
Output:
/etc/alternatives/java -> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
So the installation path is /usr/lib/jvm/java-8-openjdk-amd64.
Add it to the environment:
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
$ export PATH=$JAVA_HOME/bin:${PATH}
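The two exports above only last for the current shell session. A minimal sketch of making them persistent, assuming the install path found above (adjust JDK_DIR if yours differs):

```shell
# Assumed install path from the ls output above; adjust if yours differs.
JDK_DIR=/usr/lib/jvm/java-8-openjdk-amd64

# Append the exports to ~/.bashrc so new shell sessions pick them up.
# The single-quoted format string keeps $JAVA_HOME literal in the file.
printf 'export JAVA_HOME=%s\nexport PATH=$JAVA_HOME/bin:$PATH\n' "$JDK_DIR" >> ~/.bashrc

# Apply to the current shell as well.
export JAVA_HOME=$JDK_DIR
export PATH=$JAVA_HOME/bin:$PATH
```

After this, `java -version` should work from any new terminal.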
(2) Install Hadoop
Version 3.1.1 is used as an example.
wget http://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
Extract the archive:
tar -xf hadoop-3.1.1.tar.gz
Add the PATH variables:
In the file hadoop-env.sh under the hadoop-3.1.1/etc/hadoop directory, uncomment the JAVA_HOME line and set it to the JDK installation path; also set HADOOP_HOME to the extracted Hadoop directory and add its bin directory to PATH:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/data/hadoop-3.1.1
export PATH=$PATH:/data/hadoop-3.1.1/bin
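The hadoop-env.sh edit can also be done non-interactively. A sketch using sed; the stand-in file below mimics the commented JAVA_HOME line shipped with Hadoop 3.1.1, and in a real install ENV_FILE would point at hadoop-3.1.1/etc/hadoop/hadoop-env.sh:

```shell
# Stand-in for the shipped config file (assumption for this demo);
# in a real install use ENV_FILE=hadoop-3.1.1/etc/hadoop/hadoop-env.sh
ENV_FILE=hadoop-env.sh
printf '# export JAVA_HOME=\n' > "$ENV_FILE"

# Uncomment the line and fill in the JDK path found earlier.
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64|' "$ENV_FILE"

cat "$ENV_FILE"
```

Using `|` as the sed delimiter avoids escaping the slashes in the path.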
(3) Create a Hadoop user
~$ su
Enter the root password.
# useradd -m hadoop -s /bin/bash
Set a password for the new user:
# passwd hadoop
# adduser hadoop sudo   # grant the hadoop user administrator (sudo) privileges
2. Examples
Word count: in the Hadoop extraction directory, create an input folder and count the word frequencies in README.txt.
mkdir input
cp README.txt input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount input output
The results are written to the part-r-00000 file in the output folder.
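The map-shuffle-reduce pattern of the WordCount job can be approximated locally with standard Unix tools; README.demo.txt below is a small stand-in input, not the Hadoop README.txt:

```shell
# Rough local equivalent of WordCount: split into words (map),
# sort (shuffle), count (reduce). Output format matches part-r-00000:
# one "word<TAB>count" pair per line.
printf 'hello world\nhello hadoop\n' > README.demo.txt
tr -s ' ' '\n' < README.demo.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
# prints: hadoop 1, hello 2, world 1 (tab-separated, one per line)
```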
Estimating pi
Run 4 map tasks, each throwing 10,000 random "darts" (Monte Carlo sampling).
/data/hadoop-3.1.1$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 4 10000
Output:
Number of Maps = 4
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
2020-04-13 13:45:20,541 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2020-04-13 13:45:20,598 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-04-13 13:45:20,598 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2020-04-13 13:45:20,725 INFO input.FileInputFormat: Total input files to process : 4
2020-04-13 13:45:20,733 INFO mapreduce.JobSubmitter: number of splits:4
2020-04-13 13:45:21,004 INFO mapreduce.Job: Running job: job_local749566201_0001
2020-04-13 13:45:21,773 INFO mapred.LocalJobRunner: Finishing task: attempt_local749566201_0001_r_000000_0
2020-04-13 13:45:21,774 INFO mapred.LocalJobRunner: reduce task executor complete.
2020-04-13 13:45:22,009 INFO mapreduce.Job: Job job_local749566201_0001 running in uber mode : false
2020-04-13 13:45:22,010 INFO mapreduce.Job: map 100% reduce 100%
2020-04-13 13:45:22,012 INFO mapreduce.Job: Job job_local749566201_0001 completed successfully
2020-04-13 13:45:22,026 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=1592812
FILE: Number of bytes written=4076848
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=4
Map output records=8
Map output bytes=72
Map output materialized bytes=112
Input split bytes=604
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=112
Reduce input records=8
Reduce output records=0
Spilled Records=16
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=13
Total committed heap usage (bytes)=7809269760
Job Finished in 1.668 seconds
Estimated value of Pi is 3.14140000000000000000
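The idea behind the pi example can be sketched locally with awk; this uses plain pseudo-random sampling, whereas the Hadoop example job draws its points from a quasi-random (Halton) sequence:

```shell
# Monte Carlo estimate of pi: sample random points in the unit square
# and count the fraction landing inside the quarter circle x^2+y^2 <= 1.
awk 'BEGIN {
  srand(42)              # fixed seed for repeatability
  n = 40000; inside = 0  # 4 x 10000 samples, matching the job above
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1) inside++
  }
  printf "Estimated value of Pi is %.4f\n", 4 * inside / n
}'
```

The estimate converges slowly (error shrinks as 1/sqrt(n)), which is why the result above is only accurate to two or three digits.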