Using the Benchmark Tools Bundled with Hadoop 3.1.1, with Simple Examples


Supported tests

      Go to the share/hadoop/mapreduce directory under the hadoop-3.1.1 installation directory (in this article, /data/hadoop-3.1.1/share/hadoop/mapreduce) and run:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar

    Running the jar without arguments lists the available tests: DFSCIOTest, DistributedFSCheck, JHLogAnalyzer, MRReliabilityTest, NNdataGenerator, NNloadGenerator, NNloadGeneratorMR, NNstructureGenerator, SliveTest, TestDFSIO, fail, filebench, gsleep, largesorter, loadgen, mapredtest, minicluster, mrbench, nnbench, nnbenchWithoutMR, sleep, testbigmapoutput, testfilesystem, testmapredsort, testsequencefile, testsequencefileinputformat, testtextinputformat, threadedmapbench, timelineperformance.

These programs exercise Hadoop from several angles; TestDFSIO, mrbench, nnbench, and Terasort are four of the most widely used tests.

TestDFSIO

Overview

    TestDFSIO measures the I/O performance of HDFS. It uses a MapReduce job to perform reads and writes concurrently: each map task reads or writes a file and emits statistics about the files it handled, and the reduce task accumulates those statistics and produces a summary.

Usage

    Show the usage message:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar TestDFSIO

 

Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-storagePolicy storagePolicyName] [-erasureCodePolicy erasureCodePolicyName]
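
Two defaults are worth knowing (they can vary between releases, so treat this as a hint rather than a guarantee): the test files are kept under /benchmarks/TestDFSIO on HDFS, controlled by the test.build.data property, and the summary is appended to a local file named TestDFSIO_results.log unless -resFile is given. A hypothetical run overriding both might look like:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar TestDFSIO -D test.build.data=/tmp/dfsio -write -nrFiles 4 -size 16MB -resFile /tmp/dfsio-results.log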

Examples

1.    Write 10 files of 100 MB each to HDFS

Command:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar TestDFSIO -write -nrFiles 10 -size 100MB

Output:
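
A TestDFSIO write summary generally has the following shape (the figures are placeholders, not measurements):

----- TestDFSIO ----- : write
           Date & time: ...
       Number of files: 10
Total MBytes processed: 1000
     Throughput mb/sec: ...
Average IO rate mb/sec: ...
 IO rate std deviation: ...
    Test exec time sec: ...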

2.    Read the 10 files of 100 MB each back from HDFS (the read test consumes the files created by the write test, so run -write first)

Command:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar TestDFSIO -read -nrFiles 10 -size 100MB

Output: the summary has the same format as the write run, with read in the header line.

 

3.    Delete the test data

Command:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar TestDFSIO -clean

Output: -clean removes the /benchmarks/TestDFSIO directory from HDFS.

 

nnbench

Overview

    nnbench stress-tests the NameNode by generating a large number of HDFS metadata requests, putting it under heavy load. The test can simulate creating, reading, renaming, and deleting files on HDFS.

Usage

    Show the usage message:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar nnbench -help

Usage: nnbench <options>

Options:

-operation <Available operations are create_write open_read rename delete. This option is mandatory>

 * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.

-maps <number of maps. default is 1. This is not mandatory>

-reduces <number of reduces. default is 1. This is not mandatory>

-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time. default is launch time + 2 mins. This is not mandatory>

-blockSize <Block size in bytes. default is 1. This is not mandatory>

-bytesToWrite <Bytes to write. default is 0. This is not mandatory>

-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>

-numberOfFiles <number of files to create. default is 1. This is not mandatory>

-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>

-baseDir <base DFS path. default is /benchmarks/NNBench. This is not mandatory>

-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>

-help: Display the help statement

Examples

1.       Create 100 files using 6 mappers and 3 reducers

Command:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar nnbench -operation create_write -maps 6 -reduces 3 -blockSize 8 -bytesToWrite 1 -numberOfFiles 100 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`

Output:
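
nnbench prints a results summary when the job finishes; on this release it should also append the summary to a local NNBench_results.log file in the working directory (hedged, as this may differ between versions). To confirm the files were actually created, list the base directory used above:

hadoop fs -ls -R /benchmarks/NNBench-`hostname -s`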

 

Parameter notes:

-operation: the operation type, here create_write (create and write)

-maps: use 6 map tasks

-reduces: use 3 reduce tasks

-blockSize: block size in bytes, here 8 bytes

-bytesToWrite: write 1 byte per file

-numberOfFiles: create 100 files

-replicationFactorPerFile: replication factor of 3 for each file

-readFileAfterOpen: set to true (meaningful with open_read, where it reads each file and reports the average read time)

-baseDir: base DFS path, /benchmarks/NNBench-`hostname -s` (the backquotes make the shell substitute this host's short hostname, so each client host gets its own directory)

mrbench

Overview

    mrbench runs a small job many times over, to check whether small jobs run repeatably and efficiently on the cluster.

Usage

    Show the usage message:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar mrbench -help

Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]

 

Examples

1.       Run a small job 50 times with 6 mappers and 3 reducers, generating 10 input lines sorted in descending order

Command:

hadoop jar hadoop-mapreduce-client-jobclient-3.1.1-tests.jar mrbench -numRuns 50 -maps 6 -reduces 3 -inputLines 10 -inputType descending

Output:
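
mrbench reports the time of each run and ends with a one-line summary of the average job time. The summary typically looks like the following (the time shown is a placeholder):

DataLines       Maps    Reduces AvgTime (milliseconds)
10              6       3       ...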

       

Parameter notes:

-numRuns: run the job 50 times

-maps: 6 map tasks per run

-reduces: 3 reduce tasks per run

-inputLines: generate 10 input lines

-inputType: input type descending (sorted in descending order)

Terasort

Overview

    Terasort is an effective benchmark for testing sorting on Hadoop. By running the bundled Terasort program with different numbers of map and reduce tasks, you can measure how those settings affect Hadoop's performance. The input data is generated by the accompanying teragen program, at sizes such as 1 GB or 10 GB. A complete TeraSort test runs in the following three steps:

1)    Generate the random input data with TeraGen

2)    Run TeraSort on that input

3)    Validate the sorted output with TeraValidate

Usage

    List the supported example programs:

hadoop jar hadoop-mapreduce-examples-3.1.1.jar

     

1.  Teragen

Teragen usage:

teragen <num rows> <output dir>

2.  Terasort

Terasort usage:

terasort [-Dproperty=value] <in> <out>
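
Since the benchmark is typically used to study the effect of task counts, job properties can be passed through -D. For instance, a hypothetical run that forces 8 reducers using the standard mapreduce.job.reduces property:

hadoop jar hadoop-mapreduce-examples-3.1.1.jar terasort -Dmapreduce.job.reduces=8 /examples/terasort-input /examples/terasort-output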

 

3.  Teravalidate

Teravalidate usage:

teravalidate <out-dir> <report-dir>

Examples

1)    Run teragen to generate 10,000,000 rows of input data in the directory /examples/terasort-input

Command:

hadoop jar hadoop-mapreduce-examples-3.1.1.jar teragen 10000000 /examples/terasort-input

 

Each row produced by teragen has the following format:

<10 bytes key><10 bytes rowid><78 bytes filler>\r\n

Here key is 10 random characters with ASCII codes in [32, 126]; rowid is the row number as a right-justified integer; and filler is 78 bytes made up of runs of 10 characters (the last run has 8), cycling through the letters A to Z. Each row is therefore exactly 100 bytes (10 + 10 + 78, plus 2 for \r\n), so the 10,000,000 rows generated above come to roughly 1 GB.
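
To sanity-check that arithmetic, ask HDFS for the total size of the generated directory; it should report about 1 GB (1,000,000,000 bytes):

hadoop fs -du -s -h /examples/terasort-input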

2)    Run terasort to sort the data, writing the result to the directory /examples/terasort-output

Command:

hadoop jar hadoop-mapreduce-examples-3.1.1.jar terasort /examples/terasort-input /examples/terasort-output

 

3)    Run teravalidate to verify that the Terasort output is in order; if problems are detected, the out-of-order keys are written to the directory /examples/terasort-validate

Command:

hadoop jar hadoop-mapreduce-examples-3.1.1.jar teravalidate /examples/terasort-output /examples/terasort-validate

 

List the files in the validation report directory:

hadoop fs -ls /examples/terasort-validate

List the directories under /examples:

hadoop fs -ls /examples

View the contents of the report file:

hadoop fs -cat /examples/terasort-validate/part-r-00000
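
If the output is fully sorted, the report contains essentially only a checksum entry; out-of-order keys would appear as error records (the exact wording varies by release). A minimal sketch that chains all three steps, cleaning up any previous run first (paths and row count as above):

hadoop fs -rm -r -f /examples/terasort-input /examples/terasort-output /examples/terasort-validate
hadoop jar hadoop-mapreduce-examples-3.1.1.jar teragen 10000000 /examples/terasort-input
hadoop jar hadoop-mapreduce-examples-3.1.1.jar terasort /examples/terasort-input /examples/terasort-output
hadoop jar hadoop-mapreduce-examples-3.1.1.jar teravalidate /examples/terasort-output /examples/terasort-validate
hadoop fs -cat /examples/terasort-validate/part-r-00000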

 
