HiBench 7.x 使用问题整理

一. 介绍

HiBench是一款用于hadoop集群性能测试的开源工具。支持MR,HIVE,SPARK等计算框架,且支持多种维度的测试。早在HiBench5.0的时候作者就用过,个人感觉还不错,无论是apache hadoop,cdh还是hdp都可以支持。

最新发现,HiBench 7.x 版本发布了。因此就趁着机会,结合新版本总结一下使用中遇到的问题,助大家过坑。

HiBench的github地址如下:https://github.com/intel-hadoop/HiBench

环境:

OS:CentOS 7.3

Hadoop: HDP 2.6.5.x

HiBench: 7.1

二. 安装

安装方式:

使用maven命令即可,具体请见:https://github.com/intel-hadoop/HiBench/blob/master/docs/build-hibench.md

问题整理:

maven构建问题:

问题描述:

maven构建过程中报以下错误(以mahout为例):

INFO] --- download-maven-plugin:1.2.0:wget (default) @ mahout ---
Downloading: http://archive.apache.org/dist/mahout/0.11.0//apache-mahout-distribution-0.11.0.tar.gz
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Could not get content
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Retrying (1 more)
Downloading: http://archive.apache.org/dist/mahout/0.11.0//apache-mahout-distribution-0.11.0.tar.gz
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Could not get content
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)

层主在构建"hadoopbench-sql","mahout"和"nutchindexing"时都遇到了类似问题。

问题原因:

网站访问协议更换导致的对应文件下载异常。

解决方法:

方式1. 将pom.xml中的"http"修改为"https",例如:

<properties>
    <repo>https://archive.apache.org</repo>
    <file>dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz</file>
</properties>

方式2. 将对应文件在"https://archive.apache.org/dist"此源中下载至本地http,再修改pom.xml中的文件路径,例如:

<properties>
    <repo>http://private.repo.io/archive.apache.org</repo>
    <file>dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz</file>
</properties>

第一种方式最简单。如果网速不给力,请用第二种方法。

三. 使用

使用方式:

具体不同的测试请参见:https://github.com/intel-hadoop/HiBench/blob/master/docs/ 中的"run-*.md"

问题整理:

===================================================================

权限问题:

问题描述:

执行HiBench命令报错:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/var/tmp":hdfs:hdfs:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:325)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:246)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1934)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1917)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4185)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1109)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:645)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
	at org.apache.hadoop.ipc.Client.call(Client.java:1498)
	at org.apache.hadoop.ipc.Client.call(Client.java:1398)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:610)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
	at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3099)
	... 21 more

问题原因:

HiBench在使用过程中,会需要在HDFS下建立目录,如果默认使用root用户进行测试,且HDFS开启简单权限模式的话(hdfs为超级用户),自然会权限限制。

解决方法:

预先创建以下目录,并修改权限(暴力点可以直接给777,或是确定进行测试用户具备相应目录的写权限):

注:部分目录可通过conf/hibench.conf进行修改,以下目录参考是以默认配置文件为基准的。

/HiBench
/var
/user/root # root是作者用来测试的用户

===================================================================

sparkbench 日志没有输出到job history的问题:

问题描述:

使用默认的HiBench的运行命令执行spark作业测试,例如:

bin/workloads/sql/scan/spark/run.sh

该spark作业的日志将不会出现在job history server中。

问题原因:

HiBench的作业默认使用的是自生成的spark.conf而不是使用实际环境下的spark-defaults.conf。

解决方法:

首先,根据实际环境下的spark-defaults.conf找到以下用于规定日志输出的三个参数:

spark.eventLog.enabled
spark.yarn.historyServer.address
spark.eventLog.dir

将上述三个参数拼接成用于spark-submit的conf信息:

--conf "spark.eventLog.enabled=true" --conf "spark.yarn.historyServer.address=node1.com:18081" --conf "spark.eventLog.dir=hdfs:///spark2-history/" 

修改HiBench的"bin/functions/workload_functions.sh"文件,将上述conf信息添加至提交命令中,大概在216行(修改前请备份):

修改前:

if [[ "$CLS" == *.py ]]; then
        LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
else        
        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
fi

修改后:

if [[ "$CLS" == *.py ]]; then
        LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf "spark.eventLog.enabled=true" --conf "spark.yarn.historyServer.address=master1.com:18081" --conf "spark.eventLog.dir=hdfs:///spark2-history/" --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
else
        SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf "spark.eventLog.enabled=true" --conf "spark.yarn.historyServer.address=master1.com:18081" --conf "spark.eventLog.dir=hdfs:///spark2-history/" --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
fi

===================================================================

Streaming原始数据由HDFS向Kafka输入失败的问题:

问题描述:

执行以下命令将在HDFS生成的种子数据发往Kafka:

bin/workloads/streaming/identity/prepare/dataGen.sh

发生异常:

====================================================
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "pool-1-thread-1" java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
        at com.intel.hibench.datagen.streaming.util.SourceFileReader.getReader(SourceFileReader.java:33)
        at com.intel.hibench.datagen.streaming.util.CachedData.<init>(CachedData.java:56)
        at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
        at com.intel.hibench.datagen.streaming.util.KafkaSender.<init>(KafkaSender.java:55)
        at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: mycluster
        ... 20 more

问题原因:

作者本人的hadoop集群的HDFS是采用高可用部署的,namenode为active/standby模式,拥有一个namespace叫"mycluster",因此默认在HiBench的conf/hadoop.conf中的配置信息为:

# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://mycluster

具体原因应该出自数据传输的实现,请见"com.intel.hibench.datagen.streaming.DataGenerator"。目测是该实现不支持namespace的访问方式,本人没有细究,有兴趣的可以去看下上述类的源码。

解决方法:

将上述conf/hadoop.conf中的配置信息修改为指定的namenode信息即可,例如:

# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://node1.com:8020

===================================================================

其它:

SequenceFile格式化:

很多默认的HiBench的测试结果文件都是以SequenceFile进行输出的(mahout必须是SequenceFile),具体请见各个作业的执行脚本,例如Wordcount的run.sh(截取部分):

run_hadoop_job ${HADOOP_EXAMPLES_JAR} wordcount \
    -D mapreduce.job.maps=${NUM_MAPS} \
    -D mapreduce.job.reduces=${NUM_REDS} \
    -D mapreduce.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \
    -D mapreduce.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \
    -D mapreduce.job.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \
    -D mapreduce.job.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \
    ${INPUT_HDFS} ${OUTPUT_HDFS}
END_TIME=`timestamp`

为了方便阅读或是结果数据的后续使用,可通过如下方式进行转化:

使用FS SHELL进行文本输出:

sudo -u hdfs hadoop fs -text /path/xxx

使用MR(转载)实现文件转换:

https://blog.csdn.net/sinat_29508201/article/details/49155127

使用Spark(pyspark):

# SparkSession available as 'spark'
reader=sc.sequenceFile("xxxx",keyClass="org.apache.hadoop.io.Text",valueClass="org.apache.hadoop.io.IntWritable")

# 直接打印出来
def out(x):
    print x
reader.foreach(out)

# 转为sql
df=spark.read.json(reader.map(lambda x: x))

猜你喜欢

转载自blog.csdn.net/BalaBalaYi/article/details/85089274