Learning Spark

1. Packaging a Spark jar in IDEA with sbt-assembly:

In the projectName/project/ directory, create a new assembly.sbt file with the following content:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

build.sbt:

name := "ScalaTest"

version := "0.1"

scalaVersion := "2.11.8"

libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.11" % "compile"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0" % "provided"

libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.3.0" % "provided"

libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.3.0" % "compile"

libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.10.0" % "compile"

Note the scopes: dependencies marked compile are bundled into the assembled jar, while those marked provided are not (they are expected to be supplied by the Spark runtime on the cluster).

If assembly fails with deduplicate errors caused by conflicting files across dependency jars, add the following merge strategy to build.sbt:
assemblyMergeStrategy in assembly := {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
}

Run -> Edit Configurations -> add a new sbt Task and set Tasks to: assembly

Run -> Run 'assembly'
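
For context, a minimal sketch of the kind of Spark Streaming + Kafka application this build packages. The broker address, topic, group id and class name are placeholders; spark-core and spark-streaming come from the cluster at run time (hence provided), while the Kafka artifacts marked compile above get bundled into the jar:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Kafka consumer settings; broker address and group id are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "demo-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream over the placeholder topic "demo-topic".
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("demo-topic"), kafkaParams))

    // Simple word count over each micro-batch.
    stream.map(_.value)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}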


2. Submitting a job from a local machine to an external cluster fails with:

Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.

Fix: edit conf/spark-env.sh on the submitting machine:

export SPARK_LOCAL_IP=localhost (or the local machine's IP)
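
If editing spark-env.sh is inconvenient, the bind address can also be set from the application itself (assuming Spark 2.1+ for spark.driver.bindAddress); a sketch, with placeholder addresses and master URL:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: pin the driver's bind/advertised address when submitting
// from a local machine to a remote standalone master.
val conf = new SparkConf()
  .setAppName("RemoteSubmitTest")
  .setMaster("spark://sparkmaster:7077")            // placeholder master URL
  .set("spark.driver.bindAddress", "192.168.1.100") // address the driver binds to locally
  .set("spark.driver.host", "192.168.1.100")        // address advertised to the cluster
val sc = new SparkContext(conf)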


3. Insufficient cluster resources:

When a Spark job is submitted to a cluster that does not have enough resources, it keeps printing:

WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

When resources are insufficient, the Spark job does not exit immediately; it keeps waiting. A streaming job holds whatever resources it has acquired and never releases them, so other jobs pile up in the Spark cluster without ever being executed.

Check the master web UI at sparkmaster:8080; it shows whether cores or memory are running short. If cores are the bottleneck, the fix is:

Use sparkConf.set("spark.cores.max", "2") in the application.
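
A sketch of capping both cores and executor memory from the application; the numbers are placeholders and should match what the master UI reports as available:

import org.apache.spark.SparkConf

// Cap what this application may claim on the standalone cluster so that a
// long-running streaming job leaves room for other jobs.
val sparkConf = new SparkConf()
  .setAppName("ResourceCappedJob")
  .set("spark.cores.max", "2")        // total cores across the whole application
  .set("spark.executor.memory", "1g") // memory per executor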

References: http://wenda.chinahadoop.cn/question/2433

http://spark.apache.org/docs/latest/spark-standalone.html (Resource Scheduling section)

Reposted from blog.csdn.net/chunzhenzyd/article/details/80229109