Spark exception: java.lang.ArrayIndexOutOfBoundsException: -7 in org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter

 

I tried the repartitionAndSortWithinPartitions operator for the first time today and ran into a problem.

 

The specific exception reported is as follows:

java.lang.ArrayIndexOutOfBoundsException: -7
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

 

See the end of this article for the complete error report.

 

From the message above it is easy to see that an array index is out of bounds, but for a long time I could not find where the out-of-bounds access came from. In the end, step-by-step debugging and a few println statements led me to the cause.

The problem is tied to my use of repartitionAndSortWithinPartitions. This operator takes a custom partitioner, and the bug was in my partitioner implementation. Here is the faulty code:

      import org.apache.spark.Partitioner

      class UserWatchPartitioner(partitions: Int) extends Partitioner {

        require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

        override def numPartitions: Int = partitions

        override def getPartition(key: Any): Int = {
          val k = key.asInstanceOf[OrderKey]
          // BUG: hashCode() can be negative, and % keeps the dividend's sign,
          // so this can return a negative partition index
          k.basicKey.hashCode() % numPartitions
        }
      }

 

The error comes from this line of code:

k.basicKey.hashCode() % numPartitions

When the partition index is computed, hashCode() can return a negative value, and in Scala (as in Java) the % operator keeps the sign of the dividend, so the key can be assigned a negative partition index (here, -7). BypassMergeSortShuffleWriter then uses that index to address its per-partition writer array, and the lookup fails with ArrayIndexOutOfBoundsException.
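
A quick check makes the sign behavior concrete. In Scala, as in Java, integer division truncates toward zero and % returns a remainder with the dividend's sign; the numbers below are illustrative, chosen to reproduce the -7 from the stack trace:

      // -17 / 10 truncates to -1, so -17 % 10 == -17 - (-1 * 10) == -7
      println(-17 % 10)            // -7: negative, so invalid as an array index
      println(Math.abs(-17 % 10))  // 7: clamped into [0, 10)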

 

Here is a corrected implementation of the partitioner:

      class UserWatchPartitioner(partitions: Int) extends Partitioner {

        require(partitions >= 0, s"Number of partitions ($partitions) cannot be negative.")

        override def numPartitions: Int = partitions

        override def getPartition(key: Any): Int = {
          val k = key.asInstanceOf[OrderKey]
          // Math.abs maps the residue into [0, numPartitions), fixing the crash
          Math.abs(k.basicKey.hashCode() % numPartitions)
        }
      }
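
Math.abs is safe here because x % numPartitions always stays strictly between -numPartitions and numPartitions, so the result can never be Int.MinValue. It does fold the residues -r and r onto the same partition, though, which can skew the key distribution slightly. Spark's built-in HashPartitioner avoids this by shifting negative residues up by the modulus (org.apache.spark.util.Utils.nonNegativeMod). A minimal sketch of that variant, where these two methods would replace getPartition inside UserWatchPartitioner:

      // Mirrors the non-negative mod used by Spark's HashPartitioner:
      // shift a negative residue up by the modulus instead of taking abs
      private def nonNegativeMod(x: Int, mod: Int): Int = {
        val rawMod = x % mod
        rawMod + (if (rawMod < 0) mod else 0)
      }

      override def getPartition(key: Any): Int = {
        val k = key.asInstanceOf[OrderKey]
        nonNegativeMod(k.basicKey.hashCode(), numPartitions)
      }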

For detailed usage of repartitionAndSortWithinPartitions, with examples, see my other article.
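
For orientation, here is a minimal, self-contained sketch of how such a partitioner plugs into repartitionAndSortWithinPartitions. The OrderKey definition, the ordering, and the sample records are hypothetical stand-ins, not the original job's code:

      import org.apache.spark.{SparkConf, SparkContext}

      // Hypothetical key type; the real OrderKey presumably has more fields
      case class OrderKey(basicKey: String, ts: Long)

      // repartitionAndSortWithinPartitions requires an Ordering[K] in scope;
      // sort by basicKey, then by timestamp within each partition
      implicit val orderKeyOrdering: Ordering[OrderKey] =
        Ordering.by(k => (k.basicKey, k.ts))

      val sc = new SparkContext(
        new SparkConf().setAppName("demo").setMaster("local[*]"))

      val pairs = sc.parallelize(Seq(
        (OrderKey("u1", 2L), "watch-b"),
        (OrderKey("u1", 1L), "watch-a"),
        (OrderKey("u2", 5L), "watch-c")))

      // Route records by basicKey's hash, sort each partition by the ordering
      val sorted = pairs.repartitionAndSortWithinPartitions(new UserWatchPartitioner(4))
      sorted.foreachPartition(_.foreach(println))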

 

 

 

Full error report:

19/09/18 21:47:05 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ArrayIndexOutOfBoundsException: -7
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
19/09/18 21:47:05 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: -7
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

19/09/18 21:47:05 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
19/09/18 21:47:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
19/09/18 21:47:05 INFO TaskSchedulerImpl: Cancelling stage 1
19/09/18 21:47:05 INFO DAGScheduler: ShuffleMapStage 1 (map at ETL_DWDEduWatchOrgDetailFact_2_DWDEduWatchCombDetailFact.scala:124) failed in 1.211 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: -7
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
19/09/18 21:47:05 INFO DAGScheduler: Job 1 failed: count at ETL_DWDEduWatchOrgDetailFact_2_DWDEduWatchCombDetailFact.scala:160, took 1.229507 s
19/09/18 21:47:05 INFO SparkUI: Stopped Spark web UI at http://LAPTOP-JEG7QNE0:4040
19/09/18 21:47:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/09/18 21:47:05 INFO MemoryStore: MemoryStore cleared
19/09/18 21:47:05 INFO BlockManager: BlockManager stopped
19/09/18 21:47:05 INFO BlockManagerMaster: BlockManagerMaster stopped
19/09/18 21:47:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/09/18 21:47:05 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: -7
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092)
	at org.apache.spark.rdd.RDD.count(RDD.scala:1162)
	at com.gaosi.spark.etl.log.front.client.ETL_DWDEduWatchOrgDetailFact_2_DWDEduWatchCombDetailFact$.main(ETL_DWDEduWatchOrgDetailFact_2_DWDEduWatchCombDetailFact.scala:160)
	at com.gaosi.spark.etl.log.front.client.ETL_DWDEduWatchOrgDetailFact_2_DWDEduWatchCombDetailFact.main(ETL_DWDEduWatchOrgDetailFact_2_DWDEduWatchCombDetailFact.scala)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -7
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

 

 

 


Source: blog.csdn.net/u010003835/article/details/100999577