YARN error: Error in handling event type NODE_UPDATE to the Event Dispatcher

The complete error output is as follows:

2020-10-14 15:31:00,068 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Assigned container container_1602660632708_0001_01_000055 of capacity <memory:1024, vCores:1> on host hddatanode02:8041, which has 25 containers, <memory:25600, vCores:25> used and <memory:4529, vCores:-17> available after allocation
2020-10-14 15:31:00,068 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator.decResourceRequest(LocalityAppPlacementAllocator.java:302)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator.allocateNodeLocal(LocalityAppPlacementAllocator.java:288)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator.allocate(LocalityAppPlacementAllocator.java:400)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:430)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoAppAttempt.allocate(FifoAppAttempt.java:83)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:702)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:627)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:589)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:518)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:971)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:761)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:103)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
	at java.lang.Thread.run(Thread.java:748)
2020-10-14 15:31:00,075 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
2020-10-14 15:31:00,078 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-10-14 15:31:00,079 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.w.WebAppContext@7fb48179{/,null,UNAVAILABLE}{/cluster}
2020-10-14 15:31:00,082 INFO org.eclipse.jetty.server.AbstractConnector: Stopped ServerConnector@650ae78c{HTTP/1.1,[http/1.1]}{hdnamenode01:8088}
2020-10-14 15:31:00,082 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@c1fca1e{/static,jar:file:/opt/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678/jars/hadoop-yarn-common-3.0.0-cdh6.0.1.jar!/webapps/static,UNAVAILABLE}
2020-10-14 15:31:00,083 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@5bd1ceca{/logs,file:///var/log/hadoop-yarn/,UNAVAILABLE}
2020-10-14 15:31:00,084 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,085 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2020-10-14 15:31:00,085 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030
2020-10-14 15:31:00,090 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8030
2020-10-14 15:31:00,090 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031
2020-10-14 15:31:00,091 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,091 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,091 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8031

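When a large number of queries run back to back, the RM (ResourceManager) hangs and stops allocating resources. At the point where the RM hangs, a NullPointerException is thrown in RegularContainerAllocator.getLocalityWaitFactor. In the log above, which comes from a cluster running the FifoScheduler, the NullPointerException surfaces in LocalityAppPlacementAllocator.decResourceRequest instead.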

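The log shows that a large number of containers had been allocated just before the RM went down: hddatanode02 was running 25 containers and reported vCores:-17 available after the last allocation. The next NODE_UPDATE event then failed with the FATAL message "Error in handling event type NODE_UPDATE to the Event Dispatcher", after which the event dispatcher exited and the RM transitioned to standby.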
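
To spot this pattern in a ResourceManager log, a small script can look for "Assigned container" lines whose remaining capacity has gone negative, and for the FATAL NODE_UPDATE line. The following is a minimal sketch, assuming the log line format shown in the excerpt above; the function name scan_rm_log and the sample lines (shortened copies of the log above) are illustrative, not part of Hadoop.

```python
import re

# Matches the "Assigned container ..." lines written by FiCaSchedulerNode,
# assuming the format shown in the log excerpt above.
ASSIGNED = re.compile(
    r"Assigned container (?P<container>\S+) of capacity <[^>]*> "
    r"on host (?P<host>\S+), which has (?P<count>\d+) containers, "
    r"<memory:(?P<used_mem>-?\d+), vCores:(?P<used_vcores>-?\d+)> used and "
    r"<memory:(?P<free_mem>-?\d+), vCores:(?P<free_vcores>-?\d+)> available"
)
FATAL_MARKER = "Error in handling event type NODE_UPDATE"


def scan_rm_log(lines):
    """Return (oversubscribed, fatal_seen) for an iterable of log lines.

    oversubscribed lists (host, container_count, free_memory, free_vcores)
    for every allocation that left a negative amount of memory or vCores.
    """
    oversubscribed = []
    fatal_seen = False
    for line in lines:
        match = ASSIGNED.search(line)
        if match:
            free_mem = int(match["free_mem"])
            free_vcores = int(match["free_vcores"])
            if free_mem < 0 or free_vcores < 0:
                oversubscribed.append(
                    (match["host"], int(match["count"]), free_mem, free_vcores)
                )
        elif " FATAL " in line and FATAL_MARKER in line:
            fatal_seen = True
    return oversubscribed, fatal_seen


# Shortened copies of two lines from the log above (logger names abbreviated).
SAMPLE_LINES = [
    "2020-10-14 15:31:00,068 INFO FiCaSchedulerNode: Assigned container "
    "container_1602660632708_0001_01_000055 of capacity <memory:1024, vCores:1> "
    "on host hddatanode02:8041, which has 25 containers, "
    "<memory:25600, vCores:25> used and <memory:4529, vCores:-17> available "
    "after allocation",
    "2020-10-14 15:31:00,068 FATAL EventDispatcher: Error in handling event "
    "type NODE_UPDATE to the Event Dispatcher",
]

oversubscribed, fatal_seen = scan_rm_log(SAMPLE_LINES)
print(oversubscribed, fatal_seen)
```

For a real log, the same function can be fed an open file object, since it only iterates over lines.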

Related issues:

  • https://issues.apache.org/jira/browse/YARN-8462
  • https://issues.apache.org/jira/browse/YARN-8193

The related fixes landed in Hadoop 3.1.1, 3.2.0, and 2.10.1.

Solution:

  • Upgrade Hadoop to a release that includes the fixes
  • Apply the patches from the related issues to the current build
  • Avoid running a large number of queries back to back
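
To decide whether an upgrade is needed, the running Hadoop version can be compared against the releases listed above. The following is a rough sketch, assuming those three release numbers are the first fixed release on their respective lines and that lines newer than 3.2 inherit the fix; the names FIRST_FIXED, parse_version, and has_fix are hypothetical. Vendor builds may backport patches independently of the upstream version number, so treat the result as a hint only.

```python
import re

# First release on each line that carries the fixes, taken from the text above.
# Assumption: release lines not listed here (for example 3.0.x) are not fixed.
FIRST_FIXED = {
    (2, 10): (2, 10, 1),
    (3, 1): (3, 1, 1),
    (3, 2): (3, 2, 0),
}


def parse_version(text):
    """Extract (major, minor, patch) from strings such as '3.0.0-cdh6.0.1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version_text):
    """Return True if the upstream version is at or past a fixed release."""
    version = parse_version(version_text)
    line = version[:2]
    if line in FIRST_FIXED:
        return version >= FIRST_FIXED[line]
    # Assumption: release lines newer than 3.2 inherit the fix.
    return line > (3, 2)


# '3.0.0-cdh6.0.1' is the version that appears in the jar path in the log above.
for candidate in ("3.0.0-cdh6.0.1", "2.10.0", "2.10.1", "3.1.1", "3.2.0"):
    print(candidate, has_fix(candidate))
```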

Origin blog.csdn.net/qq_43081842/article/details/109094381