Yarn error: Error in handling event type NODE_UPDATE to the Event Dispatcher

The full error output is as follows:

2020-10-14 15:31:00,068 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Assigned container container_1602660632708_0001_01_000055 of capacity <memory:1024, vCores:1> on host hddatanode02:8041, which has 25 containers, <memory:25600, vCores:25> used and <memory:4529, vCores:-17> available after allocation
2020-10-14 15:31:00,068 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator.decResourceRequest(LocalityAppPlacementAllocator.java:302)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator.allocateNodeLocal(LocalityAppPlacementAllocator.java:288)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator.allocate(LocalityAppPlacementAllocator.java:400)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:430)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoAppAttempt.allocate(FifoAppAttempt.java:83)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:702)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignNodeLocalContainers(FifoScheduler.java:627)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:589)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:518)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:971)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:761)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:103)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
	at java.lang.Thread.run(Thread.java:748)
2020-10-14 15:31:00,075 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
2020-10-14 15:31:00,078 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-10-14 15:31:00,079 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.w.WebAppContext@7fb48179{/,null,UNAVAILABLE}{/cluster}
2020-10-14 15:31:00,082 INFO org.eclipse.jetty.server.AbstractConnector: Stopped ServerConnector@650ae78c{HTTP/1.1,[http/1.1]}{hdnamenode01:8088}
2020-10-14 15:31:00,082 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@c1fca1e{/static,jar:file:/opt/cloudera/parcels/CDH-6.0.1-1.cdh6.0.1.p0.590678/jars/hadoop-yarn-common-3.0.0-cdh6.0.1.jar!/webapps/static,UNAVAILABLE}
2020-10-14 15:31:00,083 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@5bd1ceca{/logs,file:///var/log/hadoop-yarn/,UNAVAILABLE}
2020-10-14 15:31:00,084 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,085 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033
2020-10-14 15:31:00,085 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,085 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030
2020-10-14 15:31:00,090 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8030
2020-10-14 15:31:00,090 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031
2020-10-14 15:31:00,091 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,091 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-10-14 15:31:00,091 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8031
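To check whether an RM log contains this crash, the FATAL line and the first stack frame under it can be pulled out with a short script. The following is a minimal Python sketch, not an official tool; it assumes the layout shown above (the exception class on the line right after the FATAL line, the first "at" frame on the line after that), and the name find_dispatcher_crashes is hypothetical.

```python
import re

# Matches the FATAL line written by org.apache.hadoop.yarn.event.EventDispatcher
# and captures the event type (e.g. NODE_UPDATE).
FATAL_RE = re.compile(
    r"FATAL org\.apache\.hadoop\.yarn\.event\.EventDispatcher: "
    r"Error in handling event type (\w+) to the Event Dispatcher"
)
# A stack frame such as "\tat pkg.Class.method(File.java:302)".
FRAME_RE = re.compile(r"^\s*at\s+([^(\s]+)\(([^)]*)\)")


def find_dispatcher_crashes(lines):
    """Return (event_type, exception, top_frame) for each EventDispatcher crash.

    Assumes the exception class follows the FATAL line directly and the
    first stack frame follows the exception line (as in the log above).
    """
    results = []
    for i, line in enumerate(lines):
        m = FATAL_RE.search(line)
        if m is None:
            continue
        event_type = m.group(1)
        exception = lines[i + 1].strip() if i + 1 < len(lines) else ""
        top_frame = ""
        if i + 2 < len(lines):
            f = FRAME_RE.match(lines[i + 2])
            if f:
                top_frame = f"{f.group(1)}({f.group(2)})"
        results.append((event_type, exception, top_frame))
    return results
```

To use it, read the RM log (for example from /var/log/hadoop-yarn/, the directory served as /logs in the log above; the exact file name depends on the deployment) with open(path).read().splitlines() and pass the list of lines in.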

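When a large number of queries are run back to back, the RM hangs and stops allocating resources. In the upstream report of this problem, the NullPointerException at that point is thrown in RegularContainerAllocator.getLocalityWaitFactor; in the log above, it is thrown in LocalityAppPlacementAllocator.decResourceRequest while the FifoScheduler handles a NODE_UPDATE event.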

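The log shows that a large number of containers had been allocated just before the RM went down (host hddatanode02 alone reports 25 containers after the last assignment), after which the error "Error in handling event type NODE_UPDATE to the Event Dispatcher" occurred. The EventDispatcher then exits ("Exiting, bbye..") and the RM transitions to standby state.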
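The allocation pressure described above can be read straight off the FiCaSchedulerNode "Assigned container" lines. The Python sketch below parses one such line and flags a node whose available memory or vCores is negative after allocation; in the line shown above, hddatanode02 reports <memory:4529, vCores:-17> available. The regex assumes exactly the message format in this log, memory is taken to be in MB, and parse_assignment / overcommitted are hypothetical helper names. A negative value is only a signal worth checking, not proof of what caused the NPE.

```python
import re

# Assumes the FiCaSchedulerNode message format seen in this log.
ASSIGN_RE = re.compile(
    r"Assigned container (\S+) of capacity <memory:(-?\d+), vCores:(-?\d+)> "
    r"on host ([^,\s]+), which has (\d+) containers, "
    r"<memory:(-?\d+), vCores:(-?\d+)> used and "
    r"<memory:(-?\d+), vCores:(-?\d+)> available after allocation"
)


def parse_assignment(line):
    """Parse one 'Assigned container' line into a dict, or return None."""
    m = ASSIGN_RE.search(line)
    if m is None:
        return None
    return {
        "container": m.group(1),
        "capacity_memory_mb": int(m.group(2)),
        "capacity_vcores": int(m.group(3)),
        "host": m.group(4),
        "containers_on_node": int(m.group(5)),
        "used_memory_mb": int(m.group(6)),
        "used_vcores": int(m.group(7)),
        "avail_memory_mb": int(m.group(8)),
        "avail_vcores": int(m.group(9)),
    }


def overcommitted(info):
    """True if the node reports negative available memory or vCores."""
    return info["avail_memory_mb"] < 0 or info["avail_vcores"] < 0
```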

Related issues:

  • https://issues.apache.org/jira/browse/YARN-8462
  • https://issues.apache.org/jira/browse/YARN-8193

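The fixes were made in Hadoop 3.1.1, 3.2.0 and 2.10.1. The RM in this log runs CDH 6.0.1, whose YARN jars are versioned 3.0.0-cdh6.0.1 (see the /static path in the log).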
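Whether a Hadoop version string is at or past one of the fixed releases listed above can be checked with a simple tuple comparison. This is a minimal Python sketch under two stated assumptions: later releases in each line (and every release from 3.2.0 on) also carry the fix, and only the upstream version number is examined. A vendor build such as 3.0.0-cdh6.0.1 may still contain a backported patch even when this returns False. The names FIXED_RELEASES, parse_version and has_upstream_fix are hypothetical.

```python
# Upstream releases named in the text as containing the fix.
FIXED_RELEASES = [(2, 10, 1), (3, 1, 1), (3, 2, 0)]


def parse_version(version):
    """'3.0.0-cdh6.0.1' -> (3, 0, 0); any suffix after '-' is ignored."""
    core = version.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_upstream_fix(version):
    """Assumes every release >= the newest fixed release has the fix, and
    that older lines have it from their listed fixed patch release on."""
    v = parse_version(version)
    if v >= max(FIXED_RELEASES):
        return True
    return any(v[:2] == f[:2] and v >= f for f in FIXED_RELEASES)
```

For the cluster in this log, has_upstream_fix("3.0.0-cdh6.0.1") returns False, which points toward upgrading or patching rather than proving the patch is absent.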

Solutions:

  • Upgrade Hadoop
  • Apply the patches from the related issues
  • Avoid running large numbers of queries back to back


Reposted from blog.csdn.net/qq_43081842/article/details/109094381