Kylin 2.3 job fails with TableNotFoundException when writing results to HBase

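A Kylin job failed. The job is supposed to write its results to a table in HBase, but during the write it reported that the table 'kylin_metadata' does not exist in HBase.
Error log excerpt: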

2018-05-07 20:28:46,137 WARN  [main] util.HeapMemorySizeUtil:55 : hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
2018-05-07 20:28:46,395 DEBUG [main] hbase.HBaseConnection:307 : HTable 'kylin_metadata' already exists
Exception in thread "main" java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata@hbase
at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:96)
at org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:108)
at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:94)
at org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:41)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:90)
... 3 more
Caused by: org.apache.hadoop.hbase.TableNotFoundException: kylin_metadata
at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:578)
at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:551)
at org.apache.kylin.storage.hbase.HBaseConnection.createHTableIfNeeded(HBaseConnection.java:308)
at org.apache.kylin.storage.hbase.HBaseResourceStore.createHTableIfNeeded(HBaseResourceStore.java:111)
at org.apache.kylin.storage.hbase.HBaseResourceStore.<init>(HBaseResourceStore.java:92)
... 8 more


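The gist is that the table does not exist in the default namespace: TableNotFoundException: kylin_metadata

What kylin_metadata is for: it stores all cube metadata, including cube descriptions and instances, project information, inverted index descriptions, jobs, tables, and dictionaries. Kylin keeps this metadata in HBase rather than in an ordinary file system. If you check the configuration file (kylin.properties), you will find this line: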
## The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
This means the metadata is stored in the kylin_metadata table in HBase; you can scan that table in the hbase shell to view its contents.
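To confirm which table Kylin will look for, the table name can be read straight out of kylin.properties. The following is a minimal sketch, assuming the simple table@storage form of kylin.metadata.url shown above; the function name metadata_table is hypothetical.

```shell
#!/usr/bin/env bash
# Print the HBase table name taken from the kylin.metadata.url line
# of a properties file. Assumes the simple "table@storage" form.
metadata_table() {
  local props_file="$1"
  local url
  # Use the last matching line and keep everything after the first '='.
  url=$(grep -E '^kylin\.metadata\.url=' "$props_file" | tail -n 1 | cut -d= -f2-)
  # Drop the '@storage' suffix, leaving only the table name.
  printf '%s\n' "${url%%@*}"
}
```

The result can then be used to build a scan command for the hbase shell, for example `echo "scan '$(metadata_table kylin.properties)', {LIMIT => 5}" | hbase shell`, assuming kylin.properties is in the current directory.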


In the hbase shell, the default namespace contained only the table test_yhl and no kylin_metadata, so I tried to create a new kylin_metadata table.
hbase(main):005:0> list
TABLE
nice_users:users_like                                                                                                                                                             
nice_users:china_users                                                                                                                                                           
nice_users:china_users0426                                                                                                                                                   
nice_users:weekly_stat                                                                                                                                                           
test_yhl
5 row(s) in 0.0120 seconds
=> ["nice_users:users_like", "nice_users:china_users", "nice_users:china_users0426", "nice_users:weekly_stat", "test_yhl"]

[Attempt 1] Creating the table fails
hbase(main):003:0> create 'default:kylin_metadata', "f1"
ERROR: Table already exists: default:kylin_metadata!

[Attempt 2] Check the table with hbck
[hbase@kmr-a20125dd-gn-a05044c6-master-1-001 root]$ hbase hbck -details "default:kylin_metadata" > ~/kylin_metadata_DETAILS.log 2>&1
Excerpt from the result file kylin_metadata_DETAILS.log:
Number of regions in transition: 11
  KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2536b7. state=FAILED_CLOSE, ts=Mon May 07 17:02:14 CST 2018 (13106s ago), server=null
  kylin_metadata,,1524037727389.e770d1e6528b6154f29a5314e6c8733d. state=FAILED_CLOSE, ts=Mon May 07 17:02:14 CST 2018 (13106s ago), server=null
  KYLIN_8FCVO3VMZ5,,1525312564354.367e62a2489d180428f6c23f0092bdf6. state=FAILED_CLOSE, ts=Mon May 07 17:02:14 CST 2018 (13106s ago), server=null
  KYLIN_Z806Q81TI0,,1524723913265.05c0ffe0e9f64b117e4ab40171782cae. state=FAILED_CLOSE, ts=Mon May 07 17:02:13 CST 2018 (13107s ago), server=null
  KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a5bbed. state=FAILED_CLOSE, ts=Mon May 07 17:02:13 CST 2018 (13107s ago), server=null
  KYLIN_9YN4WTKNI0,,1525656643173.40353cc56028c2bf85fad6a01626ddb3. state=FAILED_CLOSE, ts=Mon May 07 17:02:14 CST 2018 (13106s ago), server=null
  KYLIN_1JTYCRWRQT,,1525667723449.da8073440b414fc140cb7f5fa0341ae5. state=FAILED_CLOSE, ts=Mon May 07 17:02:14 CST 2018 (13106s ago), server=null
  KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e53ed89. state=FAILED_CLOSE, ts=Mon May 07 17:02:13 CST 2018 (13107s ago), server=null
  KYLIN_80TYFBXYHA,,1525660830618.2aafa56d517997e0aee45a516ac990f1. state=FAILED_CLOSE, ts=Mon May 07 17:02:14 CST 2018 (13106s ago), server=null
  KYLIN_875PS7V2PN,,1524715341451.c96d67c4643b819513312fc83a7549a4. state=FAILED_CLOSE, ts=Mon May 07 17:02:13 CST 2018 (13107s ago), server=null
  KYLIN_8SV2D5XCMB,,1525657404680.fabe589ab2df00f73c4ba370ceda0a15. state=FAILED_CLOSE, ts=Mon May 07 17:02:14 CST 2018 (13106s ago), server=null
2018-05-07 20:40:41,600 WARN  [hbasefsck-pool1-t5] util.HBaseFsck: No HDFS region dir found: { meta => kylin_metadata,,1524037727389.e770d1e6528b6154f29a5314e6c8733d., hdfs => null, deployed => , replicaId => 0 } meta={ENCODED => e770d1e6528b6154f29a5314e6c8733d, NAME => 'kylin_metadata,,1524037727389.e770d1e6528b6154f29a5314e6c8733d.', STARTKEY => '', ENDKEY => ''}
2018-05-07 20:40:41,842 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
ERROR: Region { meta => kylin_metadata,,1524037727389.e770d1e6528b6154f29a5314e6c8733d., hdfs => null, deployed => , replicaId => 0 } found in META, but not in HDFS or deployed on any region server.
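The region names in this hbck output are reused in the later steps: the full region name is the row key in hbase:meta, and the trailing hex string is the encoded region name that appears as a znode name in ZooKeeper. Below is a minimal sketch for pulling both out of the log, assuming the line layout shown above and region names without spaces; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the full region name (the hbase:meta row key) of every region
# that hbck reports in state FAILED_CLOSE.
failed_close_regions() {
  awk '/state=FAILED_CLOSE/ { print $1 }' "$1"
}

# Print only the encoded region name: the hex string between the
# last two dots of the full region name.
encoded_names() {
  failed_close_regions "$1" | sed -E 's/^.*\.([0-9a-f]+)\.$/\1/'
}
```

For example, `encoded_names ~/kylin_metadata_DETAILS.log` would list one encoded name per stuck region.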


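The regions listed under "Number of regions in transition: 11" in kylin_metadata_DETAILS.log are still in transition (RIT), but they are in fact obsolete: I manually deleted those tables a short while ago because I no longer wanted them, and the deletion was not clean.
The web UI also lists kylin_metadata under Regions in Transition, with state=FAILED_CLOSE and an RIT time (ms) of 8924398, i.e. about 2.48 hours.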
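The RIT time in the web UI is given in milliseconds. As a quick check of the conversion, here is a small sketch using only bash integer arithmetic; the function name rit_duration is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a duration in milliseconds to hours, minutes, and seconds.
rit_duration() {
  local ms="$1"
  local total_s=$(( ms / 1000 ))   # whole seconds, fraction discarded
  printf '%dh %dm %ds\n' \
    $(( total_s / 3600 )) \
    $(( (total_s % 3600) / 60 )) \
    $(( total_s % 60 ))
}

rit_duration 8924398   # prints "2h 28m 44s"
```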


[Fix] Remove the znodes of the regions stuck in RIT from ZooKeeper
[zk: localhost:2181(CONNECTED) 0] ls /hbase-unsecure/region-in-transition
[b0eaa0d6f000ebaa2090703d3e53ed89, 40353cc56028c2bf85fad6a01626ddb3, 2aafa56d517997e0aee45a516ac990f1, 367e62a2489d180428f6c23f0092bdf6, e770d1e6528b6154f29a5314e6c8733d, 1a17bb4d1f47ef6f146655b9a4a5bbed, 8f50d43fff533da39e6b6bce8e2536b7, 05c0ffe0e9f64b117e4ab40171782cae, c96d67c4643b819513312fc83a7549a4, fabe589ab2df00f73c4ba370ceda0a15, da8073440b414fc140cb7f5fa0341ae5]
[zk: localhost:2181(CONNECTED) 1] rmr /hbase-unsecure/region-in-transition/b0eaa0d6f000ebaa2090703d3e53ed89
[zk: localhost:2181(CONNECTED) 2] rmr /hbase-unsecure/region-in-transition/40353cc56028c2bf85fad6a01626ddb3
[zk: localhost:2181(CONNECTED) 3] rmr /hbase-unsecure/region-in-transition/2aafa56d517997e0aee45a516ac990f1
[zk: localhost:2181(CONNECTED) 4] rmr /hbase-unsecure/region-in-transition/367e62a2489d180428f6c23f0092bdf6
[zk: localhost:2181(CONNECTED) 5] rmr /hbase-unsecure/region-in-transition/e770d1e6528b6154f29a5314e6c8733d
[zk: localhost:2181(CONNECTED) 6] rmr /hbase-unsecure/region-in-transition/1a17bb4d1f47ef6f146655b9a4a5bbed
[zk: localhost:2181(CONNECTED) 7] rmr /hbase-unsecure/region-in-transition/8f50d43fff533da39e6b6bce8e2536b7
[zk: localhost:2181(CONNECTED) 8] rmr /hbase-unsecure/region-in-transition/05c0ffe0e9f64b117e4ab40171782cae
[zk: localhost:2181(CONNECTED) 9] rmr /hbase-unsecure/region-in-transition/c96d67c4643b819513312fc83a7549a4
[zk: localhost:2181(CONNECTED) 10] rmr /hbase-unsecure/region-in-transition/fabe589ab2df00f73c4ba370ceda0a15
[zk: localhost:2181(CONNECTED) 11] rmr /hbase-unsecure/region-in-transition/da8073440b414fc140cb7f5fa0341ae5
[zk: localhost:2181(CONNECTED) 12] ls /hbase-unsecure/region-in-transition
[]
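Typing one rmr per znode gets tedious with many regions. The following sketch only generates the command lines from a list of encoded region names so they can be reviewed before being pasted into zkCli. The parent path /hbase-unsecure/region-in-transition is the one used in this cluster, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Parent znode of the RIT entries, as used in the session above.
ZK_RIT_PATH="/hbase-unsecure/region-in-transition"

# Print one zkCli "rmr" command per encoded region name given as an argument.
rit_cleanup_commands() {
  local name
  for name in "$@"; do
    printf 'rmr %s/%s\n' "$ZK_RIT_PATH" "$name"
  done
}
```

Note that newer ZooKeeper releases (3.5 and later) deprecate rmr in favor of deleteall, so the generated verb may need adjusting depending on the zkCli version.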


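Then I restarted both HMasters. Result: the HBase web UI still showed those tables under Regions in Transition, all of them tables I had deleted earlier. I suspected their metadata was still present.
In the hbase shell I checked whether the deleted tables still had records in the hbase:meta table (the metadata table). Result: they did.
So the fix is to delete, one by one, the hbase:meta records that belong to the tables stuck in RIT.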

hbase(main):015:0> scan "hbase:meta", {STARTROW => 'KYLIN_JVKSFWWFT7', LIMIT => 3}
ROW                                                          COLUMN+CELL
 KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2 column=info:regioninfo, timestamp=1524046156037, value={ENCODED => 8f50d43fff533da39e6b6bce8e2536b7, NAME => 'KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2536b7.
 536b7.', STARTKEY => '', ENDKEY => ''}
 KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2 column=info:seqnumDuringOpen, timestamp=1525663825507, value=\x00\x00\x00\x00\x00\x00\x00\x1E
 536b7.       
 KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2 column=info:server, timestamp=1525663825507, value=kmr-a20125dd-gn-a05044c6-core-1-008.ksc.com:16020
 536b7.       
 KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2 column=info:serverstartcode, timestamp=1525663825507, value=1525662768456
 536b7.
 KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a column=info:regioninfo, timestamp=1525663588877, value={ENCODED => 1a17bb4d1f47ef6f146655b9a4a5bbed, NAME => 'KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a5bbed.
 5bbed.', STARTKEY => '', ENDKEY => ''}
 KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a column=info:seqnumDuringOpen, timestamp=1525663827974, value=\x00\x00\x00\x00\x00\x00\x00\x04
 5bbed.
 KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a column=info:server, timestamp=1525663827974, value=kmr-a20125dd-gn-a05044c6-core-1-008.ksc.com:16020
 5bbed.
 KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a column=info:serverstartcode, timestamp=1525663827974, value=1525662768456
 5bbed.
 KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e5 column=info:regioninfo, timestamp=1524038586204, value={ENCODED => b0eaa0d6f000ebaa2090703d3e53ed89, NAME => 'KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e53ed89.
 3ed89.', STARTKEY => '', ENDKEY => ''}
 KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e5 column=info:seqnumDuringOpen, timestamp=1525663680965, value=\x00\x00\x00\x00\x00\x00\x00$
 3ed89.
 KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e5 column=info:server, timestamp=1525663680965, value=kmr-a20125dd-gn-a05044c6-core-1-008.ksc.com:16020  
 3ed89.
 KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e5 column=info:serverstartcode, timestamp=1525663680965, value=1525662768456                                                                                                      
 3ed89.
3 row(s) in 0.0200 seconds

hbase(main):024:0> delete 'hbase:meta','KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2536b7.','info:regioninfo'
hbase(main):032:0> delete 'hbase:meta','KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2536b7.','info:seqnumDuringOpen'
hbase(main):034:0> delete 'hbase:meta','KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2536b7.','info:server'
hbase(main):035:0> delete 'hbase:meta','KYLIN_JVKSFWWFT7,,1524046155657.8f50d43fff533da39e6b6bce8e2536b7.','info:serverstartcode'
hbase(main):040:0> delete 'hbase:meta','KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a5bbed.','info:regioninfo'
hbase(main):041:0> delete 'hbase:meta','KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a5bbed.','info:seqnumDuringOpen'
hbase(main):042:0> delete 'hbase:meta','KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a5bbed.','info:server'
hbase(main):043:0> delete 'hbase:meta','KYLIN_LJW3WT8H3O,,1525663588151.1a17bb4d1f47ef6f146655b9a4a5bbed.','info:serverstartcode'
hbase(main):048:0> delete 'hbase:meta','KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e53ed89.','info:regioninfo'
hbase(main):049:0> delete 'hbase:meta','KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e53ed89.','info:seqnumDuringOpen'
hbase(main):050:0> delete 'hbase:meta','KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e53ed89.','info:server'
hbase(main):051:0> delete 'hbase:meta','KYLIN_VYVN08HI3O,,1524038585742.b0eaa0d6f000ebaa2090703d3e53ed89.','info:serverstartcode'
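The session above removes each of the four info: columns separately. The HBase shell also has a deleteall command that removes every cell of a row in one step. The sketch below generates such commands from full region names (the hbase:meta row keys). Using deleteall is a substitution for the per-column deletes above, not what was actually run, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print one HBase shell "deleteall" command per hbase:meta row key
# (full region name) given as an argument.
meta_delete_commands() {
  local row
  for row in "$@"; do
    printf "deleteall 'hbase:meta', '%s'\n" "$row"
  done
}
```

The output is plain text meant to be reviewed and then pasted into the hbase shell; only rows for regions of tables that are really gone should be passed in.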


# Go back to the ZooKeeper znode and delete every node under region-in-transition again
[zk: localhost:2181(CONNECTED) 0] ls /hbase-unsecure/region-in-transition
[b0eaa0d6f000ebaa2090703d3e53ed89, 40353cc56028c2bf85fad6a01626ddb3, 2aafa56d517997e0aee45a516ac990f1, 367e62a2489d180428f6c23f0092bdf6, e770d1e6528b6154f29a5314e6c8733d, 1a17bb4d1f47ef6f146655b9a4a5bbed, 8f50d43fff533da39e6b6bce8e2536b7, 05c0ffe0e9f64b117e4ab40171782cae, c96d67c4643b819513312fc83a7549a4, fabe589ab2df00f73c4ba370ceda0a15, da8073440b414fc140cb7f5fa0341ae5]
[zk: localhost:2181(CONNECTED) 1] rmr /hbase-unsecure/region-in-transition/b0eaa0d6f000ebaa2090703d3e53ed89
[zk: localhost:2181(CONNECTED) 2] rmr /hbase-unsecure/region-in-transition/40353cc56028c2bf85fad6a01626ddb3
[zk: localhost:2181(CONNECTED) 3] rmr /hbase-unsecure/region-in-transition/2aafa56d517997e0aee45a516ac990f1
[zk: localhost:2181(CONNECTED) 4] rmr /hbase-unsecure/region-in-transition/367e62a2489d180428f6c23f0092bdf6
[zk: localhost:2181(CONNECTED) 5] rmr /hbase-unsecure/region-in-transition/e770d1e6528b6154f29a5314e6c8733d
[zk: localhost:2181(CONNECTED) 6] rmr /hbase-unsecure/region-in-transition/1a17bb4d1f47ef6f146655b9a4a5bbed
[zk: localhost:2181(CONNECTED) 7] rmr /hbase-unsecure/region-in-transition/8f50d43fff533da39e6b6bce8e2536b7
[zk: localhost:2181(CONNECTED) 8] rmr /hbase-unsecure/region-in-transition/05c0ffe0e9f64b117e4ab40171782cae
[zk: localhost:2181(CONNECTED) 9] rmr /hbase-unsecure/region-in-transition/c96d67c4643b819513312fc83a7549a4
[zk: localhost:2181(CONNECTED) 10] rmr /hbase-unsecure/region-in-transition/fabe589ab2df00f73c4ba370ceda0a15
[zk: localhost:2181(CONNECTED) 11] rmr /hbase-unsecure/region-in-transition/da8073440b414fc140cb7f5fa0341ae5
[zk: localhost:2181(CONNECTED) 12] ls /hbase-unsecure/region-in-transition
[]


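Then I restarted both HMasters again. Result: the HBase web UI no longer showed any regions in transition. At this point the deleted tables had been fully cleaned out of the metadata table hbase:meta.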

Reposted from blog.csdn.net/qq_31598113/article/details/80258767