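HBase version: 1.1.2; Hadoop version: 2.7.3
HBase's archive directory on HDFS, /apps/hbase/data/archive, had grown too large, which kept triggering HDFS space-usage alerts.
【Problem】
Alert message: alert: datanode_storage is triggered. The alert means that HDFS storage usage on one or more DataNodes has exceeded the threshold (we set it to 80%), so space needs to be freed.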
[hdfs@master-2 root]$ hdfs dfs -du -h /apps/hbase/data/archive/data/
19.1 M /apps/hbase/data/archive/data/default
12.6 T /apps/hbase/data/archive/data/good_namespace # this directory takes up too much space
[hdfs@master-2 root]$ hdfs dfs -du -h /apps/hbase/data/archive/data/default
19.1 M /apps/hbase/data/archive/data/default/KYLIN_YEDCQ82BF3
[hdfs@master-2 root]$ hdfs dfs -du -h /apps/hbase/data/archive/data/good_namespace
4.8 M /apps/hbase/data/archive/data/good_namespace/url_history_30
1.3 M /apps/hbase/data/archive/data/good_namespace/url_history_7
5.8 M /apps/hbase/data/archive/data/good_namespace/user_statistic
12.6 T /apps/hbase/data/archive/data/good_namespace/users # this is the table taking up too much space
90.8 M /apps/hbase/data/archive/data/good_namespace/weekly_stat
23.5 G /apps/hbase/data/archive/data/good_namespace/android_active_user_info
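To rank the subdirectories without comparing the human-readable sizes by eye, the plain-byte output of `hdfs dfs -du` (without `-h`) can be sorted numerically. This is only a minimal sketch: the function name `largest_dirs` is hypothetical, and it assumes the size is the first column, as in the listings above.

```shell
# Hypothetical helper: read `hdfs dfs -du` output (sizes in bytes, no -h)
# from stdin and print the N largest entries, largest first.
# N defaults to 5 when no argument is given.
largest_dirs() {
  sort -k1,1nr | head -n "${1:-5}"
}
```

It could be used as `hdfs dfs -du /apps/hbase/data/archive/data/good_namespace | largest_dirs 3`.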
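【Analysis】
I checked every place on the HDFS cluster where data tends to pile up, such as oldWALs and .Trash, and cleaned up the relevant files, but it made little difference.
Then it occurred to me that the space might be held by files archived for backups, so I looked at the snapshots taken of the HBase tables. Sure enough, one large table had snapshots. As long as a snapshot still references an archived HFile, HBase cannot clean that file out of the archive directory.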
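One way to check how much data a given snapshot is holding on to is HBase's SnapshotInfo tool. The wrapper below is a sketch rather than part of the original troubleshooting session: the function name `snapshot_stats` is hypothetical, and the exact output format may differ between HBase versions.

```shell
# Hypothetical wrapper around HBase's SnapshotInfo tool.
# $1 is a snapshot name as printed by list_snapshots.
# -stats asks the tool to report file counts and sizes for the snapshot.
snapshot_stats() {
  hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot "$1" -stats
}
```

For example, `snapshot_stats users_snapshot_20180209` would report on one of the snapshots of the large table.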
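【Solution】
Clean up the snapshots of the table that takes the most space: keep only the newest snapshot and delete the older ones.
# Listing the snapshots shows that each table subdirectory under /apps/hbase/data/archive/data corresponds to an HBase table that has snapshots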
hbase(main):002:0> list_snapshots
SNAPSHOT TABLE + CREATION TIME
KYLIN_YEDCQ82BF3_snapshot_20180315 KYLIN_YEDCQ82BF3 (Thu Mar 15 18:19:56 +0800 2018)
url_history_30-snapshot_20180313 good_namespace:url_history_30 (Tue Mar 13 16:41:54 +0800 2018)
url_history_7-snapshot_20180313 good_namespace:url_history_7 (Tue Mar 13 17:07:04 +0800 2018)
user_statistic_Snapshot_20180209 good_namespace:user_statistic (Fri Feb 09 18:01:21 +0800 2018)
user_statistic_snapshot_20180313 good_namespace:user_statistic (Tue Mar 13 16:36:06 +0800 2018)
users_snapshot_20180209 good_namespace:users (Fri Feb 09 17:26:51 +0800 2018)
users_snapshot_20180313 good_namespace:users (Tue Mar 13 15:39:20 +0800 2018)
users_snapshot_20180408 good_namespace:users (Sun Apr 08 16:16:32 +0800 2018)
weekly_stat_snapshot_20180313 good_namespace:weekly_stat (Tue Mar 13 16:17:33 +0800 2018)
android_active_user_info_Snapshot_20180212 good_namespace:android_active_user_info (Mon Feb 12 11:35:33 +0800 2018)
10 row(s) in 0.2800 seconds
=> ["KYLIN_YEDCQ82BF3_snapshot_20180315", "url_history_30-snapshot_20180313", "url_history_7-snapshot_20180313", "user_statistic_Snapshot_20180209", "user_statistic_snapshot_20180313", "users_snapshot_20180209", "users_snapshot_20180313", "users_snapshot_20180408", "weekly_stat_snapshot_20180313", "android_active_user_info_Snapshot_20180212"]
hbase(main):003:0>
# Delete the old snapshots first, then create fresh ones, so that only the newest snapshots are kept
hbase(main):002:0> list_snapshots
SNAPSHOT TABLE + CREATION TIME
KYLIN_YEDCQ82BF3_snapshot_20180315 KYLIN_YEDCQ82BF3 (Thu Mar 15 18:19:56 +0800 2018)
url_history_30_snapshot_20180412 good_namespace:url_history_30 (Thu Apr 12 15:05:08 +0800 2018)
url_history_7-snapshot_20180412 good_namespace:url_history_7 (Thu Apr 12 15:06:10 +0800 2018)
user_statistic_snapshot_20180412 good_namespace:user_statistic (Thu Apr 12 14:56:33 +0800 2018)
users_snapshot_20180412 good_namespace:users (Thu Apr 12 14:59:40 +0800 2018)
weekly_stat_snapshot_20180412 good_namespace:weekly_stat (Thu Apr 12 15:03:20 +0800 2018)
android_active_user_info_20180412 good_namespace:android_active_user_info (Thu Apr 12 15:02:15 +0800 2018)
android_active_user_info_Snapshot_20180212 good_namespace:android_active_user_info (Mon Feb 12 11:35:33 +0800 2018)
8 row(s) in 0.1540 seconds
=> ["KYLIN_YEDCQ82BF3_snapshot_20180315", "url_history_30_snapshot_20180412", "url_history_7-snapshot_20180412", "user_statistic_snapshot_20180412", "users_snapshot_20180412", "weekly_stat_snapshot_20180412", "android_active_user_info_20180412", "android_active_user_info_Snapshot_20180212"]
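The transcript shows the snapshot list before and after the cleanup, but not the commands in between. In the HBase shell, a snapshot is removed with `delete_snapshot 'NAME'` and created with `snapshot 'TABLE', 'NAME'`. The sketch below generates the delete commands for snapshots older than a cutoff date. It assumes the snapshot names end in `_YYYYMMDD`, as most of the names above do; the function name `gen_delete_cmds` is hypothetical, and names that do not match the pattern are skipped.

```shell
# Hypothetical helper: read snapshot names from stdin (one per line) and
# print an HBase shell delete_snapshot command for every name whose
# trailing _YYYYMMDD date is earlier than the cutoff given as $1.
gen_delete_cmds() {
  gdc_cutoff="$1"
  while read -r gdc_name || [ -n "$gdc_name" ]; do
    gdc_date="${gdc_name##*_}"   # text after the last underscore
    case "$gdc_date" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;   # looks like YYYYMMDD
      *) continue ;;                                 # no date suffix: skip
    esac
    if [ "$gdc_date" -lt "$gdc_cutoff" ]; then
      printf "delete_snapshot '%s'\n" "$gdc_name"
    fi
  done
}
```

For instance, `printf '%s\n' users_snapshot_20180209 users_snapshot_20180313 users_snapshot_20180408 | gen_delete_cmds 20180412` prints three `delete_snapshot` lines. After reviewing them, they can be pasted into the HBase shell or piped to `hbase shell`. A fresh snapshot would then be taken with, for example, `snapshot 'good_namespace:users', 'users_snapshot_20180412'`.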
# Check the space usage of this HDFS directory again: the space held by the snapshot-related files has been released
[hdfs@master-2 root]$ hdfs dfs -du -h /apps/hbase/data/archive/data/
19.1 M /apps/hbase/data/archive/data/default
119.5 G /apps/hbase/data/archive/data/good_namespace # this directory dropped from 12.6 TB to 119.5 GB