After installing Hive, I found that files could not be copied from the local filesystem into the Hadoop cluster; every attempt failed with:
java.io.IOException: File … could only be replicated to 0 nodes, instead of 1. The namenode log was full of the same error, even during startup. This learning environment is a single namenode plus two datanodes, yet dfsadmin -report showed no data nodes at all:
[hadoop@namenode hadoop]$ hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: ?%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
Reformatting the namenode and restarting did not help. After clearing the old logs, the datanode logs showed the error "Incompatible namespaceIDs". I deleted the datanodes' dfs.data.dir directories, recreated them, reformatted the namenode again, and restarted, but dfsadmin -report still showed no data nodes. Checking the logs again, the error had changed to "All directories in dfs.data.dir are invalid"; the directory structure had never been created. While puzzling over that, I noticed a warning earlier in the log:
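A quick way to tell which of the two errors a datanode is hitting is to grep its log. The sketch below uses a simulated log file so it is self-contained; on a real cluster you would point grep at the actual datanode log under $HADOOP_HOME/logs instead:

```shell
# Simulated datanode log line; on a real node, grep
# $HADOOP_HOME/logs/hadoop-*-datanode-*.log instead.
printf 'ERROR datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /hadoop/hadoop-data\n' \
    > /tmp/datanode-sample.log

# Either message means the data dir must be rebuilt or its mode fixed.
grep -E 'Incompatible namespaceIDs|All directories in dfs.data.dir are invalid' \
    /tmp/datanode-sample.log
```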
Invalid directory in dfs.data.dir: Incorrect permission for /hadoop/hadoop-data, expected: rwxr-xr-x, while actual: rwxrwxr-x.
Following the warning, I fixed the directory permissions on both datanodes:
[hadoop@namenode logs]$ ssh datanode01.hadoop
Last login: Wed Mar 14 01:58:39 2012 from namenode.hadoop
[hadoop@datanode01 ~]$ chmod g-w /hadoop/hadoop-data/
[hadoop@datanode01 ~]$ exit
[hadoop@namenode logs]$ ssh datanode02.hadoop
Last login: Wed Mar 14 01:59:00 2012 from datanode01.hadoop
[hadoop@datanode02 ~]$ chmod g-w /hadoop/hadoop-data/
[hadoop@datanode02 ~]$ exit
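The mismatch is easy to reproduce locally: a umask of 0002 creates directories as rwxrwxr-x, while the datanode insists on rwxr-xr-x, and chmod g-w is exactly the difference. A minimal sketch with a throwaway directory (/tmp/perm-demo is a stand-in for the real data dir; stat -c is the GNU coreutils form):

```shell
mkdir -p /tmp/perm-demo
chmod 775 /tmp/perm-demo      # rwxrwxr-x: what the datanode actually found
stat -c '%A' /tmp/perm-demo   # prints drwxrwxr-x
chmod g-w /tmp/perm-demo      # drop the group write bit
stat -c '%A' /tmp/perm-demo   # prints drwxr-xr-x: what HDFS expects
```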
After restarting, the problem was gone:
[hadoop@namenode hadoop]$ hadoop dfsadmin -report
Configured Capacity: 158030774272 (147.18 GB)
Present Capacity: 141718949918 (131.99 GB)
DFS Remaining: 141718892544 (131.99 GB)
DFS Used: 57374 (56.03 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 172.21.126.102:50010
Decommission Status : Normal
Configured Capacity: 79015387136 (73.59 GB)
DFS Used: 28687 (28.01 KB)
Non DFS Used: 8155709425 (7.6 GB)
DFS Remaining: 70859649024 (65.99 GB)
DFS Used%: 0%
DFS Remaining%: 89.68%
Last contact: Wed Mar 14 01:40:41 CST 2012

Name: 172.21.126.103:50010
Decommission Status : Normal
Configured Capacity: 79015387136 (73.59 GB)
DFS Used: 28687 (28.01 KB)
Non DFS Used: 8156114929 (7.6 GB)
DFS Remaining: 70859243520 (65.99 GB)
DFS Used%: 0%
DFS Remaining%: 89.68%
Last contact: Wed Mar 14 01:40:41 CST 2012
A Hive test also worked normally again:
hive> create table dummy(value STRING);
OK
Time taken: 10.642 seconds
hive> load data local inpath '/tmp/dummy.txt' overwrite into table dummy;
Copying data from file:/tmp/dummy.txt
Copying file: file:/tmp/dummy.txt
Loading data to table default.dummy
Deleted hdfs://namenode.hadoop/user/hive/warehouse/dummy
OK
Time taken: 0.439 seconds
hive> select * from dummy;
OK
X
Time taken: 0.254 seconds
hive>
I had not expected the data directory permission check to be this strict: one extra group write bit is enough to keep the whole cluster from coming up. Recording it here for future reference.
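For future setups, a small pre-start check could catch this before the datanode refuses the directory. The sketch below is my own helper, not part of Hadoop; fix_datadir_perms is a made-up name, and mode 755 corresponds to the rwxr-xr-x the datanode expects:

```shell
#!/bin/sh
# Hypothetical pre-start guard: make sure a dfs.data.dir has mode 755
# (rwxr-xr-x), the mode the datanode's permission check demands.
fix_datadir_perms() {
    dir="$1"
    actual=$(stat -c '%a' "$dir")   # numeric mode, e.g. 775
    if [ "$actual" != "755" ]; then
        echo "fixing $dir: mode $actual -> 755"
        chmod 755 "$dir"
    fi
}

# Example (real path on this cluster):
# fix_datadir_perms /hadoop/hadoop-data
```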
Source: http://hellodatabase.com/?p=345