Hadoop Commands for Working with HDFS (Python)

This article assumes a CentOS 7 environment with a Hadoop cluster already set up; all commands are tested on the master node.

  • CentOS 7
  • Python 3.6.8
  • hadoop-2.7.1

1. Hadoop shell commands

(1) List the HDFS file tree recursively

hadoop fs -lsr /

Note that -lsr is deprecated in Hadoop 2.x; hadoop fs -ls -R / is the preferred equivalent.

(2) Create a directory

hadoop fs -mkdir -p /test_xz/input

The -p flag creates missing parent directories, so /test_xz does not need to exist beforehand.

(3) Upload a local file to HDFS

hadoop fs -put /home/bailang/test.txt /test_xz/input

(4) Download a file from HDFS to a local directory

hadoop fs -get /test_xz/input/test.txt /home/bailang/ 

(5) List an HDFS directory

hadoop fs -ls /test_xz  

(6) View a file on HDFS

hadoop fs -cat /test_xz/input/test.txt  

(7) Delete a file on HDFS

hadoop fs -rm /test_xz/input/test.txt  

(8) Delete a directory on HDFS

hadoop fs -rmr /test_xz/input/

As with -lsr, -rmr is deprecated in Hadoop 2.x; hadoop fs -rm -r /test_xz/input/ is the preferred form.

(9) Check HDFS status

hadoop dfsadmin -report

In Hadoop 2.x the dfsadmin subcommand has moved to the hdfs script, so hdfs dfsadmin -report is the preferred spelling; the same applies to the safe-mode commands below.

(10) Enter safe mode

hadoop dfsadmin -safemode enter

(11) Leave safe mode

hadoop dfsadmin -safemode leave

2. Operating HDFS from Python

(1) Install the hdfs package

pip install hdfs
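
The package talks to the NameNode over WebHDFS. Here is a minimal connectivity check (assuming, as in the rest of this article, that the master node is 172.30.11.101 and the NameNode web UI listens on the Hadoop 2.x default port 50070):

from hdfs.client import Client
# Point the client at the NameNode's WebHDFS endpoint;
# adjust the host and port to match your own cluster.
client = Client("http://172.30.11.101:50070")
# status() fetches a path's metadata; printing the metadata
# of "/" confirms the client can reach HDFS.
print(client.status("/"))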

(2) Read the contents of an HDFS file

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
file_path = "/test_xz/input/test.txt"
lines = []
# delimiter='\n' makes the reader yield the file line by line
with client.read(file_path, encoding='utf-8', delimiter='\n') as reader:
    for line in reader:
        lines.append(line.strip())
print(lines)
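
If the file comfortably fits in memory, the same call can return the whole content at once; per the package's documentation, omitting the delimiter makes the reader file-like:

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
file_path = "/test_xz/input/test.txt"
# Without a delimiter the reader behaves like an ordinary file object
with client.read(file_path, encoding='utf-8') as reader:
    content = reader.read()
print(content)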

(3) Create a directory on HDFS

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input"
client.makedirs(hdfs_path)
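
makedirs() behaves like mkdir -p, creating missing parent directories as needed; it also accepts an octal permission string. A sketch (the path and permission value here are illustrative):

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
# permission is an octal string forwarded to WebHDFS (illustrative value)
client.makedirs("/test_xz/output", permission='755')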

(4) List the files under a given HDFS directory

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input"
print(client.list(hdfs_path, status=False))
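With status=True, each entry instead comes back as a (name, metadata) pair, where the metadata dict carries WebHDFS FileStatus fields such as type and length:

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input"
# Each item is a (filename, metadata-dict) tuple
for name, meta in client.list(hdfs_path, status=True):
    print(name, meta['type'], meta['length'])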

(5) Move or rename a file

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
source_path = "/test_xz/input/test.txt"
dst_path = "/test_xz/input/test_renamed.txt"  # example destination
# rename() also moves a file when the destination is in another directory
client.rename(source_path, dst_path)

(6) Upload a file to HDFS

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input"
local_path = "/home/bailang/test.txt"
# cleanup=True deletes any partially uploaded files if the transfer fails
client.upload(hdfs_path, local_path, cleanup=True)
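
upload() can also copy a whole local directory; n_threads parallelizes the per-file transfers, and extra keyword arguments such as overwrite are forwarded to write(). A sketch with a hypothetical local data directory:

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
# Upload every file under a local directory with 4 worker threads,
# replacing remote files that already exist.
client.upload("/test_xz/input", "/home/bailang/data", n_threads=4, overwrite=True)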

(7) Download a file from HDFS to the local filesystem

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
local_path = "/home/bailang"
# overwrite=False raises an error if the local target already exists
client.download(hdfs_path, local_path, overwrite=False)
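
download() mirrors upload(): it handles directories as well as single files, and n_threads parallelizes multi-file transfers:

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
# Fetch a whole HDFS directory with 4 worker threads;
# overwrite=True replaces any existing local copy.
client.download("/test_xz/input", "/home/bailang", n_threads=4, overwrite=True)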

(8) Append data to a file on HDFS

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
data = "one more line\n"  # example payload to append
# append=True requires the target file to already exist on HDFS
client.write(hdfs_path, data, encoding='utf-8', overwrite=False, append=True)

(9) Overwrite a file on HDFS with data

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
data = "fresh contents\n"  # example payload
# overwrite=True replaces the file's contents if it already exists
client.write(hdfs_path, data, encoding='utf-8', overwrite=True, append=False)
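
When the data is produced incrementally, write() can be called without a data argument; per the package's documentation it then acts as a context manager that yields a file-like writer:

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
# Everything written inside the block is streamed to the HDFS file
with client.write(hdfs_path, encoding='utf-8', overwrite=True) as writer:
    for i in range(3):
        writer.write(f"line {i}\n")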

(10) Delete a file from HDFS

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
hdfs_path = "/test_xz/input/test.txt"
# delete() returns True on success, False if the path did not exist
client.delete(hdfs_path)
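
delete() does not raise when the path is absent, and removing a non-empty directory requires recursive=True:

from hdfs.client import Client
client = Client("http://172.30.11.101:50070")
# recursive=True is needed to remove a non-empty directory;
# the call returns True on success, False if the path was absent.
print(client.delete("/test_xz/input", recursive=True))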


Reprinted from blog.csdn.net/qq_32599479/article/details/101509612