Hadoop cluster read and write data flow

 

HDFS read data flow:

1) The client requests the NameNode, through the DistributedFileSystem module, to download a file. The NameNode queries its metadata to find the addresses of the DataNodes holding the file's blocks.

2) The client picks a DataNode (nearest first, then random) and requests to read the data.

3) The DataNode begins transmitting data to the client (it reads the data from disk into an input stream and checksums it in units of Packets).

4) The client receives the Packets, first buffering them in a local cache, then writing them to the destination file. (A minimal Java sketch of this read path follows below.)
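
To make the read path above concrete, here is a minimal client-side sketch using the standard Hadoop FileSystem Java API. The NameNode URI (hdfs://namenode:8020) and the file paths are placeholder assumptions, not values from the original post; the DistributedFileSystem lookup, DataNode selection, and Packet-level checksumming all happen behind fs.open().

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The hdfs:// scheme selects DistributedFileSystem; the NameNode
        // host and port below are placeholder values.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // open() asks the NameNode for block locations, then streams block
        // data from the nearest DataNode, Packet by Packet, verifying
        // checksums as it reads.
        try (FSDataInputStream in = fs.open(new Path("/user/demo/input.txt"));
             OutputStream out = new FileOutputStream("input.txt")) {
            IOUtils.copyBytes(in, out, 4096, false);
        }
        fs.close();
    }
}
```

Note that user code never contacts the DataNodes directly: the FSDataInputStream returned by fs.open() handles replica selection and checksum verification internally, matching steps 1-4 above.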

HDFS write data flow:

1) The client requests the NameNode, through the DistributedFileSystem module, to upload a file. The NameNode checks whether the target file already exists and whether its parent directory exists.

2) The NameNode responds with whether the file can be uploaded.

3) The client requests the NameNode for the DataNode servers the first Block should be uploaded to.

4) The NameNode returns 3 DataNode nodes: dn1, dn2, and dn3.

5) The client requests dn1, through the FSDataOutputStream module, to upload the data. On receiving the request, dn1 calls dn2, and dn2 then calls dn3, completing the communication pipeline.

6) dn1, dn2, and dn3 acknowledge the client level by level.

7) The client begins uploading the first Block to dn1 (first reading the data from disk into a local memory cache), in units of Packets. When dn1 receives a Packet it passes it to dn2, and dn2 passes it to dn3; for every Packet dn1 sends, it places the Packet in an acknowledgement queue to wait for the ack.

8) When one Block has finished transferring, the client again requests the NameNode for the servers to upload the second Block to (steps 3-7 are repeated). (A minimal Java sketch of this write path follows below.)
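
Correspondingly, here is a minimal write-path sketch under the same placeholder assumptions (the NameNode URI and paths are illustrative). fs.create() triggers the NameNode checks of steps 1 and 2, and the returned FSDataOutputStream internally drives the dn1 -> dn2 -> dn3 pipeline and the per-Packet acknowledgement queue of steps 5-7.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode URI, as in the read example.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // create() asks the NameNode to verify that the target file does not
        // already exist and that the parent directory does; the returned
        // stream then writes Packets into the DataNode pipeline and waits
        // for the level-by-level acks.
        try (InputStream in = new FileInputStream("input.txt");
             FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"))) {
            IOUtils.copyBytes(in, out, 4096, false);
        }
        fs.close();
    }
}
```

The pipeline width of 3 DataNodes in step 4 corresponds to the replication factor, controlled by the dfs.replication setting (default 3).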

 


