Notes:
1. Hadoop version: 3.1.3
2. Reading tool: IDEA 2023.1.2
3. Source code download: Index of /dist/hadoop/core/hadoop-3.1.3 (apache.org)
4. Project import: after downloading the source code, you get the hadoop-3.1.3-src.tar.gz archive. Open PowerShell in that directory, decompress it with the tar -zxvf command, and then open the hadoop-3.1.3-src folder in IDEA. Be sure to configure the Maven or Gradle repository first; otherwise importing the jar packages will be slow.
5. Reference course: Shang Silicon Valley Big Data Hadoop Tutorial (Hadoop 3.x, from setup to cluster tuning), on bilibili
Press Ctrl + N, search globally for DataNode, and open DataNode.java.
The class's official Javadoc reads as follows:
/**********************************************************
* DataNode is a class (and program) that stores a set of
* blocks for a DFS deployment. A single deployment can
* have one or many DataNodes. Each DataNode communicates
* regularly with a single NameNode. It also communicates
* with client code and other DataNodes from time to time.
*
* DataNodes store a series of named blocks. The DataNode
* allows client code to read these blocks, or to write new
* block data. The DataNode may also, in response to instructions
* from its NameNode, delete blocks or copy blocks to/from other
* DataNodes.
*
* The DataNode maintains just one critical table:
* block-> stream of bytes (of BLOCK_SIZE or less)
*
* This info is stored on a local disk. The DataNode
* reports the table's contents to the NameNode upon startup
* and every so often afterwards.
*
* DataNodes spend their lives in an endless loop of asking
* the NameNode for something to do. A NameNode cannot connect
* to a DataNode directly; a NameNode simply returns values from
* functions invoked by a DataNode.
*
* DataNodes maintain an open server socket so that client code
* or other DataNodes can read/write data. The host/port for
* this server is reported to the NameNode, which then sends that
* information to clients or other DataNodes that might be interested.
*
**********************************************************/
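The key point of this Javadoc is that the DN always initiates the conversation and the NN only answers. The minimal Java sketch below illustrates that pull model; every name in it (PullModelSketch, NameNodeStub, Command, pollOnce) is hypothetical and not part of Hadoop. The "NameNode" only queues commands and hands them back as the return value of a call that the "DataNode" makes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the pull-based DN -> NN interaction (not Hadoop code).
class PullModelSketch {

    // Commands the NameNode may hand back; the names are illustrative only.
    enum Command { DELETE_BLOCK, REPLICATE_BLOCK }

    // NameNode side: it never calls the DN, it only queues work and returns it when asked.
    static class NameNodeStub {
        private final Deque<Command> pending = new ArrayDeque<>();

        void schedule(Command c) {
            pending.add(c);
        }

        // Invoked BY the DataNode; the return value is the only channel
        // through which the NameNode can deliver instructions.
        List<Command> heartbeat(String datanodeId) {
            List<Command> out = new ArrayList<>(pending);
            pending.clear();
            return out;
        }
    }

    // DataNode side: one iteration of the "ask the NameNode for something to do" loop.
    static List<Command> pollOnce(NameNodeStub nn, String datanodeId) {
        List<Command> executed = new ArrayList<>();
        for (Command c : nn.heartbeat(datanodeId)) {
            executed.add(c); // a real DN would act on the command here
        }
        return executed;
    }

    public static void main(String[] args) {
        NameNodeStub nn = new NameNodeStub();
        nn.schedule(Command.DELETE_BLOCK);
        System.out.println(pollOnce(nn, "dn-1")); // [DELETE_BLOCK]
        System.out.println(pollOnce(nn, "dn-1")); // []
    }
}
```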
Find the main method.
Enter the secureMain method.
Enter the createDataNode method.
Enter the instantiateDataNode method (initializes the DN).
Enter the makeInstance method (instantiates the object).
Enter the DataNode class (its constructor).
Enter the startDataNode method, which then performs a series of operations to start the DN.
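The startup chain above can be traced with a toy Java model. The method names mirror the ones in this walkthrough, but the bodies are hypothetical placeholders that only record the call order; they are not Hadoop's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the DataNode startup call chain; names mirror the
// walkthrough, bodies are placeholders rather than Hadoop's logic.
class StartupChainSketch {

    // Stand-in for the DataNode class: its constructor kicks off startup.
    static class DataNodeModel {
        DataNodeModel(List<String> trace) {
            trace.add("DataNode(<init>)");
            startDataNode(trace);
        }

        private void startDataNode(List<String> trace) {
            trace.add("startDataNode");
            // In the real startDataNode, steps 1-4 of this walkthrough run here:
            // initDataXceiver, startInfoServer, initIpcServer, refreshNamenodes.
        }
    }

    static List<String> simulateMain() {
        List<String> trace = new ArrayList<>();
        trace.add("main");
        secureMain(trace);
        return trace;
    }

    static void secureMain(List<String> trace) {
        trace.add("secureMain");
        createDataNode(trace);
    }

    static DataNodeModel createDataNode(List<String> trace) {
        trace.add("createDataNode");
        return instantiateDataNode(trace);
    }

    static DataNodeModel instantiateDataNode(List<String> trace) {
        trace.add("instantiateDataNode");
        return makeInstance(trace);
    }

    static DataNodeModel makeInstance(List<String> trace) {
        trace.add("makeInstance");
        return new DataNodeModel(trace);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" -> ", simulateMain()));
    }
}
```

Running main prints the chain in call order, ending with startDataNode.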
1. Initialize DataXceiverServer (initDataXceiver)
In the startDataNode method, enter the initDataXceiver method.
dataXceiverServer is a service (running as a thread) that the DN uses to handle data transfer requests, such as block reads and writes, from clients and other DNs.
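To picture what dataXceiverServer does, here is a minimal sketch, assuming a simple line-based request format that has nothing to do with Hadoop's real data transfer protocol. One acceptor thread waits on a server socket and hands each connection to its own worker thread, which is roughly the shape of DataXceiverServer starting a DataXceiver per connection; all class and method names here are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical, stripped-down model of a DataXceiverServer-style service:
// one acceptor thread, plus one worker thread per accepted connection.
class XceiverServerSketch implements Runnable {
    private final ServerSocket serverSocket;
    private volatile boolean running = true;

    XceiverServerSketch() throws IOException {
        // Port 0: let the OS choose a free port; loopback only for the demo.
        serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    @Override
    public void run() {
        while (running) {
            try {
                Socket peer = serverSocket.accept();
                // Hand the connection to its own worker thread.
                new Thread(() -> handle(peer), "xceiver-worker").start();
            } catch (IOException e) {
                running = false; // accept() fails once shutdown() closes the socket
            }
        }
    }

    // Worker: read one request line and acknowledge it.
    private void handle(Socket peer) {
        try (Socket s = peer;
             BufferedReader in = reader(s);
             PrintWriter out = writer(s)) {
            out.println("ACK " + in.readLine());
        } catch (IOException ignored) {
            // A real server would log the failure here.
        }
    }

    void shutdown() throws IOException {
        running = false;
        serverSocket.close();
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Starts the acceptor thread, sends one request as a client, returns the reply.
    static String roundTrip(String request) throws IOException {
        XceiverServerSketch server = new XceiverServerSketch();
        Thread acceptor = new Thread(server, "dataXceiverServer");
        acceptor.setDaemon(true);
        acceptor.start();
        try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.port());
             BufferedReader in = reader(client);
             PrintWriter out = writer(client)) {
            out.println(request);
            return in.readLine();
        } finally {
            server.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("writeBlock blk_1001")); // ACK writeBlock blk_1001
    }
}
```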
2. Initialize HTTP service (startInfoServer)
In the startDataNode method, enter the startInfoServer method, which instantiates an HTTP server (DatanodeHttpServer).
Enter the DatanodeHttpServer class (DatanodeHttpServer.java).
Here, too, the HTTP server is built through a builder (HttpServer2.Builder).
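The builder pattern lets the caller set only the options it cares about and then produce a fully configured object in a single build() call. The Java sketch below shows the pattern on a hypothetical InfoServerConfig class; its method names, the example port and the servlet path are illustrative assumptions, not the API of the builder DatanodeHttpServer actually uses, and it does not start a real server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical configuration object built with the builder pattern.
class InfoServerConfig {
    private final String name;
    private final String host;
    private final int port;
    private final List<String> servlets;

    private InfoServerConfig(Builder b) {
        this.name = b.name;
        this.host = b.host;
        this.port = b.port;
        this.servlets = Collections.unmodifiableList(new ArrayList<>(b.servlets));
    }

    String name() { return name; }
    String endpoint() { return host + ":" + port; }
    List<String> servlets() { return servlets; }

    static class Builder {
        private String name = "unnamed";
        private String host = "0.0.0.0";
        private int port = 0;
        private final List<String> servlets = new ArrayList<>();

        // Each setter returns the builder so calls can be chained.
        Builder setName(String name) { this.name = name; return this; }
        Builder setEndpoint(String host, int port) { this.host = host; this.port = port; return this; }
        Builder addServlet(String path) { servlets.add(path); return this; }

        // Validate once, then create the immutable result.
        InfoServerConfig build() {
            if (port < 0 || port > 65535) {
                throw new IllegalArgumentException("bad port: " + port);
            }
            return new InfoServerConfig(this);
        }
    }

    public static void main(String[] args) {
        InfoServerConfig cfg = new InfoServerConfig.Builder()
                .setName("datanode")
                .setEndpoint("localhost", 9864)
                .addServlet("/jmx")
                .build();
        System.out.println(cfg.name() + " @ " + cfg.endpoint() + " " + cfg.servlets());
    }
}
```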
3. Initialize the RPC server (initIpcServer)
In the startDataNode method, enter the initIpcServer method, which builds the RPC server.
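At its core, an RPC server takes an incoming (protocol, method, arguments) request and invokes it on a registered implementation. The hypothetical Java sketch below shows only that dispatch step, using plain reflection; Hadoop's real RPC layer adds serialization, networking and handler threads, and none of the names here (RpcDispatchSketch, BlockInfoProtocol, getBlockLength) come from it.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side dispatch: protocol name -> implementation object.
class RpcDispatchSketch {

    // Example protocol the server exposes (illustrative only).
    interface BlockInfoProtocol {
        long getBlockLength(String blockId);
    }

    private final Map<String, Class<?>> protocols = new HashMap<>();
    private final Map<String, Object> impls = new HashMap<>();

    // Register an implementation under its protocol interface's name.
    <T> void register(Class<T> protocol, T impl) {
        protocols.put(protocol.getName(), protocol);
        impls.put(protocol.getName(), impl);
    }

    // Find the requested method on the protocol and invoke it on the implementation.
    Object call(String protocolName, String methodName, Object... args) throws Exception {
        Class<?> protocol = protocols.get(protocolName);
        if (protocol == null) {
            throw new IllegalArgumentException("Unknown protocol: " + protocolName);
        }
        for (Method m : protocol.getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == args.length) {
                return m.invoke(impls.get(protocolName), args);
            }
        }
        throw new NoSuchMethodException(protocolName + "." + methodName);
    }

    public static void main(String[] args) throws Exception {
        RpcDispatchSketch server = new RpcDispatchSketch();
        server.register(BlockInfoProtocol.class, blockId -> blockId.length() * 1024L);
        Object len = server.call(BlockInfoProtocol.class.getName(), "getBlockLength", "blk_1");
        System.out.println(len); // 5120
    }
}
```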
4. Register with NameNode (refreshNamenodes)
In the startDataNode method, enter refreshNamenodes (BlockPoolManager.java).
Enter the doRefreshNamenodes method.
It first creates the BPOfferService objects (BPOS), then starts all of them.
- Next, enter the createBPOS method.
You can see that the services are created according to the number of NameNodes: the BPOfferService constructor creates one BPServiceActor for each NN address, so there are as many actors as there are NNs.
- Enter the startAll method.
It traverses all BPOS and starts them one after another. Here offerServices is the collection that holds every BPOS created by the createBPOS method.
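The create-then-start flow can be modeled in a few lines of Java. In this hypothetical sketch, OfferService stands in for BPOfferService (one per nameservice) and Actor for BPServiceActor (one per NameNode address); starting an actor only sets a flag instead of launching a thread, and the addresses are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "create BPOS, then start all" flow.
class BlockPoolSketch {

    // One actor per NameNode address (models BPServiceActor).
    static class Actor {
        final String nnAddr;
        boolean started;

        Actor(String nnAddr) { this.nnAddr = nnAddr; }

        void start() { started = true; } // the real actor launches a thread here
    }

    // One offer service per nameservice (models BPOfferService).
    static class OfferService {
        final String nameserviceId;
        final List<Actor> actors = new ArrayList<>();

        OfferService(String nameserviceId, List<String> nnAddrs) {
            this.nameserviceId = nameserviceId;
            for (String addr : nnAddrs) {
                actors.add(new Actor(addr)); // as many actors as NameNodes
            }
        }

        void start() {
            for (Actor a : actors) a.start();
        }
    }

    final List<OfferService> offerServices = new ArrayList<>();

    // createBPOS step for each nameservice, followed by startAll.
    void refresh(Map<String, List<String>> nnAddrsByNameservice) {
        for (Map.Entry<String, List<String>> e : nnAddrsByNameservice.entrySet()) {
            offerServices.add(new OfferService(e.getKey(), e.getValue()));
        }
        startAll();
    }

    // startAll step: traverse offerServices and start each one in turn.
    void startAll() {
        for (OfferService bpos : offerServices) bpos.start();
    }

    public static void main(String[] args) {
        Map<String, List<String>> conf = new LinkedHashMap<>();
        conf.put("ns1", Arrays.asList("nn1:8020", "nn2:8020")); // e.g. an HA pair
        BlockPoolSketch mgr = new BlockPoolSketch();
        mgr.refresh(conf);
        for (OfferService s : mgr.offerServices) {
            for (Actor a : s.actors) {
                System.out.println(s.nameserviceId + " -> " + a.nnAddr + " started=" + a.started);
            }
        }
    }
}
```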
Enter the bpos.start() method.
Continue into BPServiceActor.start().
There you can see bpThread being started, which means a new thread is launched, so look for the class's run method.
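The thread structure can be reproduced in isolation. In this hypothetical Java sketch, start() launches a daemon thread whose run() first performs the handshake and then repeatedly calls offerService(); a fixed round count replaces the real shouldRun() check so that the demo terminates, and the event strings are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a BPServiceActor-like thread: handshake first, then a service loop.
class ActorThreadSketch implements Runnable {
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());
    private final int maxRounds; // stands in for shouldRun() in this demo
    private final CountDownLatch done = new CountDownLatch(1);
    private Thread bpThread;

    ActorThreadSketch(int maxRounds) {
        this.maxRounds = maxRounds;
    }

    // Like the walkthrough's bpThread: a daemon thread running this object's run().
    void start() {
        bpThread = new Thread(this, "heartbeating-sketch");
        bpThread.setDaemon(true);
        bpThread.start();
    }

    @Override
    public void run() {
        try {
            connectToNNAndHandshake();
            for (int round = 0; round < maxRounds; round++) {
                offerService(round);
            }
        } finally {
            done.countDown();
        }
    }

    private void connectToNNAndHandshake() { events.add("handshake"); }

    private void offerService(int round) { events.add("offerService#" + round); }

    // Waits for the thread to finish and returns a snapshot of what it did.
    List<String> awaitEvents(long timeoutSeconds) throws InterruptedException {
        if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
            throw new IllegalStateException("actor did not finish in time");
        }
        synchronized (events) {
            return new ArrayList<>(events);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ActorThreadSketch actor = new ActorThreadSketch(3);
        actor.start();
        System.out.println(actor.awaitEvents(5)); // [handshake, offerService#0, offerService#1, offerService#2]
    }
}
```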
In run, you can see that the DN registers with the NN through the connectToNNAndHandshake method.
It obtains the NN's RPC client object through the connectToNN method; enter that method.
Enter DatanodeProtocolClientSideTranslatorPB.
It uses the createNamenode method to create an RPC proxy for the NN.
The DN then registers with the NN through the register method.
It sends the registration information to the NN by calling bpNamenode.registerDatanode (bpNamenode is the NN RPC proxy created above).
Note: the registerDatanode method here is called by the DN but executed in the NN.
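This is exactly what an RPC proxy is for: the DN calls what looks like a local object, and the work happens on the NN. The sketch below imitates the idea with the JDK's java.lang.reflect.Proxy and an in-process handler instead of a network call; the protocol interface, its one-method signature and the log strings are hypothetical and do not match Hadoop's DatanodeProtocol.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side proxy: calls on the DN side are forwarded to the "NN".
class ProxySketch {

    // Illustrative protocol; not Hadoop's real signature.
    interface NamenodeProtocol {
        String registerDatanode(String datanodeUuid);
    }

    // "Server-side" implementation: this is where the work actually runs.
    static class NameNodeImpl implements NamenodeProtocol {
        final List<String> registered = new ArrayList<>();

        @Override
        public String registerDatanode(String datanodeUuid) {
            registered.add(datanodeUuid);
            return "registered:" + datanodeUuid;
        }
    }

    // A real RPC proxy would serialize the call and send it over the network;
    // this handler only logs a pretend network hop and forwards in-process.
    static NamenodeProtocol createNamenodeProxy(NamenodeProtocol server, List<String> wireLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            wireLog.add("send " + method.getName());
            return method.invoke(server, args); // executed on the "NN"
        };
        return (NamenodeProtocol) Proxy.newProxyInstance(
                NamenodeProtocol.class.getClassLoader(),
                new Class<?>[] { NamenodeProtocol.class },
                handler);
    }

    public static void main(String[] args) {
        NameNodeImpl nn = new NameNodeImpl();
        List<String> wire = new ArrayList<>();
        NamenodeProtocol bpNamenode = createNamenodeProxy(nn, wire);
        System.out.println(bpNamenode.registerDatanode("dn-uuid-1")); // registered:dn-uuid-1
        System.out.println(wire + " " + nn.registered); // [send registerDatanode] [dn-uuid-1]
    }
}
```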
Go to FSNamesystem and find its registerDatanode method.
Press Ctrl+Alt+H to view the callers of this method.
It is called from NameNodeRpcServer, so I went there to look.
You can see that registering the DN's information with the NN is completed in this method.
Next, go back to FSNamesystem and enter the blockManager.registerDatanode method.
Enter datanodeManager.registerDatanode (DatanodeManager.java).
Enter addDatanode.
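On the NN side, the essential bookkeeping in addDatanode is keeping a map from DataNode UUID to descriptor consistent with an index by host. The Java sketch below is a simplified, hypothetical model of that idea; it assumes one DataNode per host and leaves out other steps the real method performs, such as updating the network topology.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of addDatanode-style bookkeeping: a UUID -> descriptor map
// plus a host -> descriptor index kept consistent with it.
class DatanodeRegistrySketch {

    static final class Descriptor {
        final String uuid;
        final String host;

        Descriptor(String uuid, String host) {
            this.uuid = uuid;
            this.host = host;
        }
    }

    private final Map<String, Descriptor> datanodeMap = new HashMap<>();
    private final Map<String, Descriptor> host2Datanode = new HashMap<>(); // one DN per host assumed

    synchronized void addDatanode(Descriptor node) {
        Descriptor previous = datanodeMap.put(node.uuid, node);
        // If this UUID was registered before, drop the host entry of the old descriptor.
        if (previous != null) {
            host2Datanode.remove(previous.host, previous);
        }
        host2Datanode.put(node.host, node);
    }

    synchronized Descriptor byUuid(String uuid) { return datanodeMap.get(uuid); }

    synchronized Descriptor byHost(String host) { return host2Datanode.get(host); }

    synchronized int size() { return datanodeMap.size(); }

    public static void main(String[] args) {
        DatanodeRegistrySketch registry = new DatanodeRegistrySketch();
        registry.addDatanode(new Descriptor("uuid-1", "192.168.10.102"));
        registry.addDatanode(new Descriptor("uuid-2", "192.168.10.103"));
        System.out.println(registry.size()); // 2
    }
}
```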
5. Send heartbeat to NameNode
Go back to the startAll method and follow it forward again to the run method in BPServiceActor.java, where you can find the offerService call.
Enter the offerService method.
Enter the sendHeartBeat method.
Here bpNamenode is the NN proxy obtained in the connectToNNAndHandshake method, so this method actually sends the heartbeat information to the NN through the NN's RPC client.
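Inside offerService, a heartbeat is only sent when one is due, based on the configured interval (dfs.heartbeat.interval, 3 seconds by default). The sketch below models that timing check with a fake clock so it runs instantly; the class and method names are hypothetical, and the real BPServiceActor loop handles more cases (for example, block reports).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical heartbeat timing check for an offerService-style loop.
class HeartbeatSchedulerSketch {
    private final long intervalMs;
    private long nextHeartbeatTime;

    HeartbeatSchedulerSketch(long intervalMs, long nowMs) {
        this.intervalMs = intervalMs;
        this.nextHeartbeatTime = nowMs; // the first heartbeat is due immediately
    }

    boolean isHeartbeatDue(long nowMs) {
        return nowMs - nextHeartbeatTime >= 0;
    }

    // Called right after a heartbeat has been sent at nowMs.
    void scheduleNextHeartbeat(long nowMs) {
        nextHeartbeatTime = nowMs + intervalMs;
    }

    // Steps a fake clock and returns the times at which heartbeats would be sent.
    static List<Long> simulate(long intervalMs, long stepMs, long durationMs) {
        HeartbeatSchedulerSketch scheduler = new HeartbeatSchedulerSketch(intervalMs, 0);
        List<Long> sentAt = new ArrayList<>();
        for (long now = 0; now <= durationMs; now += stepMs) {
            if (scheduler.isHeartbeatDue(now)) {
                sentAt.add(now); // real code would call the NN proxy's sendHeartbeat here
                scheduler.scheduleNextHeartbeat(now);
            }
        }
        return sentAt;
    }

    public static void main(String[] args) {
        // 3000 ms corresponds to the default dfs.heartbeat.interval of 3 seconds.
        System.out.println(simulate(3000, 1000, 10_000)); // [0, 3000, 6000, 9000]
    }
}
```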
The real work therefore happens in the NN, so search for sendHeartbeat in NameNodeRpcServer.java.
It calls handleHeartbeat to process the DN's heartbeat information; enter this method.
It processes the heartbeat sent by the DN and responds accordingly.
Continue into the handleHeartbeat method (DatanodeManager.java).
It updates the heartbeat information through the updateHeartbeat method; enter this method (HeartbeatManager.java).
Enter again (BlockManager.java).
Continue into DatanodeDescriptor.java.
Enter the updateHeartbeatState method.
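On the NN side, updateHeartbeatState essentially copies the reported statistics into the DN's descriptor and records when the report arrived; that timestamp is what later lets the NN decide whether a DN is dead. The Java sketch below is a hypothetical simplification with illustrative field names. The expiry formula 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval, which gives 10 minutes 30 seconds with the defaults of 5 minutes and 3 seconds, is included only to show why the timestamp matters.

```java
// Hypothetical, simplified stand-in for the state a DatanodeDescriptor keeps.
class DescriptorStateSketch {
    long capacity;
    long dfsUsed;
    long remaining;
    int xceiverCount;
    long lastUpdateMonotonic;

    // Copy the reported statistics and remember when the report arrived.
    void updateHeartbeatState(long capacity, long dfsUsed, long remaining,
                              int xceiverCount, long nowMonotonicMs) {
        this.capacity = capacity;
        this.dfsUsed = dfsUsed;
        this.remaining = remaining;
        this.xceiverCount = xceiverCount;
        this.lastUpdateMonotonic = nowMonotonicMs;
    }

    // A node counts as dead once no heartbeat has arrived for longer than expiryMs.
    boolean isExpired(long nowMonotonicMs, long expiryMs) {
        return nowMonotonicMs - lastUpdateMonotonic > expiryMs;
    }

    // 2 * recheck interval (default 300000 ms) + 10 * heartbeat interval (default 3000 ms).
    static long defaultExpiryMs() {
        long recheckIntervalMs = 300_000L;
        long heartbeatIntervalMs = 3_000L;
        return 2 * recheckIntervalMs + 10 * heartbeatIntervalMs; // 630000 ms = 10 min 30 s
    }

    public static void main(String[] args) {
        DescriptorStateSketch dn = new DescriptorStateSketch();
        dn.updateHeartbeatState(1_000_000L, 250_000L, 750_000L, 4, 10_000L);
        long expiry = defaultExpiryMs();
        System.out.println("remaining=" + dn.remaining + ", expired 1 s later? " + dn.isExpired(11_000L, expiry));
    }
}
```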