Hadoop source code reading (2): DataNode startup

Notes:
1. Hadoop version: 3.1.3
2. Reading tool: IDEA 2023.1.2
3. Source code: downloaded from "Index of /dist/hadoop/core/hadoop-3.1.3" (apache.org)
4. Project import: after downloading the source code you get the compressed package hadoop-3.1.3-src.tar.gz. Open PowerShell in that directory, decompress it with the tar -zxvf command, and then open the hadoop-3.1.3-src folder in IDEA. Be sure to configure the Maven or Gradle repository first, otherwise importing the jar packages will be slow.
5. Reference course: Shang Silicon Valley's big data Hadoop tutorial (Hadoop 3.x, from cluster setup to tuning), on Bilibili

Press Ctrl + N, search globally for DataNode, and open DataNode.java.

The class's official Javadoc describes it as follows:

/**********************************************************
 * DataNode is a class (and program) that stores a set of
 * blocks for a DFS deployment.  A single deployment can
 * have one or many DataNodes.  Each DataNode communicates
 * regularly with a single NameNode.  It also communicates
 * with client code and other DataNodes from time to time.
 *
 * DataNodes store a series of named blocks.  The DataNode
 * allows client code to read these blocks, or to write new
 * block data.  The DataNode may also, in response to instructions
 * from its NameNode, delete blocks or copy blocks to/from other
 * DataNodes.
 *
 * The DataNode maintains just one critical table:
 *   block-> stream of bytes (of BLOCK_SIZE or less)
 *
 * This info is stored on a local disk.  The DataNode
 * reports the table's contents to the NameNode upon startup
 * and every so often afterwards.
 *
 * DataNodes spend their lives in an endless loop of asking
 * the NameNode for something to do.  A NameNode cannot connect
 * to a DataNode directly; a NameNode simply returns values from
 * functions invoked by a DataNode.
 *
 * DataNodes maintain an open server socket so that client code 
 * or other DataNodes can read/write data.  The host/port for
 * this server is reported to the NameNode, which then sends that
 * information to clients or other DataNodes that might be interested.
 *
 **********************************************************/

Find the main method:

Enter the secureMain method:

Enter the createDataNode method:

  • Enter the instantiateDataNode method (initializes the DN):

Enter the makeInstance method (instantiates the object):

Enter the DataNode constructor:

Enter the startDataNode method, which carries out the series of operations that start the DN:
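To keep the call chain above straight, here is a minimal Java sketch that only records the order in which the methods are entered. It is a simplified trace, not Hadoop's code: the real methods take arguments such as the command-line args, a Configuration and SecureResources, and do real work at each step. The class StartupTraceSketch, its trace list, and the launch method (standing in for main) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the DataNode startup call chain. Each method only
// records its own name and calls the next step; the real Hadoop methods take
// arguments (args, Configuration, SecureResources, ...) and do real work.
class StartupTraceSketch {
    static final List<String> trace = new ArrayList<>();

    // Stands in for DataNode.main(String[] args).
    static void launch(String[] args) {
        trace.add("main");
        secureMain(args);
    }

    static void secureMain(String[] args) {
        trace.add("secureMain");
        createDataNode(args);
    }

    static StartupTraceSketch createDataNode(String[] args) {
        trace.add("createDataNode");
        return instantiateDataNode(args);
    }

    static StartupTraceSketch instantiateDataNode(String[] args) {
        trace.add("instantiateDataNode"); // initialize the DN
        return makeInstance();
    }

    static StartupTraceSketch makeInstance() {
        trace.add("makeInstance"); // instantiate the object
        return new StartupTraceSketch();
    }

    // Stands in for the DataNode constructor.
    private StartupTraceSketch() {
        trace.add("DataNode()");
        startDataNode();
    }

    private void startDataNode() {
        trace.add("startDataNode");
        // The real method continues with initDataXceiver, startInfoServer,
        // initIpcServer, refreshNamenodes, ...
    }
}
```

Calling StartupTraceSketch.launch(new String[0]) leaves the trace as main, secureMain, createDataNode, instantiateDataNode, makeInstance, DataNode(), startDataNode, which is the same order you step through in IDEA.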

1. Initialize DataXceiverServer (initDataXceiver)

In the startDataNode method:

Enter the initDataXceiver method:

dataXceiverServer is a service, running in its own thread, that the DN uses to serve data transfer requests (block reads and writes) from clients and other DNs.
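The idea behind dataXceiverServer can be sketched with plain java.net sockets: one thread blocks in accept() and hands every incoming connection to a new worker thread (in Hadoop, each worker is a DataXceiver serving one request). The sketch below is hypothetical: XceiverServerSketch, its one-line "request" format and the received queue are illustrative stand-ins, not Hadoop's data transfer protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified analogue of DataXceiverServer:
// an accept loop that spawns one worker thread per connection.
class XceiverServerSketch implements Runnable {
    private final ServerSocket serverSocket;
    // Requests read by worker threads (stands in for real block I/O).
    final BlockingQueue<String> received = new LinkedBlockingQueue<>();

    XceiverServerSketch() {
        try {
            serverSocket = new ServerSocket(0); // bind to any free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            try {
                Socket peer = serverSocket.accept();
                // One worker per connection, like one DataXceiver per request.
                Thread worker = new Thread(() -> handle(peer), "xceiver-worker");
                worker.setDaemon(true);
                worker.start();
            } catch (IOException e) {
                // accept() throws once the socket is closed (or on I/O errors);
                // this sketch simply stops the loop.
                break;
            }
        }
    }

    private void handle(Socket peer) {
        try (Socket s = peer;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            String line = in.readLine();
            if (line != null) {
                received.add(line);
            }
        } catch (IOException e) {
            // A real server would log the error; the socket is closed either way.
        }
    }

    void close() {
        try {
            serverSocket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Handing each connection to its own thread keeps the accept loop free, so one slow client cannot block others from connecting.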

2. Initialize HTTP service (startInfoServer)

In the startDataNode method:

Enter the startInfoServer method, which instantiates an HTTP server:

Enter the DatanodeHttpServer class (DatanodeHttpServer.java):

Here, too, the HTTP server is built through a builder.
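The builder pattern used here (in Hadoop 3.x, DatanodeHttpServer configures an HttpServer2.Builder) can be sketched on its own. The hypothetical InfoServerSketch below shows only the pattern: chained setters collect the settings, and build() validates them and returns an immutable object. Its fields and defaults are illustrative, not Hadoop's configuration keys or the HttpServer2 API.

```java
// Hypothetical builder illustrating the pattern used to construct the
// DataNode's HTTP (info) server; this is not Hadoop's HttpServer2 API.
final class InfoServerSketch {
    private final String name;
    private final String host;
    private final int port;

    private InfoServerSketch(Builder b) {
        this.name = b.name;
        this.host = b.host;
        this.port = b.port;
    }

    @Override
    public String toString() {
        return name + " at http://" + host + ":" + port;
    }

    static final class Builder {
        private String name;
        private String host = "0.0.0.0";
        private int port = 0; // 0 = let the OS pick a free port

        Builder setName(String name) { this.name = name; return this; }
        Builder setHost(String host) { this.host = host; return this; }
        Builder setPort(int port) { this.port = port; return this; }

        // Validate once, then hand back an immutable server description.
        InfoServerSketch build() {
            if (name == null || name.isEmpty()) {
                throw new IllegalStateException("name is required");
            }
            if (port < 0 || port > 65535) {
                throw new IllegalStateException("invalid port: " + port);
            }
            return new InfoServerSketch(this);
        }
    }
}
```

For example, new InfoServerSketch.Builder().setName("datanode").setHost("localhost").setPort(9864).build() describes "datanode at http://localhost:9864", while calling build() without a name fails fast. A builder suits servers like this because they have many optional settings and only a few required ones.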

3. Initialize the RPC server (initIpcServer)

In the startDataNode method:

Enter the initIpcServer method, which builds the RPC server:

4. Register with NameNode (refreshNamenodes)

In the startDataNode method:

Enter refreshNamenodes (BlockPoolManager.java):

Enter the doRefreshNamenodes method:

It first creates the BPOS (BPOfferService) objects and then starts all of them:

  • Next, enter the createBPOS method:

You can see that the services are created according to the number of NameNodes:

As many service actors (BPServiceActor) are created as there are NNs.
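The fan-out can be modeled in a few lines. In the 3.1.3 source, doRefreshNamenodes creates one BPOfferService per nameservice, and each BPOfferService creates one BPServiceActor per NameNode address (for example, two in an HA pair). The classes below are hypothetical, heavily simplified stand-ins that keep only that structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for BPServiceActor: one per NameNode address.
class ActorSketch {
    final String nnAddress;
    ActorSketch(String nnAddress) { this.nnAddress = nnAddress; }
}

// Hypothetical stand-in for BPOfferService: one per nameservice.
class OfferServiceSketch {
    final String nameserviceId;
    final List<ActorSketch> actors = new ArrayList<>();

    OfferServiceSketch(String nameserviceId, List<String> nnAddrs) {
        this.nameserviceId = nameserviceId;
        // As many actors as there are NameNodes in this nameservice.
        for (String addr : nnAddrs) {
            actors.add(new ActorSketch(addr));
        }
    }
}

// Hypothetical stand-in for the createBPOS part of BlockPoolManager.
class PoolManagerSketch {
    final List<OfferServiceSketch> offerServices = new ArrayList<>();

    // addrMap: nameservice ID -> NameNode RPC addresses
    void refresh(Map<String, List<String>> addrMap) {
        for (Map.Entry<String, List<String>> e : addrMap.entrySet()) {
            offerServices.add(new OfferServiceSketch(e.getKey(), e.getValue()));
        }
    }
}
```

With one HA nameservice ns1 = {nn1, nn2}, refresh produces one offer service holding two actors, which is why a DN in an HA cluster talks to both the active and the standby NameNode.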

  • Enter the startAll method:

It iterates over all the bpos objects and starts them one by one.

Here offerServices is the collection that holds every bpos created by the createBPOS method:

Enter the bpos.start() method:

Continue stepping in, to BPServiceActor.start():

bpThread.start() starts a new thread, so look for the class's run method:
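Put together, start() launches one thread per actor, and run() first retries the handshake until it succeeds, then loops on offerService(). The sketch below mirrors only that shape, using a hypothetical Handshake interface in place of connectToNNAndHandshake and a fixed number of service rounds; the real BPServiceActor sleeps between failed attempts, tracks a running state, and loops until shutdown rather than for N rounds.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for connectToNNAndHandshake(); may fail while the NN is down.
interface Handshake {
    void connectAndRegister() throws IOException;
}

// Simplified shape of BPServiceActor.start()/run().
class ServiceActorLoopSketch implements Runnable {
    private final Handshake handshake;
    private final int maxRounds; // the real loop runs until shutdown
    final AtomicInteger handshakeAttempts = new AtomicInteger();
    final AtomicInteger serviceRounds = new AtomicInteger();
    private Thread bpThread;

    ServiceActorLoopSketch(Handshake handshake, int maxRounds) {
        this.handshake = handshake;
        this.maxRounds = maxRounds;
    }

    void start() {
        bpThread = new Thread(this, "actor-sketch");
        bpThread.setDaemon(true);
        bpThread.start();
    }

    void join() {
        try {
            bpThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void run() {
        // 1. Keep trying to connect and register until it succeeds.
        while (true) {
            handshakeAttempts.incrementAndGet();
            try {
                handshake.connectAndRegister();
                break;
            } catch (IOException e) {
                // The real actor logs the failure and sleeps before retrying.
            }
        }
        // 2. Then offer service (heartbeats, reports, commands) in a loop.
        while (serviceRounds.get() < maxRounds) {
            offerService();
        }
    }

    private void offerService() {
        serviceRounds.incrementAndGet();
    }
}
```

Retrying the handshake inside run() means a DN started before its NameNode does not crash; it keeps retrying until the NN becomes reachable.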

You can see that it registers with the NN through the connectToNNAndHandshake method:

  • It obtains the NN's RPC client object through the connectToNN method; enter that method:

Enter DatanodeProtocolClientSideTranslatorPB:

It uses the createNamenode method to create an RPC proxy for the NN:

  • It registers with the NN through the register method:

bpNamenode.registerDatanode sends the registration information to the NN (bpNamenode is the NN RPC proxy created above).

Note: the registerDatanode method here is called by the DN but executed in the NN.
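Why a method called by the DN ends up running in the NN is easiest to see with a dynamic proxy: bpNamenode implements the protocol interface, but its methods only package the call and hand it to the RPC layer, which invokes the real implementation on the NameNode side. The sketch below uses java.lang.reflect.Proxy and skips serialization and the network entirely. DatanodeProtocolSketch, NameNodeSideImpl and the in-process "transport" are hypothetical stand-ins for Hadoop's protobuf-based RPC.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical protocol: the real DatanodeProtocol.registerDatanode takes a
// DatanodeRegistration and travels over protobuf RPC.
interface DatanodeProtocolSketch {
    String registerDatanode(String datanodeId);
}

// "Server side": the implementation that actually runs in the NameNode.
class NameNodeSideImpl implements DatanodeProtocolSketch {
    final List<String> registered = new ArrayList<>();

    @Override
    public String registerDatanode(String datanodeId) {
        registered.add(datanodeId);
        return "registered:" + datanodeId;
    }
}

final class RpcProxySketch {
    private RpcProxySketch() {}

    // "Client side": what the DN holds as bpNamenode. Every call is forwarded
    // to the server object; a real RPC client would serialize the method name
    // and arguments here and send them over the network.
    static DatanodeProtocolSketch createProxy(DatanodeProtocolSketch server,
                                              List<String> callLog) {
        InvocationHandler handler = (proxy, method, args) -> {
            callLog.add(method.getName());
            try {
                return method.invoke(server, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the server-side exception
            }
        };
        return (DatanodeProtocolSketch) Proxy.newProxyInstance(
                DatanodeProtocolSketch.class.getClassLoader(),
                new Class<?>[] {DatanodeProtocolSketch.class},
                handler);
    }
}
```

The DN-side code only ever sees the interface, which is why IDEA's "go to implementation" on bpNamenode.registerDatanode does not lead to the NN logic; you have to look for the implementing class on the server side (NameNodeRpcServer).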

Go to FSNamesystem and find registerDatanode:

Press Ctrl + Alt + H to view the call hierarchy of this method:

The call hierarchy leads to NameNodeRpcServer, so look there:

You can see that the DN's registration with the NN is carried out through this method.

Next, go back to FSNamesystem and enter the blockManager.registerDatanode method:

Enter datanodeManager.registerDatanode (DatanodeManager.java):

Enter addDatanode:
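At its core, addDatanode is bookkeeping: DatanodeManager records the node in datanodeMap (keyed by the DataNode's UUID) and in a host-to-node map (host2DatanodeMap), removing any earlier entry for the same UUID from the host map so the two stay consistent. The hypothetical DatanodeRegistrySketch below keeps only that bookkeeping; it leaves out the network topology and the other state the real method updates.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified descriptor: the real DatanodeDescriptor also
// carries storages, capacity figures, block lists, and more.
final class NodeSketch {
    final String uuid;
    final String host;
    NodeSketch(String uuid, String host) { this.uuid = uuid; this.host = host; }
}

// Simplified analogue of the datanodeMap / host2DatanodeMap bookkeeping.
class DatanodeRegistrySketch {
    private final Map<String, NodeSketch> datanodeMap = new HashMap<>();
    private final Map<String, Set<NodeSketch>> hostMap = new HashMap<>();

    synchronized void addDatanode(NodeSketch node) {
        // put() returns the previous node with this UUID, if any;
        // drop it from the host map so both maps stay consistent.
        NodeSketch old = datanodeMap.put(node.uuid, node);
        if (old != null) {
            Set<NodeSketch> onOldHost = hostMap.get(old.host);
            if (onOldHost != null) {
                onOldHost.remove(old);
                if (onOldHost.isEmpty()) {
                    hostMap.remove(old.host);
                }
            }
        }
        hostMap.computeIfAbsent(node.host, h -> new HashSet<>()).add(node);
    }

    synchronized int size() {
        return datanodeMap.size();
    }

    synchronized int nodesOnHost(String host) {
        Set<NodeSketch> nodes = hostMap.get(host);
        return nodes == null ? 0 : nodes.size();
    }
}
```

Keying by UUID rather than by address means a DN that re-registers from a new address replaces its old entry instead of showing up twice.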

5. Send heartbeat to NameNode

Go back to the startAll method and follow it forward again to the run method in BPServiceActor.java.

Find the call to offerService there:

Enter the offerService method:

Enter the sendHeartBeat method:

Here bpNamenode is the NN proxy obtained in the connectToNNAndHandshake method.

So this method actually sends the heartbeat to the NN through the NN's RPC client.
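Inside offerService(), a heartbeat is only sent when one is due, governed by dfs.heartbeat.interval (3 seconds by default). A minimal sketch of that timing logic, assuming a caller-supplied clock in milliseconds. HeartbeatSchedulerSketch and its tick method are hypothetical, although BPServiceActor keeps similar timing state in its own scheduler.

```java
// Hypothetical sketch of heartbeat timing in the DN's service loop.
// Times are milliseconds from a caller-supplied monotonic clock.
class HeartbeatSchedulerSketch {
    private final long intervalMs;
    private long nextHeartbeatTime;

    HeartbeatSchedulerSketch(long intervalMs, long now) {
        this.intervalMs = intervalMs;
        this.nextHeartbeatTime = now; // the first heartbeat is due immediately
    }

    boolean isHeartbeatDue(long now) {
        return now >= nextHeartbeatTime;
    }

    void scheduleNextHeartbeat(long now) {
        nextHeartbeatTime = now + intervalMs;
    }

    // One pass of a simplified offerService loop: send a heartbeat if due.
    // Returns true if a heartbeat was sent.
    boolean tick(long now, Runnable sendHeartbeat) {
        if (!isHeartbeatDue(now)) {
            return false;
        }
        scheduleNextHeartbeat(now);
        sendHeartbeat.run(); // in Hadoop: bpNamenode.sendHeartbeat(...) over RPC
        return true;
    }
}
```

Driving it with a 3000 ms interval and ticks every 500 ms from 0 to 10000 sends heartbeats at 0, 3000, 6000 and 9000 ms. Passing the clock in keeps the logic deterministic and easy to test.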

The actual implementation therefore runs in the NN, so search for sendHeartbeat in NameNodeRpcServer.java:

The DN's heartbeat is processed by handleHeartbeat (FSNamesystem.java); enter this method:

It processes the heartbeat sent by the DN and prepares the corresponding response.

Continue into the handleHeartbeat method (DatanodeManager.java):

Heartbeat information is updated through the updateHeartbeat method; enter it (HeartbeatManager.java):

Step in again (BlockManager.java):

Continue stepping in (DatanodeDescriptor.java):

Enter the updateHeartbeatState method:
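What updateHeartbeatState ultimately does is record the latest figures reported by the DN together with the time the heartbeat arrived; that timestamp is what lets the NN later decide whether a DN has stopped responding. The hypothetical DescriptorSketch below keeps just that state. The real DatanodeDescriptor aggregates per-storage reports, cache figures, volume failures and more, and isExpired is an illustrative helper, not a Hadoop method.

```java
// Hypothetical, simplified analogue of the state refreshed by
// DatanodeDescriptor.updateHeartbeatState.
class DescriptorSketch {
    private long capacity;
    private long dfsUsed;
    private int xceiverCount;
    private long lastUpdateMonotonic; // ms; when the last heartbeat arrived

    synchronized void updateHeartbeatState(long capacity, long dfsUsed,
                                           int xceiverCount, long nowMonotonic) {
        this.capacity = capacity;
        this.dfsUsed = dfsUsed;
        this.xceiverCount = xceiverCount;
        this.lastUpdateMonotonic = nowMonotonic;
    }

    synchronized long remaining() {
        return capacity - dfsUsed;
    }

    synchronized int xceiverCount() {
        return xceiverCount;
    }

    // Hypothetical helper: expired if no heartbeat arrived within expireMs.
    synchronized boolean isExpired(long nowMonotonic, long expireMs) {
        return nowMonotonic - lastUpdateMonotonic > expireMs;
    }
}
```

For reference, with default settings the NN declares a DN dead after 2 × dfs.namenode.heartbeat.recheck-interval (5 minutes) + 10 × dfs.heartbeat.interval (3 seconds) = 630,000 ms, about 10.5 minutes, without a heartbeat; a check like isExpired(now, 630_000L) would mirror that default.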

 

 


Source: blog.csdn.net/qq_51235856/article/details/132921987