1. Building an entry class for breakpoint debugging
We will step through the HDFS read path with a debugger. To make this easier, write a small test class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class FsShellTest {
  public static void main(String argv[]) throws Exception {
    FsShell shell = new FsShell();
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://hadoop1:9000");
    conf.setQuietMode(false);
    shell.setConf(conf);
    String[] args = {"-text", "/user/hello.txt"}; // an ordinary file
    int res;
    try {
      res = ToolRunner.run(shell, args);
    } finally {
      shell.close();
    }
    System.exit(res);
  }
}
The test class invokes ToolRunner.run directly; from there the flow is identical to running the FsShell class from the command line.
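ToolRunner.run itself is thin. Roughly speaking (a simplified sketch, not the exact Hadoop source), it feeds the generic Hadoop options into the Configuration and hands the remaining arguments to the Tool:

// Simplified sketch of ToolRunner.run(Tool, String[]): parse generic options
// (-D, -fs, -conf, ...) into the Configuration, then call tool.run() with the
// arguments that are left -- here that is {"-text", "/user/hello.txt"}.
public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
  GenericOptionsParser parser = new GenericOptionsParser(conf, args);
  tool.setConf(conf);
  String[] toolArgs = parser.getRemainingArgs();
  return tool.run(toolArgs); // for us this is FsShell.run(toolArgs)
}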
Let's first look at the call stack of hadoop fs -text:
"main@1" prio=5 tid=0x1 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:353)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:837)
at org.apache.hadoop.fs.shell.Display$Cat.getInputStream(Display.java:114)
at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:131)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:291)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:273)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:257)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:203)
at org.apache.hadoop.fs.shell.Command.run(Command.java:167)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at cn.whbing.hadoop.FsShellTest.main(FsShellTest.java:25)
The processing flow is:
ToolRunner.run --> FsShell.run --> Command.processPaths --> Display$Cat.processPath
At this point argument parsing is finished and the command has been resolved to a read operation. Everything that follows, starting with open, can be regarded as the beginning of the read itself, analyzed below.
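Display$Cat.processPath is where the shell command turns into ordinary FileSystem calls. A rough sketch of what it does (simplified from the actual code):

// Rough sketch of Display$Cat.processPath: open the file (this is where the
// FileSystem.open in the stack above is reached) and copy its bytes to stdout.
@Override
protected void processPath(PathData item) throws IOException {
  InputStream in = getInputStream(item); // -> FileSystem.open -> DistributedFileSystem.open
  try {
    IOUtils.copyBytes(in, out, getConf(), false); // 'out' is the command's stdout stream
  } finally {
    in.close();
  }
}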
Note: for debugging the read path we can also call open on the client directly, for example:
/**
 * Test 2: read by calling FileSystem.open directly.
 */
@Test
public void testRead() {
  try {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://hadoop1:9000");
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/hello.txt");
    FSDataInputStream in = fs.open(path);
    BufferedReader buff = new BufferedReader(new InputStreamReader(in));
    String str = null;
    while ((str = buff.readLine()) != null) {
      System.out.println(str);
    }
    buff.close();
    in.close();
  } catch (Exception e) {
    e.printStackTrace();
  }
}
2. Analysis of the read operation
1. Opening the HDFS file
Setting a breakpoint on FSDataInputStream in = fs.open(path);, its call stack is:
"main@1" prio=5 tid=0x1 nid=NA runnable
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:353)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:837)
at cn.whbing.hadoop.FsShellTest.testRead(FsShellTest.java:57)
So DistributedFileSystem.open is what gets called.
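A condensed sketch of DistributedFileSystem.open (path-resolution details such as FileSystemLinkResolver omitted; treat the exact signatures as assumptions): it delegates to the DFSClient, which asks the NameNode for block locations and builds a DFSInputStream.

// Condensed sketch (assumed shape, not verbatim Hadoop source) of the open path.
public FSDataInputStream open(Path f, int bufferSize) throws IOException {
  String src = getPathName(f);                                      // e.g. "/user/hello.txt"
  DFSInputStream dfsis = dfs.open(src, bufferSize, verifyChecksum); // 'dfs' is the DFSClient
  return dfs.createWrappedInputStream(dfsis);                       // wrapped as an HdfsDataInputStream
}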
2. The DFSInputStream constructor and openInfo()
In DFSInputStream.java:
DFSInputStream(DFSClient dfsClient, String src, boolean verifyChecksum,
    LocatedBlocks locatedBlocks) throws IOException {
  this.dfsClient = dfsClient;
  this.verifyChecksum = verifyChecksum;
  this.src = src;
  synchronized (infoLock) {
    this.cachingStrategy = dfsClient.getDefaultReadCachingStrategy();
  }
  this.locatedBlocks = locatedBlocks;
  openInfo(false); // openInfo() now takes a boolean: openInfo(false) instead of openInfo()
}
In the constructor, openInfo has gained a boolean parameter. The openInfo method is as follows:
void openInfo(boolean refreshLocatedBlocks) throws IOException {
  final DFSClient.Conf conf = dfsClient.getConf();
  synchronized(infoLock) {
    lastBlockBeingWrittenLength =
        fetchLocatedBlocksAndGetLastBlockLength(refreshLocatedBlocks);
    int retriesForLastBlockLength = conf.retryTimesForGetLastBlockLength;
    while (retriesForLastBlockLength > 0) {
      // Getting last block length as -1 is a special case. When cluster
      // restarts, DNs may not report immediately. At this time partial block
      // locations will not be available with NN for getting the length. Lets
      // retry for 3 times to get the length.
      if (lastBlockBeingWrittenLength == -1) {
        DFSClient.LOG.warn("Last block locations not available. "
            + "Datanodes might not have reported blocks completely."
            + " Will retry for " + retriesForLastBlockLength + " times");
        waitFor(conf.retryIntervalForGetLastBlockLength);
        lastBlockBeingWrittenLength =
            fetchLocatedBlocksAndGetLastBlockLength(true);
      } else {
        break;
      }
      retriesForLastBlockLength--;
    }
    if (lastBlockBeingWrittenLength == -1
        && retriesForLastBlockLength == 0) {
      throw new IOException("Could not obtain the last block locations.");
    }
  }
}
The fetchLocatedBlocksAndGetLastBlockLength method has likewise only gained a refresh boolean parameter:
private long fetchLocatedBlocksAndGetLastBlockLength(boolean refresh)
    throws IOException {
  LocatedBlocks newInfo = locatedBlocks;
  if (locatedBlocks == null || refresh) {
    newInfo = dfsClient.getLocatedBlocks(src, 0);
  }
  ...
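dfsClient.getLocatedBlocks(src, 0) is the point where the client actually talks to the NameNode. A condensed sketch of that hop (assumed shape; the real code adds retries and failover handling):

// Condensed sketch (assumed shape, not verbatim Hadoop source): ask the NameNode,
// via the ClientProtocol RPC interface, for the file's block list starting at the
// given offset; each LocatedBlock in the reply carries its datanode locations.
public LocatedBlocks getLocatedBlocks(String src, long start) throws IOException {
  long prefetchSize = dfsClientConf.prefetchSize;              // how many bytes of locations to fetch ahead (assumed field)
  return namenode.getBlockLocations(src, start, prefetchSize); // ClientProtocol#getBlockLocations
}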
Back on the shell side, Display$Text.getInputStream wraps the stream returned by open and reads its first two bytes to decide how to decode the file:
protected InputStream getInputStream(PathData item) throws IOException {
  FSDataInputStream i = (FSDataInputStream)super.getInputStream(item);
  // Handle 0 and 1-byte files
  short leadBytes;
  try {
    leadBytes = i.readShort();
  }
  ...
That readShort ultimately goes through DFSInputStream.read(). The single-byte read simply delegates to the byte-array variant:
public synchronized int read() throws IOException {
  if (oneByteBuf == null) {
    oneByteBuf = new byte[1];
  }
  int ret = read(oneByteBuf, 0, 1);
  return (ret <= 0) ? -1 : (oneByteBuf[0] & 0xff);
}

public synchronized int read(final byte buf[], int off, int len) throws IOException {
  ReaderStrategy byteArrayReader = new ByteArrayStrategy(buf);
  TraceScope scope =
      dfsClient.getPathTraceScope("DFSInputStream#byteArrayRead", src);
  try {
    return readWithStrategy(byteArrayReader, off, len);
  } finally {
    scope.close();
  }
}
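The byte[]-versus-ByteBuffer distinction is hidden behind ReaderStrategy. A minimal sketch of the idea (hypothetical shape; the real interface inside DFSInputStream differs in details such as read statistics):

// Minimal sketch (hypothetical shape) of the strategy: readWithStrategy only needs
// "read up to len bytes via the current BlockReader"; where the bytes land
// (byte[] or ByteBuffer) is the strategy's business.
interface ReaderStrategy {
  int doRead(BlockReader blockReader, int off, int len) throws IOException;
}

class ByteArrayStrategy implements ReaderStrategy {
  private final byte[] buf;
  ByteArrayStrategy(byte[] buf) { this.buf = buf; }
  @Override
  public int doRead(BlockReader blockReader, int off, int len) throws IOException {
    return blockReader.read(buf, off, len); // matches the BlockReader interface shown later
  }
}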
The byte-array read in turn calls readWithStrategy:
private synchronized int readWithStrategy(ReaderStrategy strategy, int off, int len) throws IOException {
  dfsClient.checkOpen();
  if (closed.get()) {
    throw new IOException("Stream closed");
  }
  Map<ExtendedBlock,Set<DatanodeInfo>> corruptedBlockMap
      = new HashMap<ExtendedBlock, Set<DatanodeInfo>>();
  failures = 0;
  if (pos < getFileLength()) {
    int retries = 2;
    while (retries > 0) {
      try {
        // currentNode can be left as null if previous read had a checksum
        // error on the same block. See HDFS-3067
        if (pos > blockEnd || currentNode == null) {
          currentNode = blockSeekTo(pos);
        }
        int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
        synchronized(infoLock) {
          if (locatedBlocks.isLastBlockComplete()) {
            realLen = (int) Math.min(realLen,
                locatedBlocks.getFileLength() - pos);
          }
        }
        int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
        if (result >= 0) {
          pos += result;
        } else {
          // got a EOS from reader though we expect more data on it.
          throw new IOException("Unexpected EOS from the reader");
        }
        if (dfsClient.stats != null) {
          dfsClient.stats.incrementBytesRead(result);
        }
        return result;
      } catch (ChecksumException ce) {
        throw ce;
      } catch (IOException e) {
        if (retries == 1) {
          DFSClient.LOG.warn("DFS Read", e);
        }
        blockEnd = -1;
        if (currentNode != null) { addToDeadNodes(currentNode); }
        if (--retries == 0) {
          throw e;
        }
      } finally {
        // Check if need to report block replicas corruption either read
        // was successful or ChecksumException occured.
        reportCheckSumFailure(corruptedBlockMap,
            currentLocatedBlock.getLocations().length);
      }
    }
  }
  return -1;
}
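A small worked example (made-up numbers) of the realLen clamping above, which keeps a single read from crossing a block boundary:

// Worked example with hypothetical numbers: 128 MB blocks, reading near the end
// of block #2, caller asks for 8 MB.
long blockSize = 128L * 1024 * 1024;
long blockEnd  = 2 * blockSize - 1;                          // last byte offset of block #2
long pos       = 250L * 1024 * 1024;                         // 6 MB before the block boundary
int  len       = 8 * 1024 * 1024;
int  realLen   = (int) Math.min(len, (blockEnd - pos + 1L));
System.out.println(realLen);                                 // 6 MB: clamped at the block boundary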
Inside readWithStrategy, blockSeekTo (declared as private synchronized DatanodeInfo blockSeekTo(long target)) positions the stream at the block containing the target offset; to do that it picks a datanode through chooseDataNode:
private DNAddrPair chooseDataNode(LocatedBlock block,
    Collection<DatanodeInfo> ignoredNodes) throws IOException {
  while (true) {
    try {
      return getBestNodeDNAddrPair(block, ignoredNodes);
    } catch (IOException ie) {
      String errMsg = getBestNodeDNAddrPairErrorString(block.getLocations(),
          deadNodes, ignoredNodes);
      String blockInfo = block.getBlock() + " file=" + src;
      if (failures >= dfsClient.getMaxBlockAcquireFailures()) {
        String description = "Could not obtain block: " + blockInfo;
        DFSClient.LOG.warn(description + errMsg
            + ". Throwing a BlockMissingException");
        throw new BlockMissingException(src, description,
            block.getStartOffset());
      }
      ...
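The loop body relies on getBestNodeDNAddrPair, whose source is not reproduced in this post. A simplified sketch of its job (assumed shape, not the verbatim Hadoop code): walk the block's replica locations, skip nodes already in deadNodes or explicitly ignored, and return the first usable one together with its transfer address.

// Simplified sketch (assumed shape): pick the first replica that is neither dead
// nor ignored; throw so chooseDataNode's catch block can decide whether to retry.
private DNAddrPair getBestNodeDNAddrPair(LocatedBlock block,
    Collection<DatanodeInfo> ignoredNodes) throws IOException {
  for (DatanodeInfo dn : block.getLocations()) {
    boolean dead    = deadNodes.containsKey(dn);
    boolean ignored = ignoredNodes != null && ignoredNodes.contains(dn);
    if (!dead && !ignored) {
      InetSocketAddress addr = NetUtils.createSocketAddr(dn.getXferAddr());
      return new DNAddrPair(dn, addr); // hypothetical two-argument form
    }
  }
  throw new IOException("No live datanode contains block " + block.getBlock());
}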
Q: What if the while (true) loop in chooseDataNode never manages to break out?
A: It cannot spin forever. When getBestNodeDNAddrPair keeps throwing IOException, the catch block compares failures against dfsClient.getMaxBlockAcquireFailures() and eventually gives up with a BlockMissingException. The error below, from running hadoop fs -text against a problematic file, shows this failure path in action: the faulty datanode is added to deadNodes and the client moves on:
hadoop fs -text hdfs://hadoop1:9000/ec/t.log
19/08/14 17:24:08 WARN hdfs.DFSClient: Failed to connect to /10.179.17.22:9866 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-1712821023-10.179.25.59-1564737285129:blk_-9223372036854775648_1019 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1712821023-10.179.25.59-1564737285129:blk_-9223372036854775648_1019, for OP_READ_BLOCK, self=/10.179.72.122:47858, remote=/10.179.17.22:9866, for file /ec/t.log, for pool BP-1712821023-10.179.25.59-1564737285129 block -9223372036854775648_1019
java.io.IOException: Got error, status message opReadBlock BP-1712821023-10.179.25.59-1564737285129:blk_-9223372036854775648_1019 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1712821023-10.179.25.59-1564737285129:blk_-9223372036854775648_1019, for OP_READ_BLOCK, self=/10.179.72.122:47858, remote=/10.179.17.22:9866, for file /ec/t.log, for pool BP-1712821023-10.179.25.59-1564737285129 block -9223372036854775648_1019
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:142)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:679)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:940)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:741)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:136)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
As for the BlockReader interface:
public interface BlockReader extends ByteBufferReadable {
  int read(byte[] buf, int off, int len) throws IOException;
  long skip(long n) throws IOException;
  ...
}
As the stack trace above shows, BlockReaderFactory.getRemoteBlockReaderFromTcp calls getRemoteBlockReader, which chooses between the legacy reader and RemoteBlockReader2:
private BlockReader getRemoteBlockReader(Peer peer) throws IOException {
  if (conf.useLegacyBlockReader) {
    return RemoteBlockReader.newBlockReader(fileName,
        block, token, startOffset, length, conf.ioBufferSize,
        verifyChecksum, clientName, peer, datanode,
        clientContext.getPeerCache(), cachingStrategy);
  } else {
    return RemoteBlockReader2.newBlockReader(
        fileName, block, token, startOffset, length,
        verifyChecksum, clientName, peer, datanode,
        clientContext.getPeerCache(), cachingStrategy);
  }
}
RemoteBlockReader2.newBlockReader goes through Sender#readBlock, which sends the op code org.apache.hadoop.hdfs.protocol.datatransfer.Op.READ_BLOCK (value 81) to the peer served by DataXceiver on the datanode. The request parameters are all set on an OpReadBlockProto object and written out, i.e. sent to the DataXceiverServer that was started when the datanode initialized. The server-side socket thread, which has been blocking, then receives the request with op code 81 and proceeds with the actual handling.
The other data operations, such as copyBlock and writeBlock, are issued from Sender in the same way; Sender is declared as public class Sender implements DataTransferProtocol.
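A condensed sketch of the framing that Sender puts on the DataTransferProtocol stream (assumed shape; the OpReadBlockProto field setters are omitted):

// Condensed sketch (assumed shape, not verbatim Hadoop source): each Sender call
// writes a 2-byte protocol version, the 1-byte op code (READ_BLOCK = 81), and then
// the delimited protobuf request carrying block id, token, offset and length.
private static void send(DataOutputStream out, Op opcode,
    com.google.protobuf.Message proto) throws IOException {
  out.writeShort(DataTransferProtocol.DATA_TRANSFER_VERSION); // protocol version
  out.writeByte(opcode.code);                                  // 81 for READ_BLOCK
  proto.writeDelimitedTo(out);                                 // e.g. OpReadBlockProto for reads
  out.flush();
}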
Now let's look at the run method of DataXceiverServer.
public void run() {
  Peer peer = null;
  while (datanode.shouldRun && !datanode.shutdownForUpgrade) {
    try {
      // Blocks here in accept() until a request arrives; tracing the code shows
      // it wraps Java's ServerSocket.accept() internally.
      peer = peerServer.accept();

      // Make sure the xceiver count is not exceeded
      int curXceiverCount = datanode.getXceiverCount();
      if (curXceiverCount > maxXceiverCount.get()) {
        throw new IOException("Xceiver count " + curXceiverCount
            + " exceeds the limit of concurrent xcievers: "
            + maxXceiverCount.get());
      }

      // When a request comes in, create a daemon thread via DataXceiver.create
      // and add it to the datanode's thread group.
      new Daemon(datanode.threadGroup,
          DataXceiver.create(peer, datanode, this))
          .start();
    } catch (SocketTimeoutException ignored) {
      // wake up to see if should continue to run
    }
    ...
}
The newly created DataXceiver thread calls op = readOp() to find out which operation was requested (read, write, copy, and so on), and then processOp(op) handles the corresponding logic. Inside processOp, a switch dispatches each op code to its handler:
protected final void processOp(Op op) throws IOException {
  switch(op) {
    case READ_BLOCK:
      opReadBlock();
      break;
    case WRITE_BLOCK:
      opWriteBlock(in);
      break;
    case REPLACE_BLOCK:
      opReplaceBlock(in);
      break;
    case COPY_BLOCK:
      opCopyBlock(in);
      break;
    case BLOCK_CHECKSUM:
      opBlockChecksum(in);
      break;
    case TRANSFER_BLOCK:
      opTransferBlock(in);
      break;
    case REQUEST_SHORT_CIRCUIT_FDS:
      opRequestShortCircuitFds(in);
      break;
    case RELEASE_SHORT_CIRCUIT_FDS:
      opReleaseShortCircuitFds(in);
      break;
    case REQUEST_SHORT_CIRCUIT_SHM:
      opRequestShortCircuitShm(in);
      break;
    default:
      throw new IOException("Unknown op " + op + " in data stream");
  }
}
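For the READ_BLOCK branch, a rough sketch of the datanode side (assumed shape; the real DataXceiver code additionally deals with checksums, throttling and short-circuit reads): the protobuf written by Sender#readBlock is parsed back, a BlockSender is set up for the requested range, a success status is returned, and the block is streamed to the client in packets.

// Rough sketch (assumed shape, not verbatim Hadoop source) of handling READ_BLOCK.
private void opReadBlock() throws IOException {
  OpReadBlockProto proto = OpReadBlockProto.parseDelimitedFrom(in);              // 'in' is the socket input stream
  ExtendedBlock block = blockFromProto(proto);                                   // hypothetical helper
  BlockSender sender = newBlockSender(block, proto.getOffset(), proto.getLen()); // hypothetical factory standing in for BlockSender's long constructor
  writeSuccessResponse(out);                                                     // hypothetical helper: SUCCESS status back to the client
  sender.sendBlock(out, null, null);                                             // stream data + checksums in packets
}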
Reference: https://zhangjun5965.iteye.com/blog/2375278