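A simplified class structure of FileChannel:
FileChannel's inner transaction class, FileBackedTransaction:
The file-operation class: LogFile (LogFileV2 was deprecated in 1.7):
Several other important classes:
FlumeEventQueue, LogFile, Log, LogUtils.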
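I. Initialization: public void configure(Context context)
1. useDualCheckpoints (whether to keep a backup of the checkpoint)
2. compressBackupCheckpoint (whether to compress the backup checkpoint)
3. checkpointDir (checkpoint directory; defaults to a location under ${user.home})
4. dataDirs (data directories)
5. capacity (the configured capacity of the channel)
6. keepAlive (timeout in seconds: the longest a put waits for a free slot when the channel is full)
7. transactionCapacity (maximum number of events in one transaction)
Note: capacity must be greater than or equal to transactionCapacity, otherwise the following check fails. From the source: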
- Preconditions.checkState(transactionCapacity <= capacity,
- "File Channel transaction capacity cannot be greater than the " +
- "capacity of the channel.");
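9. maxFileSize (maximum size of a single data file; the default is about 1.5 GB)
10. minimumRequiredSpace (minimum free disk space required; the default is 500 MB)
11. useLogReplayV1 (use the old replay logic)
12. useFastReplay (replay without using the queue)
13. keyProvider (key provider type; supported type: JCEKSFILE)
14. activeKey (name of the key used to encrypt new data)
15. cipherProvider (cipher provider type; supported type: AESCTRNOPADDING)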
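To make the capacity rule concrete, here is a minimal sketch of reading three of the settings above and applying the same check as the Preconditions.checkState call quoted earlier. The class name FileChannelSettingsSketch and the fallback values are assumptions made for illustration; they are not Flume's own class or necessarily its defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for three of the settings listed above; not a Flume class.
class FileChannelSettingsSketch {
    final int capacity;
    final int transactionCapacity;
    final int keepAliveSeconds;

    FileChannelSettingsSketch(Map<String, String> props) {
        // The fallback values are placeholders chosen for this sketch.
        capacity = Integer.parseInt(props.getOrDefault("capacity", "1000"));
        transactionCapacity = Integer.parseInt(props.getOrDefault("transactionCapacity", "100"));
        keepAliveSeconds = Integer.parseInt(props.getOrDefault("keepAlive", "3"));
        // Same rule as the quoted check: transactionCapacity <= capacity.
        if (transactionCapacity > capacity) {
            throw new IllegalStateException(
                "File Channel transaction capacity cannot be greater than the capacity of the channel.");
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("capacity", "500");
        props.put("transactionCapacity", "500"); // equal values pass the check
        FileChannelSettingsSketch ok = new FileChannelSettingsSketch(props);
        System.out.println("accepted: " + ok.capacity + "/" + ok.transactionCapacity);

        props.put("transactionCapacity", "501"); // one more than capacity is rejected
        try {
            new FileChannelSettingsSketch(props);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Equal values are accepted, which is why the rule is "greater than or equal to" rather than strictly greater.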
II. The start() method:
- @Override
- public synchronized void start() {
- LOG.info("Starting {}...", this);
- try {
- Builder builder = new Log.Builder();
- builder.setCheckpointInterval(checkpointInterval);
- builder.setMaxFileSize(maxFileSize);
- builder.setMinimumRequiredSpace(minimumRequiredSpace);
- builder.setQueueSize(capacity);
- builder.setCheckpointDir(checkpointDir);
- builder.setLogDirs(dataDirs);
- builder.setChannelName(getName());
- builder.setUseLogReplayV1(useLogReplayV1);
- builder.setUseFastReplay(useFastReplay);
- builder.setEncryptionKeyProvider(encryptionKeyProvider);
- builder.setEncryptionKeyAlias(encryptionActiveKey);
- builder.setEncryptionCipherProvider(encryptionCipherProvider);
- builder.setUseDualCheckpoints(useDualCheckpoints);
- builder.setCompressBackupCheckpoint(compressBackupCheckpoint);
- builder.setBackupCheckpointDir(backupCheckpointDir);
- builder.setFsyncPerTransaction(fsyncPerTransaction);
- builder.setFsyncInterval(fsyncInterval);
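- builder.setCheckpointOnClose(checkpointOnClose); // the calls above copy the parameters read in configure() into the Builder
- log = builder.build();
- // builder.build() creates the Log object through the Builder
- // and tries to acquire file locks on checkpointDir and the data directories; Log's private void lock(File dir) throws IOException method is what attempts to acquire each lock
- log.replay();
- // 1. First acquire the checkpoint write lock
- // 2. Find the largest fileID
- // 3. Read the log files and apply each record according to its type to recover state; all data directories are traversed
- // 4. Flush the queue to its backing files
- open = true; // mark the channel as open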
- int depth = getDepth();
- Preconditions.checkState(queueRemaining.tryAcquire(depth),
- "Unable to acquire " + depth + " permits " + channelNameDescriptor);
- LOG.info("Queue Size after replay: " + depth + " "
- + channelNameDescriptor);
- } catch (Throwable t) {
- open = false;
- startupError = t;
- LOG.error("Failed to start the file channel " + channelNameDescriptor, t);
- if (t instanceof Error) {
- throw (Error) t;
- }
- }
- if (open) {
- // start the channel counter
- channelCounter.start();
- channelCounter.setChannelSize(getDepth());
- channelCounter.setChannelCapacity(capacity);
- }
- super.start();
- }
Inside the Log class, a BackgroundWorker is also scheduled to run every checkpointInterval milliseconds and write a checkpoint:
- workerExecutor.scheduleWithFixedDelay(new BackgroundWorker(this),
- this.checkpointInterval, this.checkpointInterval,
- TimeUnit.MILLISECONDS);
- static class BackgroundWorker implements Runnable {
- private static final Logger LOG = LoggerFactory
- .getLogger(BackgroundWorker.class);
- private final Log log;
- public BackgroundWorker(Log log) {
- this.log = log;
- }
- @Override
- public void run() {
- try {
- if (log.open) {
- log.writeCheckpoint();
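- // Flushes checkpoint, inflightTakes and inflightPuts to disk: inflightPuts, inflightTakes and checkpoint.meta are rebuilt in that order,
- // then the checkpoint file is updated and flushed to disk (all of these files live under checkpointDir); the log-ID.meta files are updated; this call is also responsible for deleting log files and their corresponding meta files.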
- }
- } catch (IOException e) {
- LOG.error("Error doing checkpoint", e);
- } catch (Throwable e) {
- LOG.error("General error in checkpoint worker", e);
- }
- }
- }
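The scheduling pattern above can be tried in isolation. The following is a minimal sketch, assuming a single-threaded ScheduledExecutorService and a counter that stands in for log.writeCheckpoint(); the class and method names are hypothetical and only the JDK is used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicCheckpointSketch {
    // Counts completed runs of the stand-in checkpoint task.
    static final AtomicInteger checkpointsWritten = new AtomicInteger();

    // Schedules the task and waits until it has run `runs` times (or 5 seconds pass).
    static boolean runCheckpoints(long intervalMillis, int runs) {
        ScheduledExecutorService worker = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(runs);
        Runnable task = () -> {
            try {
                checkpointsWritten.incrementAndGet(); // stands in for log.writeCheckpoint()
            } catch (Throwable t) {
                // Catch everything: an exception escaping the task would suppress all later runs.
                System.err.println("General error in checkpoint worker: " + t);
            } finally {
                done.countDown();
            }
        };
        // Initial delay and delay between runs are both the checkpoint interval.
        worker.scheduleWithFixedDelay(task, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean finished = runCheckpoints(50, 3);
        System.out.println("finished=" + finished + ", runs=" + checkpointsWritten.get());
    }
}
```

Catching Throwable inside run() matters here: with scheduleWithFixedDelay, a task that throws is not rescheduled, so a single failed checkpoint would otherwise stop all future ones.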
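III. Transactions
Many of the methods mirror those of the Memory channel's transaction class, for example doTake(), doCommit(), doRollback() and doPut().
Each of these methods is described in detail below.
1. doPut(): invoked when a source puts an event into the channel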
- @Override
- protected void doPut(Event event) throws InterruptedException {
- channelCounter.incrementEventPutAttemptCount();
- if(putList.remainingCapacity() == 0) { // is there any room left in putList?
- throw new ChannelException("Put queue for FileBackedTransaction " +
- "of capacity " + putList.size() + " full, consider " +
- "committing more frequently, increasing capacity or " +
- "increasing thread count. " + channelNameDescriptor);
- }
- // this does not need to be in the critical section as it does not
- // modify the structure of the log or queue.
- if(!queueRemaining.tryAcquire(keepAlive, TimeUnit.SECONDS)) { // wait up to keepAlive seconds for a free slot
- throw new ChannelFullException("The channel has reached it's capacity. "
- + "This might be the result of a sink on the channel having too "
- + "low of batch size, a downstream system running slower than "
- + "normal, or that the channel capacity is just too low. "
- + channelNameDescriptor);
- }
- boolean success = false;
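- log.lockShared(); // acquire the checkpoint's shared (read) lock; doTake() takes the same shared lock, so puts and takes can run concurrently, and both are blocked only while the exclusive (write) lock is held
- try {
- // transactionID is generated by incrementing a counter in the TransactionIDOracle class
- FlumeEventPointer ptr = log.put(transactionID, event); // writes the Event to a data file using RandomAccessFile; the pending put is tracked in the inflightputs file
- Preconditions.checkState(putList.offer(ptr), "putList offer failed "
- + channelNameDescriptor);
- queue.addWithoutCommit(ptr, transactionID); // records the pointer and transaction ID in the queue as an uncommitted put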
- success = true;
- } catch (IOException e) {
- throw new ChannelException("Put failed due to IO error "
- + channelNameDescriptor, e);
- } finally {
- log.unlockShared(); // release the shared (read) lock
- if(!success) {
- // release slot obtained in the case
- // the put fails for any reason
- queueRemaining.release(); // return the permit to the semaphore
- }
- }
- }
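The way queueRemaining bounds the channel can be shown with a plain java.util.concurrent.Semaphore. This is a minimal sketch under the assumption that one permit equals one free slot; CapacityPermitsSketch and its method names are made up for illustration, and the timeout is in milliseconds only to keep the demo fast.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class CapacityPermitsSketch {
    private final Semaphore queueRemaining;

    CapacityPermitsSketch(int capacity, int eventsAlreadyQueued) {
        queueRemaining = new Semaphore(capacity);
        // As in start(): events recovered by replay already occupy slots.
        if (!queueRemaining.tryAcquire(eventsAlreadyQueued)) {
            throw new IllegalStateException("Unable to acquire " + eventsAlreadyQueued + " permits");
        }
    }

    // Reserve one slot, waiting at most timeoutMillis; false means the channel stayed full.
    boolean reserveSlot(long timeoutMillis) {
        try {
            return queueRemaining.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Give slots back, for example after takes are committed or a put fails.
    void releaseSlots(int n) {
        queueRemaining.release(n);
    }

    int freeSlots() {
        return queueRemaining.availablePermits();
    }

    public static void main(String[] args) {
        CapacityPermitsSketch channel = new CapacityPermitsSketch(3, 1);
        System.out.println("free after replay: " + channel.freeSlots());
        System.out.println("put 1 reserved: " + channel.reserveSlot(20));
        System.out.println("put 2 reserved: " + channel.reserveSlot(20));
        System.out.println("put 3 reserved: " + channel.reserveSlot(20)); // channel is full
        channel.releaseSlots(1); // a committed take frees one slot
        System.out.println("put 3 retried: " + channel.reserveSlot(20));
    }
}
```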
2. doTake(): invoked when a sink takes an event from the channel
- protected Event doTake() throws InterruptedException {
- channelCounter.incrementEventTakeAttemptCount();
- if(takeList.remainingCapacity() == 0) {
- throw new ChannelException("Take list for FileBackedTransaction, capacity " +
- takeList.size() + " full, consider committing more frequently, " +
- "increasing capacity, or increasing thread count. "
- + channelNameDescriptor);
- }
- log.lockShared(); // acquire the shared lock
- /*
- * 1. Take an event which is in the queue.
- * 2. If getting that event does not throw NoopRecordException,
- * then return it.
- * 3. Else try to retrieve the next event from the queue
- * 4. Repeat 2 and 3 until queue is empty or an event is returned.
- */
- try {
- while (true) {
- FlumeEventPointer ptr = queue.removeHead(transactionID); // fetch the event pointer; a FlumeEventPointer consists of a fileID and an offset
- if (ptr == null) {
- return null;
- } else {
- try {
- // first add to takeList so that if write to disk
- // fails rollback actually does it's work
- Preconditions.checkState(takeList.offer(ptr),
- "takeList offer failed "
- + channelNameDescriptor);
- log.take(transactionID, ptr); // write take to disk
- Event event = log.get(ptr); // use the pointer to read the Event back from disk through the Log object; the pending take is tracked in the inflighttakes file
- return event;
- } catch (IOException e) {
- throw new ChannelException("Take failed due to IO error "
- + channelNameDescriptor, e);
- } catch (NoopRecordException e) {
- LOG.warn("Corrupt record replaced by File Channel Integrity " +
- "tool found. Will retrieve next event", e);
- takeList.remove(ptr);
- } catch (CorruptEventException ex) {
- if (fsyncPerTransaction) {
- throw new ChannelException(ex);
- }
- LOG.warn("Corrupt record found. Event will be " +
- "skipped, and next event will be read.", ex);
- takeList.remove(ptr);
- }
- }
- }
- } finally {
- log.unlockShared(); // release the shared lock
- }
- }
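The retry loop in doTake() can be reduced to a few lines. The sketch below assumes an in-memory map as the "disk" and a made-up UnreadableRecordException in place of NoopRecordException and CorruptEventException; it only illustrates the skip-and-continue control flow, not Flume's actual storage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SkipCorruptTakeSketch {
    // Hypothetical stand-in for a record that cannot be read back.
    static class UnreadableRecordException extends Exception {
        UnreadableRecordException(long ptr) {
            super("unreadable record at pointer " + ptr);
        }
    }

    private final Deque<Long> queue = new ArrayDeque<>();      // pointers waiting to be taken
    private final Map<Long, String> storage = new HashMap<>(); // pointer -> event body
    final Deque<Long> takeList = new ArrayDeque<>();           // pointers taken by this transaction

    // A null body simulates a record that is queued but unreadable.
    void add(long ptr, String body) {
        queue.addLast(ptr);
        if (body != null) {
            storage.put(ptr, body);
        }
    }

    private String read(long ptr) throws UnreadableRecordException {
        String body = storage.get(ptr);
        if (body == null) {
            throw new UnreadableRecordException(ptr);
        }
        return body;
    }

    // Returns the next readable event, or null once the queue is empty.
    String take() {
        while (true) {
            Long ptr = queue.pollFirst();
            if (ptr == null) {
                return null;
            }
            takeList.addLast(ptr); // register first so a rollback could restore it
            try {
                return read(ptr);
            } catch (UnreadableRecordException e) {
                takeList.removeLast(); // drop the bad pointer and try the next one
            }
        }
    }

    public static void main(String[] args) {
        SkipCorruptTakeSketch tx = new SkipCorruptTakeSketch();
        tx.add(1L, "a");
        tx.add(2L, null); // unreadable
        tx.add(3L, "c");
        System.out.println(tx.take());
        System.out.println(tx.take());
        System.out.println(tx.take());
    }
}
```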
3. doCommit(): invoked by both sources and sinks to commit the transaction
- @Override
- protected void doCommit() throws InterruptedException {
- int puts = putList.size();
- int takes = takeList.size();
- if(puts > 0) { // puts and takes cannot both be > 0; at least one of them must be zero
- Preconditions.checkState(takes == 0, "nonzero puts and takes "
- + channelNameDescriptor);
- log.lockShared(); // acquire the shared lock
- try {
- log.commitPut(transactionID); // the commit record is wrapped in a ByteBuffer and written to the file
- channelCounter.addToEventPutSuccessCount(puts);
- synchronized (queue) {
- while(!putList.isEmpty()) {
- if(!queue.addTail(putList.removeFirst())) {
- StringBuilder msg = new StringBuilder();
- msg.append("Queue add failed, this shouldn't be able to ");
- msg.append("happen. A portion of the transaction has been ");
- msg.append("added to the queue but the remaining portion ");
- msg.append("cannot be added. Those messages will be consumed ");
- msg.append("despite this transaction failing. Please report.");
- msg.append(channelNameDescriptor);
- LOG.error(msg.toString());
- Preconditions.checkState(false, msg.toString());
- }
- }
- queue.completeTransaction(transactionID); // removes this transaction's entries from the in-flight puts and takes (the inflightputs and inflighttakes files in the checkpoint directory)
- }
- } catch (IOException e) {
- throw new ChannelException("Commit failed due to IO error "
- + channelNameDescriptor, e);
- } finally {
- log.unlockShared(); // release the shared lock
- }
- } else if (takes > 0) {
- log.lockShared(); // acquire the shared lock
- try {
- log.commitTake(transactionID); // write the commit record to the data file
- queue.completeTransaction(transactionID); // same as in the put branch above
- channelCounter.addToEventTakeSuccessCount(takes);
- } catch (IOException e) {
- throw new ChannelException("Commit failed due to IO error "
- + channelNameDescriptor, e);
- } finally {
- log.unlockShared();
- }
- queueRemaining.release(takes);
- }
- putList.clear();
- takeList.clear(); // clear both lists
- channelCounter.setChannelSize(queue.getSize());
- }
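The lockShared()/unlockShared() pairs in doPut(), doTake() and doCommit() behave like the read side of a read-write lock. The sketch below assumes a java.util.concurrent.locks.ReentrantReadWriteLock behind those calls, which is an assumption about Log's internals; the class and method names are hypothetical. It shows that several holders can share the lock while an exclusive (checkpoint-style) acquisition has to wait for all of them.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CheckpointLockSketch {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    void lockShared() {
        checkpointLock.readLock().lock();
    }

    void unlockShared() {
        checkpointLock.readLock().unlock();
    }

    // Non-blocking attempt at the exclusive lock; false while any shared holder exists.
    boolean tryLockExclusive() {
        return checkpointLock.writeLock().tryLock();
    }

    void unlockExclusive() {
        checkpointLock.writeLock().unlock();
    }

    // Returns true if a second thread can take the shared lock right now.
    boolean anotherThreadCanShare() {
        boolean[] acquired = new boolean[1];
        Thread other = new Thread(() -> {
            acquired[0] = checkpointLock.readLock().tryLock();
            if (acquired[0]) {
                checkpointLock.readLock().unlock();
            }
        });
        other.start();
        try {
            other.join(); // join makes the write to acquired[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        CheckpointLockSketch lock = new CheckpointLockSketch();
        lock.lockShared(); // e.g. a put in progress
        System.out.println("second shared holder allowed: " + lock.anotherThreadCanShare());
        System.out.println("exclusive while shared held: " + lock.tryLockExclusive());
        lock.unlockShared();
        System.out.println("exclusive after release: " + lock.tryLockExclusive());
        System.out.println("shared while exclusive held: " + lock.anotherThreadCanShare());
        lock.unlockExclusive();
    }
}
```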
4. doRollback(): invoked by both sources and sinks to roll back the transaction
- @Override
- protected void doRollback() throws InterruptedException {
- int puts = putList.size();
- int takes = takeList.size();
- log.lockShared();
- try {
- if(takes > 0) {
- Preconditions.checkState(puts == 0, "nonzero puts and takes "
- + channelNameDescriptor);
- synchronized (queue) {
- while (!takeList.isEmpty()) {
- Preconditions.checkState(queue.addHead(takeList.removeLast()),
- "Queue add failed, this shouldn't be able to happen "
- + channelNameDescriptor);
- }
- }
- }
- putList.clear();
- takeList.clear();
- queue.completeTransaction(transactionID);
- channelCounter.setChannelSize(queue.getSize());
- log.rollback(transactionID); // the rollback record is likewise wrapped in a ByteBuffer and written to the file
- } catch (IOException e) {
- throw new ChannelException("Commit failed due to IO error "
- + channelNameDescriptor, e);
- } finally {
- log.unlockShared();
- // since rollback is being called, puts will never make it on
- // to the queue and we need to be sure to release the resources
- queueRemaining.release(puts);
- }
- }
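The interplay of putList, takeList and the queue across commit and rollback can be modeled with three in-memory deques. This is a minimal sketch that keeps only the ordering logic (append puts to the tail on commit, push takes back onto the head in reverse on rollback); it omits the log writes, locks and semaphore, and StagedQueueSketch is a made-up name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StagedQueueSketch {
    private final Deque<String> queue = new ArrayDeque<>();    // committed events
    private final Deque<String> putList = new ArrayDeque<>();  // puts staged by the open transaction
    private final Deque<String> takeList = new ArrayDeque<>(); // takes staged by the open transaction

    void put(String event) {
        putList.addLast(event);
    }

    String take() {
        String event = queue.pollFirst();
        if (event != null) {
            takeList.addLast(event);
        }
        return event;
    }

    void commit() {
        if (!putList.isEmpty() && !takeList.isEmpty()) {
            throw new IllegalStateException("nonzero puts and takes");
        }
        while (!putList.isEmpty()) {
            queue.addLast(putList.removeFirst()); // puts become visible in insertion order
        }
        takeList.clear(); // committed takes are gone for good
    }

    void rollback() {
        if (!putList.isEmpty() && !takeList.isEmpty()) {
            throw new IllegalStateException("nonzero puts and takes");
        }
        while (!takeList.isEmpty()) {
            queue.addFirst(takeList.removeLast()); // last taken goes back first, restoring order
        }
        putList.clear(); // staged puts never reach the queue
    }

    List<String> snapshot() {
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        StagedQueueSketch channel = new StagedQueueSketch();
        channel.put("a");
        channel.put("b");
        channel.put("c");
        channel.commit();
        channel.take();
        channel.take();
        channel.rollback();
        System.out.println(channel.snapshot());
    }
}
```

Removing from the tail of takeList and adding to the head of the queue is what puts the events back in their original order after a rollback.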
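Flume's FileChannel preserves data integrity and consistency across a system crash by relying on the JDK byte channel (java.nio.channels.FileChannel): so that no data is lost after a crash, the byte channel forces file modifications out to the underlying storage device.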
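The JDK call behind this guarantee is FileChannel.force(boolean). The following is a minimal sketch of an append that is forced to the device before returning; the class name, helper names and the temp-file round trip are illustrative assumptions, not Flume's actual write path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ForceToDiskSketch {
    // Appends one record and forces it to the storage device before returning.
    static void appendDurably(Path file, byte[] record) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buffer = ByteBuffer.wrap(record);
            while (buffer.hasRemaining()) {
                channel.write(buffer); // a single write may be partial, so loop
            }
            channel.force(false); // false: content must reach the device, metadata need not
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temporary file with appendDurably and reads it back.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("force-sketch", ".log");
            try {
                appendDurably(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("event-1"));
    }
}
```

Forcing on every write is the durable but slow choice; the fsyncPerTransaction and fsyncInterval settings passed to the Builder in start() suggest the channel lets you trade some of that durability for throughput.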
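Finally, the on-disk file layout of the Flume FileChannel:
checkpoint directory:
checkpoint: records where each Event lives, that is, in which data file (logFileID) and at what position (offset).
inflighttakes: holds the pending take data of open transactions; the file is rebuilt periodically.
Contents:
1. The first 16 bytes are a checksum;
2. transactionID1+eventsCount1+eventPointer11+eventPointer12+...;
3. transactionID2+eventsCount2+eventPointer21+eventPointer22+...
inflightputs: holds the pending put data of open transactions; the file is rebuilt periodically.
Contents:
1. The first 16 bytes are a checksum;
2. transactionID1+eventsCount1+eventPointer11+eventPointer12+...;
3. transactionID2+eventsCount2+eventPointer21+eventPointer22+...
checkpoint.meta: mainly stores each logfileID and the number of events it holds, among other information.
data directory:
log-ID.meta: mainly records the next write position in log-ID, the logWriteOrderID and related information.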
log-ID: the data file; old data files are cleaned up, so usually only about two remain in the directory.
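The inflighttakes/inflightputs layout described above (a 16-byte checksum followed by transactionID, eventsCount and the event pointers) can be sketched with a ByteBuffer. Everything beyond that ordering is an assumption made for illustration: MD5 as the 16-byte checksum, a long transaction ID, an int count, and each pointer packed into a long as fileID in the high 32 bits and offset in the low 32 bits.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class InflightLayoutSketch {
    // Assumed packing: fileID in the high 32 bits, offset in the low 32 bits.
    static long pointer(int fileID, int offset) {
        return ((long) fileID << 32) | (offset & 0xFFFFFFFFL);
    }

    private static byte[] md5(byte[] data, int from, int length) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5"); // 16-byte digest
            digest.update(data, from, length);
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Layout: [16-byte checksum][txID, count, pointers...][txID, count, pointers...]...
    static byte[] encode(Map<Long, long[]> inflight) {
        int size = 16;
        for (long[] pointers : inflight.values()) {
            size += 8 + 4 + 8 * pointers.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.position(16); // leave room for the checksum
        for (Map.Entry<Long, long[]> entry : inflight.entrySet()) {
            buffer.putLong(entry.getKey());
            buffer.putInt(entry.getValue().length);
            for (long p : entry.getValue()) {
                buffer.putLong(p);
            }
        }
        byte[] out = buffer.array();
        System.arraycopy(md5(out, 16, size - 16), 0, out, 0, 16);
        return out;
    }

    static Map<Long, long[]> decode(byte[] bytes) {
        byte[] expected = md5(bytes, 16, bytes.length - 16);
        if (!Arrays.equals(expected, Arrays.copyOfRange(bytes, 0, 16))) {
            throw new IllegalStateException("checksum mismatch");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes, 16, bytes.length - 16);
        Map<Long, long[]> inflight = new LinkedHashMap<>();
        while (buffer.hasRemaining()) {
            long transactionID = buffer.getLong();
            long[] pointers = new long[buffer.getInt()];
            for (int i = 0; i < pointers.length; i++) {
                pointers[i] = buffer.getLong();
            }
            inflight.put(transactionID, pointers);
        }
        return inflight;
    }

    public static void main(String[] args) {
        Map<Long, long[]> inflight = new LinkedHashMap<>();
        inflight.put(7L, new long[] { pointer(1, 100), pointer(1, 200) });
        inflight.put(8L, new long[] { pointer(2, 0) });
        byte[] bytes = encode(inflight);
        System.out.println("bytes=" + bytes.length + ", transactions=" + decode(bytes).size());
    }
}
```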
The FileChannel implementation is fairly complex, so this is all for now; the finer details can be studied later when needed.
http://blog.csdn.net/qianshangding0708/article/details/48133033