Kafka Log Data Cleanup Explained

When using Kafka, we usually keep log data only for as long as we need it, for example 1 day, 3 days, or 7 days.

This can be achieved with the following parameters:

(1) Configure the log retention time

log.retention.hours

log.retention.minutes

log.retention.ms

Only one of the three settings above needs to be configured. Kafka's default configuration uses log.retention.hours, whose value corresponds to 7 days (168 hours):

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

The Kafka code that resolves the log retention time is shown below:

private def getLogRetentionTimeMillis(): Long = {
  val millisInMinute = 60L * 1000L
  val millisInHour = 60L * millisInMinute
  if(props.containsKey("log.retention.ms")){
     props.getIntInRange("log.retention.ms", (1, Int.MaxValue))
  }
  else if(props.containsKey("log.retention.minutes")){
     millisInMinute * props.getIntInRange("log.retention.minutes", (1, Int.MaxValue))
  }
  else {
     millisInHour * props.getIntInRange("log.retention.hours", 24*7, (1, Int.MaxValue))
  }
}
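
If more than one of these properties is set at the same time, the code above checks log.retention.ms first, then log.retention.minutes, then log.retention.hours, so the finest-grained setting wins. A minimal server.properties sketch with hypothetical values:

# hypothetical example: both properties are set, but log.retention.ms is
# checked first, so data is kept for 12 hours rather than 168 hours
log.retention.ms=43200000
log.retention.hours=168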

(2) Configure the log retention size

By default the log size is not limited (log.retention.bytes defaults to -1). Note that this limit applies to each partition's log individually, not to the topic as a whole.

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

Note:

When deleting logs, do both parameters have to be configured for deletion to take effect, or is configuring just one of them enough?

In fact, configuring just one of them is enough. The source code is as follows:

def cleanupLogs() {
  debug("Beginning log cleanup...")
  var total = 0
  val startMs = time.milliseconds
  for(log <- allLogs; if !log.config.compact) {
    debug("Garbage collecting '" + log.name + "'")
    total += cleanupExpiredSegments(log) + cleanupSegmentsToMaintainSize(log)
  }
  debug("Log cleanup completed. " + total + " files deleted in " +

                (time.milliseconds - startMs) / 1000 + " seconds")

}

private def cleanupExpiredSegments(log: Log): Int = {
  val startMs = time.milliseconds
  log.deleteOldSegments(startMs - _.lastModified > log.config.retentionMs)
}
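
The predicate used here compares each segment file's last-modified timestamp against the configured retention time. As a rough standalone sketch with hypothetical values (not Kafka source), a segment last written 8 days ago is deletable under a 7-day retention:

// hypothetical standalone sketch, not Kafka source
val retentionMs    = 7L * 24 * 60 * 60 * 1000                               // 7-day retention
val lastModifiedMs = System.currentTimeMillis() - 8L * 24 * 60 * 60 * 1000  // last touched 8 days ago
val expired        = System.currentTimeMillis() - lastModifiedMs > retentionMs  // true, so the segment is deletable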

private def cleanupSegmentsToMaintainSize(log: Log): Int = {
  if(log.config.retentionSize < 0 || log.size < log.config.retentionSize)
    return 0
  var diff = log.size - log.config.retentionSize
  def shouldDelete(segment: LogSegment) = {
    if(diff - segment.size >= 0) {
      diff -= segment.size
      true
    } else {
      false
    }
  }
  log.deleteOldSegments(shouldDelete)
}
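
In cleanupSegmentsToMaintainSize, diff is the number of bytes the log is over the limit, and a segment is only deleted while removing it still leaves at least retentionSize bytes behind, so after cleanup the log can remain slightly above the limit (by less than one segment). A small standalone sketch with hypothetical sizes (not Kafka source):

// hypothetical sketch: retention limit of 1000 bytes, four 400-byte segments, oldest first
val retentionSize = 1000L
val segmentSizes  = Seq(400L, 400L, 400L, 400L)    // total log size = 1600 bytes
var diff = segmentSizes.sum - retentionSize        // 600 bytes over the limit
val deleted = segmentSizes.takeWhile { size =>
  if (diff - size >= 0) { diff -= size; true } else false
}
// deleted == Seq(400): only the oldest segment is removed, because dropping a
// second one would push the log below retentionSize; 1200 bytes stay on disk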

The cleanupLogs function is the one that cleans up logs that need to be deleted. It calls the cleanupExpiredSegments and cleanupSegmentsToMaintainSize functions, which correspond to the time-based and size-based deletion policies described above, and simply adds their results together. From this we can see that configuring just one of the deletion policies is enough.

The deleteOldSegments function finds the segments that need to be deleted according to the supplied condition.

def deleteOldSegments(predicate: LogSegment => Boolean): Int = {
  // find any segments that match the user-supplied predicate UNLESS it is the
  // final segment and it is empty (since we would just end up re-creating it)
  val lastSegment = activeSegment
  val deletable = logSegments.takeWhile(s => predicate(s) &&
    (s.baseOffset != lastSegment.baseOffset || s.size > 0))
  val numToDelete = deletable.size
  if(numToDelete > 0) {
    lock synchronized {
      // we must always have at least one segment, so if we are
      // going to delete all the segments, create a new one first
      if(segments.size == numToDelete)
        roll()
      // remove the segments for lookups
      deletable.foreach(deleteSegment(_))
    }
  }
  numToDelete
}
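
Because logSegments returns segments ordered by base offset and takeWhile is used, only a contiguous prefix of the oldest segments is ever selected: scanning stops at the first segment that fails the predicate, and everything after it is kept. A tiny standalone illustration with hypothetical ages (not Kafka source):

// hypothetical sketch: days since each segment was last modified, oldest first, 7-day retention
val segmentAges = Seq(10, 9, 3, 8)
val deletable   = segmentAges.takeWhile(_ > 7)   // Seq(10, 9)
// the trailing 8-day-old segment survives because takeWhile stopped at the 3-day-old one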

deleteOldSegments uses the predicate passed in to find the segments that need to be deleted and collects them in deletable. Finally it iterates over the segments in deletable and calls deleteSegment on each one to delete it.

private def deleteSegment(segment: LogSegment) {
  info("Scheduling log segment %d for log %s for deletion."
.format(segment.baseOffset, name))
  lock synchronized {
    segments.remove(segment.baseOffset)
    asyncDeleteSegment(segment)
  }
}

deleteSegment ultimately calls the asyncDeleteSegment function:

private def asyncDeleteSegment(segment: LogSegment) {
  segment.changeFileSuffixes("", Log.DeletedFileSuffix)
  def deleteSeg() {
    info("Deleting segment %d from log %s.".format(segment.baseOffset, name))
    segment.delete()
  }
  scheduler.schedule("delete-file", deleteSeg, delay = config.fileDeleteDelayMs)
}

The deletion is asynchronous: as the implementation shows, the actual file removal is performed by another thread via the scheduler. Before that happens, the segment's files are renamed with a .deleted suffix, for example 00000000000000000000.log becomes 00000000000000000000.log.deleted (the corresponding .index file is renamed the same way).

Then, after log.segment.delete.delay.ms (1 minute by default) has elapsed, those segments are permanently deleted.

The Kafka broker checks for segments that need to be deleted every log.retention.check.interval.ms (5 minutes by default):

scheduler.schedule("kafka-log-retention",
                   cleanupLogs,
                   delay = InitialTaskDelayMs,
                   period = retentionCheckMs,
                   TimeUnit.MILLISECONDS)
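
Both of these delays come from broker configuration. A server.properties sketch with the default values mentioned above:

# how often the retention check (cleanupLogs) runs; 5 minutes by default
log.retention.check.interval.ms=300000
# how long a *.deleted segment lingers on disk before it is finally removed; 1 minute by default
log.segment.delete.delay.ms=60000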


 

