CVPR 2016 Paper Study: Video Object Segmentation

Abstract: Video object segmentation, a binary labelling problem, is vital in various applications including object tracking, action recognition, video summarization, video editing, object-based encoding and video retrieval. This paper presents an overview of recent strategies in video object segmentation, focusing on the techniques for solving challenges like complex and moving backgrounds, illumination changes, occlusions, motion blur, shadow effects and viewpoint variation. Significant works that have evolved in this research field over recent years are categorized based on the challenges solved by the researchers. A list of challenging datasets and evaluation metrics available for video object segmentation is presented. Finally, research gaps in this domain are discussed.

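Summary: Video object segmentation is an important technique in many applications, such as object tracking, action recognition, video summarization, video editing, object-based encoding and video retrieval.

The paper mainly surveys recent strategies in video object segmentation, focusing on how they handle complex and moving backgrounds, illumination changes, occlusions, motion blur, shadow effects and viewpoint variation.

The significant works it covers are categorized by the challenges they solve. It also lists the available datasets and evaluation metrics and, finally, discusses the research gaps in this field.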

Today's internet carries a massive amount of video data thanks to developments in storage devices and handy imaging systems. Terabytes of video are regularly generated for various useful applications like surveillance, news broadcasting, telemedicine, etc. Based on the information provided by Cisco in its 'Visual Networking Index (VNI)', internet video traffic will grow threefold from 2015 to 2020. Manually extracting semantic information from this enormous amount of internet video is infeasible, which calls for automated methods to annotate or derive useful information from the video data for video management and retrieval [1]. Hence, one of the essential steps for video processing and retrieval is video object segmentation, a binary labelling problem of differentiating the foreground objects accurately from the background. Video object segmentation aims at partitioning every frame in a video into meaningful objects by grouping the pixels along the spatio-temporal direction that exhibit coherency in appearance and motion [2]. The video object segmentation task is highly challenging for the following reasons: (i) the unknown number of objects in a video, (ii) the varying background in a video and (iii) the occurrence of multiple objects in a video [3].

Existing approaches in video segmentation can be broadly classified into two categories, viz. interactive methods and unsupervised methods. Interactive object segmentation methods require human intervention in the initialization process, while unsupervised approaches can perform object segmentation automatically. In semi-supervised approaches, user intervention is required for annotating the initial frames, and these annotations are propagated to the remaining frames of the video. Automated object segmentation approaches [7][8][9] can segment any video data into meaningful objects without user interaction, based on object proposals and motion cues from the video. The common assumption followed by most of the automated methods is that only a single object is moving throughout the video, and they use only the motion information for segmenting the object from the background. This assumption leads to poor segmentation under discontinuous motion of the object [11].

The existing surveys [12] [13] [14] [15] on video object segmentation describe the techniques available for image segmentation rather than for video data. In [15] the authors classified the approaches in video segmentation as inference and feature modes. The segmentation techniques proposed so far to improve the segmentation results are grouped as inference modes, and methods that depend on features like depth, motion and histogram are termed feature modes. From this observation, it is evident that none of the researchers have discussed the segmentation approaches from the perspective of the challenges solved by the algorithm. Hence this paper categorizes the significant work contributed by researchers in video object segmentation based on the issues resolved by the respective authors. Several issues degrading the segmentation performance are moving background, moving camera, illumination variation, occlusion, shadow effect, viewpoint variation, etc. Moreover, a proposed algorithm should provide a tradeoff between segmentation accuracy and complexity. As depicted in fig. 1, this paper classifies the video object segmentation task as:

1. Issue tackling mode

2. Complexity reduction mode and

3. Inference mode

The main contributions of this paper are:

• Summarizing the recent activities in the video object segmentation domain.

• Categorizing the significant works in this research field meaningfully, and

• Presenting a list of databases and evaluation metrics needed for developing an efficient video object segmentation framework.

Organization of this paper: Section II describes the algorithms that contributed significantly to tackling the issues (discussed earlier) involved in video object segmentation. Section III presents an overview of segmentation approaches with reduced complexity available in the literature. Section IV provides a gist of object segmentation techniques that fall under inference mode. Section V lists the datasets and the evaluation metrics used in these segmentation approaches and discusses research gaps in the video object segmentation field. Section VI concludes this study.

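Notes: Nowadays the internet is flooded with all kinds of video data... In short, a long passage tells you that this topic matters. Then it says that the goal of video object segmentation is to partition every frame of a video into meaningful objects by grouping pixels that are coherent in appearance and motion (my reading: some kind of consistency). However, video segmentation has the following difficulties:

1. The number of objects in the video is unknown

2. The background varies

3. Multiple objects appear in the video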

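There are currently two main kinds of methods: interactive and unsupervised. Interactive methods need human intervention at initialization, whereas unsupervised methods segment video objects automatically. Note that the paper itself is a survey of such methods rather than a new method.

In semi-supervised methods, the user has to annotate the initial frames (that is, mark the object to be segmented), and those annotations are then propagated to the rest of the video; unsupervised methods do not need this. Many existing automated methods assume that only a single object is moving and rely on motion information alone, which gives poor results when the object moves discontinuously. The authors of [15] classify video segmentation approaches into "feature" and "inference" modes, and references [12]-[15] are surveys that cover image segmentation techniques. From this, it is evident that nobody has yet reviewed video segmentation methods from the perspective of the challenges each algorithm solves, so this paper organizes the work by the issues that degrade segmentation accuracy, such as moving backgrounds, moving cameras, illumination changes, occlusion, shadow effects and viewpoint variation.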
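To see why relying on motion alone breaks down under discontinuous motion, here is a minimal sketch that is not taken from the paper. It assumes plain frame differencing as the only motion cue; `frame_difference_mask`, `make_frame` and the threshold value are hypothetical names and numbers chosen for illustration. Frames are small grey-level grids stored as nested lists so that the example has no dependencies.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=20):
    """Mark a pixel as foreground (1) when its grey level changed by more than threshold."""
    return [
        [1 if abs(curr - prev) > threshold else 0 for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def make_frame(object_col, width=6, height=3, background=10, foreground=200):
    """Hypothetical toy frame: dark background with a one-pixel-wide bright column."""
    return [
        [foreground if x == object_col else background for x in range(width)]
        for _ in range(height)
    ]


# The object moves from column 1 to column 2, then pauses at column 2.
frames = [make_frame(1), make_frame(2), make_frame(2)]

moving_mask = frame_difference_mask(frames[0], frames[1])
paused_mask = frame_difference_mask(frames[1], frames[2])

moving_pixels = sum(map(sum, moving_mask))
paused_pixels = sum(map(sum, paused_mask))
print(moving_pixels, paused_pixels)  # prints "6 0"
```

While the object moves, both its old and its new position show up in the mask. Once it pauses, the mask is empty even though the object is still in the scene, which is the kind of poor segmentation under discontinuous motion mentioned above.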

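Furthermore, the paper notes that a proposed algorithm should offer a tradeoff between segmentation accuracy and complexity.

So the paper classifies the task as:

1. Issue tackling mode

2. Complexity reduction mode

3. Inference mode

The paper's three main contributions: 1. summarizing recent work in this field; 2. categorizing the significant works; 3. presenting a list of datasets and evaluation metrics needed for developing a video object segmentation framework.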

II. ISSUE TACKLING MODE

This section details the 'issue tackling mode', the first category of video object segmentation approaches. Though several issues (as discussed earlier) affect the performance of segmentation approaches, the commonly occurring problems are moving background, occlusion, shadow, rain, moving camera, illumination and viewpoint variation.

A. Surveillance video systems

Traffic surveillance systems involve the detection and recognition of moving vehicles (objects) from traffic video sequences. For any traffic surveillance system, vehicle segmentation is the fundamental step and the basis for tracking vehicle movements. However, vehicle segmentation in traffic video is still challenging due to moving objects and illumination variations. To solve this issue, an unsupervised neural-network-based background model has been proposed for real-time object segmentation. In this work, the neural network serves both as an adaptive model of the background in a video sequence and as a classifier of pixels as background/foreground. The segmentation time taken by the neural network is improved by implementing it on an FPGA kit. Though this neural-network-based background subtraction method achieves good segmentation accuracy, it works well only under slightly varying illumination and moving background, and reducing the time complexity comes at a high cost [16].

Following this, Appiah et al. [17] proposed an integrated hardware implementation of moving object segmentation in real-time video streams under varying lighting conditions. Two algorithms, for multimodal background modelling and connected component analysis, are implemented on a single-chip FPGA. This method segments objects under varying illumination conditions at high processing speed.

The two algorithms described so far do not take rain into account. In rainy conditions, shadows and colour reflections are the major problems to be tackled. A conventional video object segmentation algorithm that combines background-construction-based and foreground-extraction-based video object segmentation has been proposed. The foreground is separated from the background using a histogram-based change detection technique, and object regions are segmented accurately by detecting the initial moving object masks based on a frame difference mask. Shadow and colour reflection regions are removed by a diamond window mask and colour analysis of the moving object, respectively. The segmentation of moving objects is refined by morphological operations. The segmentation results of moving objects under rainy situations using [18] are shown in fig. 2. Future work is to obtain the threshold adaptively and adjust to the content of the video automatically.

Later, Chien et al. [19] proposed a video object segmentation and tracking technique for smart cameras in visual surveillance networks. A multi-background model based on a threshold decision algorithm has been developed for video object segmentation under drastic changes in illumination and background clutter. In this method, the threshold is selected robustly without user input, and it differs from a per-pixel background model, which avoids possible error propagation.

Another algorithm for extracting objects from videos captured by a static camera has been proposed to solve issues like waving trees, camouflage regions and sleeping objects [20]. In this method, the reference background is obtained by averaging some initial frames. Temporal processing for object extraction does not consider the spatial correlation amongst the moving objects across frames. Hence, an approximate motion field is derived using background subtraction and a temporal difference mechanism. The background model adapts to temporal changes (swaying trees, rippling water, etc.), which extracts the complementary object in the scene.

Notes: For traffic surveillance, the most important step is segmenting the vehicles, but this is still hard because the objects keep moving and the illumination varies. To solve this, an unsupervised neural network is used both as an adaptive model of the background in the video and as a classifier that labels pixels as background or foreground. Its running time can be reduced by implementing it on an FPGA. Although this neural-network-based background subtraction method achieves good segmentation accuracy, it only works well when the illumination and the background change slightly, and reducing the time complexity is costly. Appiah et al. then proposed an integrated hardware implementation: two algorithms (multimodal background modelling and connected component analysis) run on a single-chip FPGA, and it handles varying illumination well. It does not handle rain, though; in the rain, shadows and colour reflections are the main problems. A conventional algorithm combines background-construction-based segmentation with foreground-extraction-based segmentation. The foreground is separated using histogram-based change detection, and the object regions are segmented by detecting the initial moving object masks from a frame difference mask (I have not figured out what this means yet). Anyway, it says that shadow regions and colour reflection regions are handled by a diamond window mask and by colour analysis of the moving object, respectively. The segmented moving objects are then refined by morphological operations, and the results are shown in fig. 2. In the future, the algorithm is supposed to obtain the threshold adaptively and adjust to the video content automatically. After that, Chien et al. proposed an algorithm for smart cameras in visual surveillance networks. In this method the threshold is chosen robustly without help from the user, and it differs from a per-pixel background model, which avoids possible error propagation (what does that mean? I do not get it). There is also an algorithm for static cameras that deals with waving trees and camouflaged regions. Its reference background is obtained by averaging some initial frames (what does that mean?), and the text adds that temporal processing alone does not consider the spatial correlation among objects that move across frames.

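Looking at the result figure for this algorithm, my personal take is that it filters out the lighting and shadow effects and keeps only the real target.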
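The morphological refinement step can be illustrated with a binary opening (erosion followed by dilation), which removes isolated false detections while keeping larger regions. This is only a sketch under my own assumptions: the 3x3 square structuring element, the function names and the toy mask are hypothetical, and it is not the diamond window mask or the colour analysis that [18] uses for shadows and reflections.

```python
NEIGHBOURHOOD = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # 3x3 square


def _pixel(mask, y, x):
    """Read a pixel, treating everything outside the image as background."""
    inside = 0 <= y < len(mask) and 0 <= x < len(mask[0])
    return mask[y][x] if inside else 0


def erode(mask):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return [
        [int(all(_pixel(mask, y + dy, x + dx) for dy, dx in NEIGHBOURHOOD))
         for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is foreground."""
    return [
        [int(any(_pixel(mask, y + dy, x + dx) for dy, dx in NEIGHBOURHOOD))
         for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def opening(mask):
    """Erosion followed by dilation: drops specks smaller than the 3x3 element."""
    return dilate(erode(mask))


# Toy foreground mask: one 3x3 object plus an isolated noisy pixel at row 2, column 5.
raw_mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

cleaned_mask = opening(raw_mask)
for row in cleaned_mask:
    print(row)
```

The 3x3 object survives unchanged and the isolated pixel disappears. In practice one would use an image-processing library rather than nested lists; the pure-Python version is only there to keep the sketch self-contained.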

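I really could not follow the last part of this passage, so I just fell back on a direct translation of it:

Hence, an approximate motion field is derived using background subtraction and a temporal difference mechanism. The background model adapts to temporal changes (swaying trees, rippling water, etc.) and extracts the complementary object in the scene. (Still not clear to me.)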
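Roughly, the idea described for [20] (a reference background averaged from initial frames, then background subtraction combined with a temporal difference) could be sketched as follows. The assumptions here are mine and are not confirmed by the paper: the two cues are combined with a logical AND, the background adapts through a running average with a made-up learning rate `alpha`, and all function names and thresholds are hypothetical.

```python
def average_background(initial_frames):
    """Reference background: per-pixel mean over some initial frames."""
    count = len(initial_frames)
    height, width = len(initial_frames[0]), len(initial_frames[0][0])
    return [
        [sum(frame[y][x] for frame in initial_frames) / count for x in range(width)]
        for y in range(height)
    ]


def changed(reference, frame, threshold):
    """True where a pixel differs from the reference by more than threshold."""
    return [
        [abs(value - ref) > threshold for ref, value in zip(ref_row, row)]
        for ref_row, row in zip(reference, frame)
    ]


def moving_object_mask(background, prev_frame, curr_frame, threshold=25):
    """Assumed rule: foreground must differ from the background model
    (background subtraction) AND from the previous frame (temporal difference)."""
    vs_background = changed(background, curr_frame, threshold)
    vs_previous = changed(prev_frame, curr_frame, threshold)
    return [
        [b and t for b, t in zip(b_row, t_row)]
        for b_row, t_row in zip(vs_background, vs_previous)
    ]


def update_background(background, frame, mask, alpha=0.5):
    """Assumed adaptation: running average over pixels labelled background,
    so that slow scene changes are absorbed into the model."""
    return [
        [bg if fg else (1 - alpha) * bg + alpha * value
         for bg, value, fg in zip(bg_row, row, mask_row)]
        for bg_row, row, mask_row in zip(background, frame, mask)
    ]


# Three slightly noisy one-row frames of an empty scene.
initial_frames = [[[10, 10, 10, 10, 10]], [[12, 10, 10, 10, 10]], [[8, 10, 10, 10, 10]]]
background = average_background(initial_frames)

prev_frame = [[10, 200, 10, 10, 10]]  # object at column 1
curr_frame = [[10, 10, 200, 10, 14]]  # object at column 2, slight change at column 4

mask = moving_object_mask(background, prev_frame, curr_frame)
background = update_background(background, curr_frame, mask)
print(mask)        # [[False, False, True, False, False]]
print(background)  # [[10.0, 10.0, 10.0, 10.0, 12.0]]
```

With the AND rule, the old position of the object (column 1) is not marked, because it matches the background again, and the small change at column 4 is absorbed into the background model instead of being reported as foreground. The price of this choice is that an object that stops moving drops out of the mask; an OR rule would make the opposite tradeoff.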

B. Generic video sequences

Moving foreground object extraction from a given generic video shot is one of the vital tasks for content representation and retrieval in many computer vision applications. An iterative method based on energy minimization has been proposed for segmenting the primary moving object efficiently from moving-camera video sequences. The initial object segmentation obtained using graph cut is improved repeatedly using features extracted over a set of neighbouring frames [21]. Thus, this iterative method can efficiently segment the objects in video shots captured by a moving camera.

A conditional-random-field-based video object segmentation system capable of segmenting multiple moving objects from a complex background has been proposed [22]. In this work, the complementary properties of point and region trajectories are exploited by transferring the labels of sparse point trajectories to region trajectories. Region trajectories based on shape consistency provide a robust design for segmenting spatially overlapping region trajectories. As region trajectories are extracted from a hierarchical image over-segmentation, the method segments meaningful regions over time. [...] time and computational complexity.

Unsupervised segmentation of moving-camera video sequences using inter-frame change detection has been proposed [23].
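The label transfer from sparse point trajectories to region trajectories mentioned for [22] can be pictured with a much simpler stand-in. The sketch below is an assumption of mine and not the conditional random field model of [22]: each region simply takes the majority label of the labelled point trajectories that fall inside it. The function name, the identifiers and the toy data are all hypothetical.

```python
from collections import Counter


def transfer_labels(point_labels, point_to_region):
    """Give each region the most common label among the labelled points inside it.

    point_labels:    point trajectory id -> label
    point_to_region: point trajectory id -> id of the region trajectory containing it
    Regions that contain no labelled point are left out of the result.
    """
    votes = {}
    for point_id, region_id in point_to_region.items():
        label = point_labels.get(point_id)
        if label is not None:
            votes.setdefault(region_id, Counter())[label] += 1
    return {region_id: counter.most_common(1)[0][0] for region_id, counter in votes.items()}


# Four sparse point trajectories; three of them fall inside region r1.
point_labels = {"p1": "object", "p2": "object", "p3": "background", "p4": "background"}
point_to_region = {"p1": "r1", "p2": "r1", "p3": "r1", "p4": "r2"}

region_labels = transfer_labels(point_labels, point_to_region)
print(region_labels)  # {'r1': 'object', 'r2': 'background'}
```

Majority voting ignores everything a CRF would add, such as smoothness between neighbouring regions and consistency over time, so it should be read only as an illustration of how sparse labels can be spread onto denser regions.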

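Notes on generic video sequences

This part mentions an iterative method: the initial segmentation comes from graph cut and is then refined repeatedly with features extracted from a set of neighbouring frames, so the method can segment objects in video shot by a moving camera (I think?). Paper [22] describes a conditional-random-field-based video object segmentation system that can segment multiple moving objects from a complex background (this sentence came from Google Translate).

Paper [22] mainly seems to be about transferring labels from sparse point trajectories to region trajectories?

Paper [23] seems to be an unsupervised method based on inter-frame change detection?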
