This post records patterns in paper writing that I noticed while preparing an ICIP paper. The observations are based mainly on five papers:

Paper 1: "A Parallel Convolutional Neural Network Architecture for Stereo Vision Estimation" (2017)
Paper 2: "Accurate Dense Stereo Matching for Road Scenes" (2017)
Paper 3: "Deep Stereo Confidence Prediction for Depth Estimation" (2017)
Paper 4: "Patch-Based Stereo Matching Using 3D Convolutional Neural Networks" (2017)
Paper 5: "Two-Stage Convolutional Neural Network for Light Field Super-Resolution" (2017)

Updated on 2018.11.30.

Paper structure

The five papers share essentially the same structure, in this order:

Abstract --> Index Terms --> Introduction --> Proposed Method --> Experimental Results/Experiments and Discussion/Experiments --> Conclusion

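Papers usually run 4 to 5 pages, and the fifth page may contain only references.

The sections below follow the order of the paper and analyze how each part is written.

Abstract

The abstract is usually required to be 100 to 150 words long. Its content mainly follows this structure:

(Problem overview) --> Algorithm summary (component-by-component description) --> Experimental results and evaluation.

Note that the algorithm summary should be written in the simple present tense.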
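
The length guideline above can be checked mechanically while drafting. The following is only a minimal sketch: the function names `count_words` and `check_abstract_length` are my own, the 100 to 150 range is the one quoted in this post rather than an official limit, and counting whitespace-separated tokens is an assumption about what counts as a word.

```python
import re
from typing import Tuple


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit.

    Tokens made only of punctuation (for example "--") are ignored.
    """
    return sum(1 for tok in text.split() if re.search(r"[A-Za-z0-9]", tok))


def check_abstract_length(text: str, low: int = 100, high: int = 150) -> Tuple[int, bool]:
    """Return (word count, whether the count lies within [low, high])."""
    n = count_words(text)
    return n, low <= n <= high


if __name__ == "__main__":
    draft = "In this paper, we propose a hypothetical method."  # placeholder text
    n, ok = check_abstract_length(draft)
    print(f"{n} words, within range: {ok}")
```

A hyphenated term such as "state-of-the-art" counts as one word here; a venue may count differently, so treat the result as a rough guide.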

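Problem overview

The problem overview usually summarizes, in one simple sentence, the significance of the work (typically the application scenario and the challenges) and the shortcomings of existing methods, which leads into the work presented in the paper. This part therefore needs to be tied to the specific work, and ideally it sets that work off by contrast. Conference papers are short, however, so the abstract may omit the problem overview altogether (of the five ICIP papers I read, only 2 had one, and one of those was too long to be a useful model).

Example sentences for the problem overview:

Extracting depth information from the stereo image pair is a commonly used method in 3-D computer vision. For robotics and unmanned vehicle applications that require real-time performance, speed is often more important than accuracy. In recent years, Convolutional Neural Networks (CNNs) have shown great success in many computer vision applications including classification, segmentation, object detection, edge detection, and stereo vision estimation. Existing network architectures for stereo vision estimation predict very little information during the forward pass and are only able to calculate the disparity for one pixel at a time. (Comment: this abstract has 143 words in total and spends much of them on the scenario, the problem, and the state of the art; in my view this part is too long and its logic is not well organized.)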
Stereo matching task is the core of applications linked to the intelligent vehicles.

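Algorithm summary (algorithm details)

After the problem overview, the abstract usually refers back to the difficulties and problems just raised and gives a brief introduction to the proposed method. Most papers first give a one-sentence summary of the method that states what the algorithm does, and then describe its main components in more detail; some papers instead go straight to summarizing what each component does. Note that these functional descriptions usually use a few brief words to convey the algorithm's advantages.

Examples of algorithm summaries (the highlighted part is the one-sentence method summary):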

In this paper, we propose a parallel architecture to speed up disparity map computation by simultaneously processing all pixels on one horizontal line. (Comment: the summary normally should not be this short; this paper devotes a lot of its abstract to background.)
In this paper, we present a new variant function of the Census Transform (CT) which is more robust against radiometric changes in real road scenes. We demonstrate that the proposed cost function outperforms the conventional cost functions using the KITTI benchmark. The cost aggregation method is also updated for taking into account the edge information. This enables to improve significantly the aggregated costs especially within homogeneous regions. The Winner-Takes-All (WTA) strategy is used to compute disparity values. To further eliminate the remainder matching ambiguities, a post-processing step is performed. (Comment: this passage interleaves component summaries with evaluations of the algorithm, with each evaluation matching one component.)
In this paper, we investigate a convolutional neural network (CNN) approach for light field (LF) super-resolution (SR). We are motivated by the assumption that image priors can be embedded into CNN, and both external and internal correlations are important in LFSR. The LF images are indeed natural images except for its angular resolution, so the external correlations help to super-resolve a single image from a collection of general images, whilst the internal correlations are essential to enhance a single view in LF with the details in the other views. Accordingly, we propose a two-stage CNN, where the two stages exploit the external and internal correlations, respectively. Moreover, to improve the generalization ability of the second-stage CNN for inter-view SR, we propose to align different views at patch level to compensate for the disparity that is essential to LFSR, thus the second stage is termed multi-patch fusion CNN. (Comment: this passage describes, in turn, the overall function of the algorithm, the motivation behind it (the problem observed), the concrete structure, and the role of each component.)
We present a novel method that predicts a confidence to improve the accuracy of an estimated depth map in stereo matching. In contrast to existing learning based approaches relying on hand-crafted confidence features, we cast this problem into a convolutional neural network, learned using both a matching cost volume and its associated disparity map. As the size of the matching cost volume varies depending on a search range of stereo image pairs, we propose to use a top-K matching probability volume layer so that an input size for convolutional layers remains unchanged. (Comment: this summary says almost nothing of substance; it only states that a CNN is used, and the methods it contrasts itself with are traditional hand-crafted ones.)
In this paper, we propose patch-based stereo matching using 3D convolutional neural networks (CNN). We extract spatial color and disparity features simultaneously through 3D CNN. We treat stereo matching as multi-class classification that the classes are all possible disparity values. We first generate a large set of patches from stereo images for 3D CNN. Then, we get an initial disparity map through 3D CNN and refine it using color image guided filtering. The color image guided filtering minimizes outliers and refines edges in disparity without texture copying artifacts. (Comment: this summary first states the function and then the implementation, but it focuses mainly on the processing pipeline rather than on the structure.)

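Experimental results

This part usually gives representative experimental results. If a benchmark exists, the result is usually the ranking on that benchmark; if not, concrete metrics are given. The part ends with a one-sentence summary, which ideally echoes the problem described earlier. If the method ranks first on the benchmark, say so directly ("rank first"); otherwise it is enough to say that it outperforms state-of-the-art methods, but concrete numbers should back this up.

Examples of experimental results (the highlighted part is the summary of the algorithm's performance):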

We train and test our network on five Middlebury datasets. Our parallel architecture achieves at least 10x speedup compared to existing networks. Its accuracy is also very competitive.
Experiments were conducted on the new Middlebury dataset, as well as on the real road traffic scenes of the KITTI database. Obtained disparity results have demonstrated that the proposed method is promising.
Experimental results demonstrate the superior performance of our two-stage CNN compared with the state-of-the-art CNN-based SR methods.
Experimental results demonstrate that the proposed method outperforms the state-of-the-art confidence estimation approaches on various benchmarks.
Experimental results show that the proposed method successfully estimates disparity in smooth and discontinuity regions while preserving edges as well as outperforms the state-of-the arts in terms of average errors.

Index Terms

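There are 3 to 5 keywords (index terms), usually single words or set phrases. The keywords should not overlap in what they describe; together they summarize the techniques or functions of the algorithm.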
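
The two rules above (3 to 5 terms, no overlap) can be partly checked in code. This is a rough sketch under stated assumptions: `check_index_terms` is a name I made up, the example terms are hypothetical, and an exact case-insensitive match is only a crude stand-in for "overlapping content", which really needs human judgment.

```python
from typing import List


def check_index_terms(terms: List[str], low: int = 3, high: int = 5) -> List[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    # Normalize case and internal whitespace so "CNN" and "cnn " compare equal.
    normalized = [" ".join(t.lower().split()) for t in terms]

    problems = []
    if not low <= len(normalized) <= high:
        problems.append(f"expected {low}-{high} terms, got {len(normalized)}")

    seen = set()
    for term in normalized:
        if term in seen:
            problems.append(f"duplicate term: {term}")
        seen.add(term)
    return problems


if __name__ == "__main__":
    # Hypothetical index terms, used only for illustration.
    print(check_index_terms(["Stereo matching", "CNN", "disparity"]))
```

The check does not catch near-synonyms such as "CNN" and "convolutional neural network"; those still have to be spotted by reading the list.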

Introduction

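The Introduction takes about 1 page. Its content varies from paper to paper and depends heavily on the paper's subject. Because an ICIP paper is only 4 pages long, there is no need for a separate Related Work section, so related work has to be covered within the Introduction.

The first paragraph of the Introduction mainly serves to open the topic. Its structure can be expressed as:

Opening --> Application scenarios --> Basic concepts --> Overview of the state of the art or problem description --> Open problems.

The overall structure can be expressed as (for reference only, not a rule):

First paragraph (format as above) --> State of the research --> Introduction of the algorithm's structure or advantages --> Outline of the paper.

Opening + application scenarios

The application-scenario part is usually followed by references, but this is not mandatory. Some papers skip this part and start directly from the evolution of the basic concepts.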

Stereo vision plays a critical role in many robotic vision applications. It is crucial to extract depth information of the relative position of 3D objects, which separates occluding image components. // Stereo vision is widely used in many fields such as aerial surveys, autonomous vehicles, etc.
Stereo matching is one of the most widely studied problems in computer vision. // It has found various applications in the image processing domain e.g. 3D reconstruction, object detection and intelligent vehicles.
As one of the most important topics in computer vision, stereo correspondence has been actively studied over the last few decades. // The disparity map (or depth map) obtained using stereo correspondence is widely used in many applications such as 3D reconstruction, object detection, and driver assistance system.
Stereo matching has been an highly active research area in computer vision for decades, and a variety of methods have been proposed to produce accurate disparity maps so far.

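Basic concepts + overview of the state of the art

Some papers briefly summarize the state of the research in the first paragraph; others skip the summary and present it in detail later in the text.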

A stereo vision algorithm computes the disparity for each pixel from two rectified images taken from the left and right cameras. Disparity refers to the difference in horizontal location of an object in both images, as shown in Fig.1.
Stereo matching algorithms described here typically operate on rectified images. // They can be roughly classified into two categories: global and local approaches. Global approaches consider whole pixels in the image to produce depth values. Thus, these approaches usually produce quite accurate depth results. However, this kind of algorithms has a high complexity problem. Local approaches consider specific pixels in the image to estimate depth values. Thus, they are computationally cheap. However, they usually produce less accurate depth values.
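Light field refers to the function of light with respect to position and direction, which has been introduced into digital image processing in 1996. (Comment: this paper has no opening or application scenarios; it starts directly with the basic concept and writes the paper following its evolution.)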

Examples of paper outlines

This part gives a few examples of the outline of the paper that closes the Introduction. Note that because conference papers are short, some papers omit the outline.

Details of our estimation model architecture are introduced in Section 2. Section 3 discusses our experiment settings and results. Finally, the paper is summarized and concluded with future research directions in Section 4.
The remainder of this paper is organized as follows. Section 2 presents the details of our method. Section 3 presents experimental results, followed by conclusion in Section 4.

Experimental Results

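The Experimental Results section resembles the Proposed Method section in that it differs considerably from paper to paper. Still, this section has some common ground. Here I summarize what usually needs to be covered in it:

Experimental environment: hardware, toolkits, architecture, etc.
Experimental setup: the training procedure, the datasets used for training and testing, hyperparameter settings, etc.; the datasets can be introduced briefly.
Algorithms used for comparison: this part usually needs a brief explanation of why these algorithms were chosen.
Experimental results, together with comparison and analysis of the results.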

Conclusion

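The length of the Conclusion varies somewhat more than that of the abstract; it usually falls between 11 and 12 lines, though some run to 17 lines. Its main structure is:

Overview of the method's function --> Summary of each part (or restatement of advantages) --> Experimental results --> Future plans.

Overview of the method's function

Note that the conclusion usually uses the simple past tense, though some papers use the present tense. This part sometimes also includes a summary of the experimental results (but that makes the structure look messy, so I do not recommend it).

Examples: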

In this paper, we proposed patch-based 3D CNN for stereo matching.
In this study, we have presented a learning framework for estimating stereo matching confidence by using both matching cost volume and initial disparity map in CNNs.
In this paper, we presented a new stereo matching algorithm based on a new variant of the Census cost function for the cost computation stage.
In this paper, we propose a parallel CNN architecture for stereo vision estimation. Our results on the Middlebury datasets show that our network architecture produces accurate results. (Comment: the second sentence is a summary of the experimental results; in my view it breaks the structure of the conclusion and reads awkwardly.)
We propose a two stage CNN-based method for light field SR. The external and internal correlations have been exploited in the two stages, respectively.

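Summary of each part (or restatement of advantages)

This part comes in many varieties, and each paper has its own character. Some papers summarize the details of each component, some sum up the advantages of the algorithm or structure, and some restate what the algorithm can do. Some papers also merge this part with the experimental results, but that tends to look messy.

Examples: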

We learned both disparity and color features in 3D-DSI using 3D CNN. First, we extracted number of patches from Middlebury stereo dataset for training and validation. With the trained network, we predicted each pixel’s disparity and acquired an initial disparity map. Since the results contained some outliers and blurry edges, we performed color image guided filtering to smooth them and remove texture copying artifacts while preserving edges in disparity.
It is assumed that the optimal confidence features can be learned from the matching probability volume together with the initial disparity map. With the depth refinement method using the proposed confidence estimation method, we obtained an accurate and robust disparity map for public datasets as well as for challenging outdoor environments.
Unlike the existing CNN stereo vision architectures that either determine whether a pair of small image patches is a good match, or compute the disparity for one pixel in one forward pass time, our parallel architecture can estimate disparities for all pixels in the center row of an entire image strip in one single forward pass.
Trained with a general image set, the first-stage CNN is capable in enhancing the views individually, and the second-stage CNN further enhances the target view from the information of neighboring view. Moreover, we propose to perform view registration between the two stages to handle the disparity explicitly.

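Experimental results

This part summarizes the experimental results in a few sentences, mainly to highlight the algorithm's advantages and application scenarios (datasets).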

Experimental results demonstrate that the proposed method achieves better performance in stereo matching than state-of-the-arts while successfully preserving edges with less outliers.
Experimental results, using real road scenes of the KITTI dataset, have demonstrated that the proposed variant leads to the lowest disparity mean errors compared to the top performer in this dataset. Moreover, a local method based on cross aggregation is updated to incorporate the edge information. The modified aggregated costs have leaded to an improvement of disparity results. A post-processing step is performed to remove any noise left. The obtained disparity results are considered promising. (Comment: this passage mixes the experimental results with a description of the paper's structure; it is hard to read, and I do not recommend writing it this way.)
Our architecture significantly speeds up the processing while only sacrificing roughly 2% of the disparity accuracy to do so.
Experimental results show that the proposed method outperforms the state-of-the-art CNN-based methods.

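Future plans

This part briefly describes the next steps that build on the paper. Some papers state the future direction directly; others first mention shortcomings or room for improvement and then the next steps. If this part is fairly long, it is best to make it a separate paragraph.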

Though the confidence estimation is based on the CNN architecture, the depth refinement step still relies on the hand-crafted approach. As future work, we will study a learning-based approach that refines a depth map in a deep convolutional neural network framework.
With a balance between accuracy and efficiency, our proposed method shows great potential for real-time robotic vision applications. In the future, we will focus on modification for improved speed and implementation on a field programmable gate array device for embedded vision sensor application.
In the future, we plan to investigate fractional-pixel-level patch search to further reduce the interference of disparity.

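Other notes

Unlike earlier work, papers in recent years increasingly use the first person as the subject, but the forms used are typically "We" and "Our" rather than other first-person forms.
ICIP papers tend to write "dataset", whereas TCSVT papers tend to write "data set".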
