Deep Fully Deconvolutional Neural Networks (with paper)

Chapter One

Introduction

1. Motivation of this article

In the past few years, computer vision research has focused mainly on convolutional neural networks (often abbreviated as ConvNet or CNN), which have achieved state-of-the-art performance on a large number of tasks such as classification and regression. Although these methods have a long history, the theoretical understanding of them and the interpretation of their results remain relatively shallow.

In fact, many achievements in the field of computer vision use the CNN as a black box. Although this approach is effective, its results are hard to interpret, which falls short of what scientific research requires. This is especially true for two complementary questions:

- In terms of what is learned (such as the convolution kernels), what exactly does the network learn?
- In terms of model structure design (such as the number of convolution layers, the number of convolution kernels, the pooling strategy, the choice of nonlinear functions), why are some combinations better than others?

Answering these questions will not only help us better understand convolutional neural networks, but also further improve their engineering practicality.

In addition, current CNN implementations require a large amount of training data, and the design of the model has a great impact on the final result. A deeper theoretical understanding should reduce this dependence on data. Although a lot of research has focused on how convolutional neural networks are implemented, so far the results have largely been limited to visualizing the internal processing of convolutional operations, with the aim of understanding what changes from layer to layer.

2. The purpose of this article

In response to the above problems, this article reviews several of the best current multi-layer convolutional models. More importantly, it also summarizes the components of standard convolutional neural networks from several different angles and presents the biological basis or plausible theoretical basis behind each of them. In addition, this article describes how visualization methods and case studies can be used to try to understand what happens inside a convolutional neural network. Our ultimate goal is to show the reader, in detail, every layer operation involved in a convolutional neural network, highlighting the current state-of-the-art models and pointing out the problems that still need to be solved.

Chapter Two

Multilayer network structure

Before the recent success of deep learning (deep neural networks), state-of-the-art computer vision recognition systems consisted mainly of two separate but complementary steps:

- First, the input data is transformed into a suitable form through human-designed operations such as convolution or local or global encoding methods. This transformation is usually done to obtain a compact or abstract representation of the input data, while manually building in some invariances according to the needs of the current task. After this transformation, the input data is in a form that is easier to separate or identify, which facilitates the subsequent recognition and classification.
- Second, the transformed data is used as the input signal for training a classifier such as a support vector machine. In general, the performance of any classifier is affected by the quality of the transformed data and by the transformation method used.
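The two steps above can be sketched in a few lines of plain Python. This is only an illustrative toy, not a method taken from the paper: the edge filters, the global averaging, the striped toy images, and all function names are assumptions chosen for the sketch, and a simple perceptron stands in for the support vector machine mentioned above.

```python
# Toy sketch of the classic two-step pipeline (all names are illustrative).

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) with no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Step 1: fixed, human-designed filters. Nothing here is learned.
VERTICAL_EDGE = [[-1, 1]]      # responds to left-to-right intensity changes
HORIZONTAL_EDGE = [[-1], [1]]  # responds to top-to-bottom intensity changes

def extract_features(image):
    """Mean absolute response of each filter: a compact two-number
    representation that ignores where in the image the edges occur."""
    features = []
    for kernel in (VERTICAL_EDGE, HORIZONTAL_EDGE):
        response = correlate2d_valid(image, kernel)
        magnitudes = [abs(v) for row in response for v in row]
        features.append(sum(magnitudes) / len(magnitudes))
    return features

# Step 2: a classifier trained on the transformed data.
# A perceptron is used here as a simple stand-in for an SVM.
def train_perceptron(features, labels, epochs=20, lr=0.1):
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):  # labels are +1 or -1
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified or on the boundary
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias

def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1

# Toy images: vertical stripes (label +1) and horizontal stripes (label -1).
vertical_stripes = [[0, 1, 0, 1]] * 4
horizontal_stripes = [[0] * 4, [1] * 4, [0] * 4, [1] * 4]
# Shifted copies, never seen during training.
shifted_vertical = [[1, 0, 1, 0]] * 4
shifted_horizontal = [[1] * 4, [0] * 4, [1] * 4, [0] * 4]

train_x = [extract_features(vertical_stripes),
           extract_features(horizontal_stripes)]
weights, bias = train_perceptron(train_x, [1, -1])
```

Because the features average the filter responses over the whole image, the shifted stripe images map to the same feature vectors as the training images, so `predict(weights, bias, extract_features(shifted_vertical))` gives the same label as for `vertical_stripes`. That invariance was designed by hand in step 1; the classifier in step 2 only ever sees the two numbers it is given.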

The emergence of multi-layer neural network architectures has brought a new way to solve this problem: such architectures can not only train object classifiers, but also learn the required transformations directly from the input data. This style of learning is often referred to as representation learning, and when it is applied to deep or multi-layer neural network structures, we call it deep learning.

A multilayer neural network is defined as a computational model that extracts useful information from hierarchical abstract representations of input data. In general, the goal of designing a multi-layer network structure is to highlight the important information of the input data at the higher layers, while becoming more robust to the less important variations in the input.

In recent years, researchers have proposed many different types of multi-layer architectures, most of which stack linear and nonlinear function modules to form a multi-layer structure. This chapter covers state-of-the-art multilayer neural network architectures for computer vision applications. Among them, the artificial neural network is our focus, because its performance is particularly strong. For convenience, in the following we will simply refer to this type of network as a neural network.
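As a minimal sketch of what "stacking linear and nonlinear function modules" means, the plain-Python snippet below builds a two-layer network from a linear module and a ReLU nonlinearity. The function names and the hand-set weights are assumptions made for illustration (they are not learned and do not come from the paper); the weights are chosen so that the stack computes XOR, a function that no single linear layer can represent.

```python
def linear(x, weights, bias):
    """Linear module: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    """Nonlinear module: keep positive values, zero out the rest."""
    return [max(0.0, v) for v in x]

def forward(x, layers):
    """Apply linear + ReLU for every hidden layer, then a final linear layer."""
    for weights, bias in layers[:-1]:
        x = relu(linear(x, weights, bias))
    weights, bias = layers[-1]
    return linear(x, weights, bias)

# Hand-set (not learned) weights, assumed for this sketch only.
XOR_LAYERS = [
    # hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),
    # output layer: y = h1 - 2 * h2
    ([[1.0, -2.0]], [0.0]),
]

outputs = {inputs: forward(list(inputs), XOR_LAYERS)[0]
           for inputs in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]}
```

Removing `relu` would collapse the two linear modules into a single linear map, which is why the nonlinear module between them matters. In a real network the weights would be learned from data rather than written by hand.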
