Deep learning paper share: Application of complex-valued neural networks to real-valued classification tasks

Abstract

Complex-valued neural networks are not a new concept; however, due to difficulties in training and performance, real-valued models have often been favored over complex-valued ones. When comparing real-valued and complex-valued neural networks, the existing literature has generally ignored the number of real-valued parameters, resulting in comparisons between neural networks of significantly different sizes. We find that, when models of similar capacity are compared, complex-valued neural networks perform equal to or slightly worse than real-valued models on a range of real-valued classification tasks. The use of complex numbers does allow neural networks to handle noise on the complex plane. When a complex-valued neural network is used to classify real-valued data, the imaginary parts of the weights simply follow the real parts. This behavior indicates that the task does not require a complex-valued model; we investigate it further in a synthetic classification task. Many activation functions can be transferred from the real domain to the complex domain using different strategies. The weight initialization of complex-valued neural networks, however, remains a significant problem.

Introduction

In recent years, complex-valued neural networks have been successfully applied to a variety of tasks, particularly in signal processing, where the input data has a natural interpretation in the complex domain. When complex-valued neural networks are compared with real-valued networks, we need to ensure that the architectures are comparable in model size and capacity. This aspect has received relatively little research attention, or has been treated only superficially. One standard measure of capacity is the number of real-valued parameters. Introducing complex numbers into a model increases the computational complexity and the number of real-valued parameters, but it also changes the assumptions about the input data and the weights. This paper discusses the performance of complex-valued multilayer perceptrons (MLPs) of varying depth and width on real-valued reference classification tasks, for several choices of parameter count and activation function. We propose a complex-valued multilayer perceptron architecture and training procedure, and we consider a variety of activation functions and real-valued parameter counts in both the complex and the real setting.

We suggest two ways to construct comparable networks:

  1. by providing a fixed number of real-valued neurons per layer;
  2. by providing a fixed budget of real-valued parameters.

As reference tasks, we choose MNIST digit classification [18], CIFAR-10 image classification [17], and CIFAR-100 image classification [17].

Related work

Clark gave the first formal description of complex-valued neural networks [8]. Since then, several authors have derived complex-valued versions of the gradient-descent backpropagation algorithm [6,10,19]. Inspired by work on multi-valued threshold logic from the 1970s [1], Aizenberg et al. defined multi-valued neurons and neural networks [4,3]; this idea was later extended to quaternions. In the 2000s, complex-valued neural networks were successfully applied to a variety of tasks [22,12,21,25]. These tasks mainly involve the processing and analysis of complex-valued data, or data with a natural mapping to the complex domain. In particular, waveforms, or the Fourier transform of images or signals, are used as complex-valued input data [15]. Another application exploits the nature of complex convolution in image and signal processing [7]. Although real-valued convolution is widely used in deep learning on images, it can be replaced by complex convolution [26,13,23,14]. Properties of complex numbers and matrices can also be used to define constraints on deep learning models: introduced by Arjovsky et al. [5] and further developed by Wisdom et al. [29], complex-valued recurrent networks constrain their weight matrices to be unitary, reducing the impact of vanishing or exploding gradients. Recently, complex-valued neural networks have been used to learn filters for image and audio signals [27,24,9]. Tensor factorization with complex-valued embeddings has also been applied to predict relations between entities in knowledge graphs [28]. Despite these successes, complex-valued neural networks are not as popular as their real-valued counterparts, potentially because the training process and architecture are less intuitive, owing to the stricter requirements for differentiability of activation functions in the complex plane [31,16,20]. Publications comparing complex-valued with real-valued neural networks have ignored the total number of parameters [3], compared only the number of parameters of the whole model [26], or otherwise did not distinguish between complex- and real-valued parameters and units [30]. Viewed as comparisons of equivalent models, these are in fact comparisons of models of different sizes. We systematically explore the activation function, width, and depth of multilayer perceptrons on simple classification tasks.

Complex-valued networks

We define a complex-valued neuron analogously to its real-valued counterpart, taking into account the differences in architecture and training. A complex neuron can be defined as:

$$o = \varphi(x^\top W + b), \qquad x \in \mathbb{C}^n, \; W \in \mathbb{C}^{n \times m}, \; b \in \mathbb{C}^m$$

where the input is x ∈ ℂ^n, the weight matrix is W ∈ ℂ^(n×m), the bias is b ∈ ℂ^m, and the activation function φ, as defined above, may be either φ: ℂ → ℝ or φ: ℂ → ℂ. We consider non-linear activation functions in more detail below. In this work we choose a simple real-valued loss function, though complex-valued loss functions may be explored in future work. The complex numbers are not totally ordered, since i² = −1; defining a complex-valued loss function would require a partial order on the complex numbers (similar to linear matrix inequalities).
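
As a concrete illustration, the following is a minimal NumPy sketch of such a neuron, using the magnitude as an example of φ: ℂ → ℝ. The dimensions and random inputs are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 4, 3  # input dimension, number of neurons
x = rng.normal(size=n) + 1j * rng.normal(size=n)            # x in C^n
W = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))  # W in C^{n x m}
b = rng.normal(size=m) + 1j * rng.normal(size=m)            # b in C^m

def phi(z):
    # example activation phi: C -> R, the magnitude |z|
    return np.abs(z)

o = phi(x @ W + b)       # complex affine map, then activation
print(o.shape, o.dtype)  # (3,) float64: the output is real-valued here
```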

The training process differs in the complex domain because the activation functions are usually not holomorphic (complex-differentiable) on the entire complex plane.

Parameter interaction in complex-valued networks

Any complex number z = x + iy = r·e^(iφ) can be represented by two real numbers: the real part Re(z) = x and the imaginary part Im(z) = y, or equivalently the magnitude |z| = √(x² + y²) = r and the phase (angle) φ = arctan(y/x). Thus, any complex function of one or more complex variables can be expressed as a function of two real variables, f(z) = f(x, y) = f(r, φ). Although the direct complex representation can be used, the two-part representation defines the neural network over pairs of real numbers. Consider the neuron of equation 2, written out in terms of the arithmetic required on the real and imaginary parts (or, equivalently, magnitude and phase). This expanded representation shows how the complex-valued input x interacts with the complex-valued weight matrix W in their product:

$$x^\top W = \big(\mathrm{Re}(x)^\top \mathrm{Re}(W) - \mathrm{Im}(x)^\top \mathrm{Im}(W)\big) + i\,\big(\mathrm{Re}(x)^\top \mathrm{Im}(W) + \mathrm{Im}(x)^\top \mathrm{Re}(W)\big)$$
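
The decomposition can be checked numerically. A short NumPy sketch of ours, under the same conventions as above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
x = rng.normal(size=n) + 1j * rng.normal(size=n)
W = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))

# One complex product costs four real matrix products:
# Re(xW) = Re(x)Re(W) - Im(x)Im(W)
# Im(xW) = Re(x)Im(W) + Im(x)Re(W)
re = x.real @ W.real - x.imag @ W.imag
im = x.real @ W.imag + x.imag @ W.real

assert np.allclose(re + 1j * im, x @ W)  # matches the native complex product
```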

Network capacity

The number of (real-valued) parameters is one measure of the capacity of a network; capacity quantifies how complicated a structure the network can approximate. With too many parameters the model tends to overfit the data, and with too few it tends to underfit. Representing a complex number a + bi as a pair of real numbers (a, b) means that the number of real parameters per layer doubles: p_ℂ = 2p_ℝ. The number of real-valued parameters per layer should be equal between a real-valued architecture and its complex-valued counterpart (or at least as close as possible). This ensures that the models have the same capacity, so that performance differences are caused by the introduction of complex parameters rather than by a difference in capacity. To make this concrete, consider the number of parameters of a fully connected layer. Let n be the input dimension and m the number of neurons; the number of real-valued parameters of a real layer, p_ℝ, and of a complex layer, p_ℂ, is then given by:

$$p_{\mathbb{R}} = nm + m, \qquad p_{\mathbb{C}} = 2(nm + m)$$

For a multilayer perceptron with k hidden layers of m neurons each and an output layer of size c, the number of real-valued parameters without biases is given by:

$$p_{\mathbb{R}} = nm + (k-1)m^2 + mc$$

At first glance, designing comparable multilayer perceptron architectures appears very simple: give each layer the same number of real-valued parameters. However, simply halving the number of neurons in each layer does not achieve comparable parameter counts, because the sizes of the input and output layers are defined by the task. We solve this problem by choosing a complex MLP architecture whose k hidden layers alternate between m/2 and m neurons. We compare this complex-valued MLP with a real-valued network that has the same number of real-valued parameters in each layer. Consider the dimensions of the hidden and output weights for k = 4. For the real-valued network:

$$W_i \in \mathbb{R}^{m_{i-1} \times m_i}, \qquad m_0 = n, \; m_1 = \dots = m_4 = m, \; m_5 = c$$

where m_i is the number of (real or complex) neurons in the i-th layer. Using complex neurons, the equivalent choice is:

$$W_i \in \mathbb{C}^{m_{i-1} \times m_i}, \qquad m_0 = n, \; m_1 = m_3 = m/2, \; m_2 = m_4 = m, \; m_5 = c$$

Since each complex weight contributes two real parameters, every hidden layer then matches its real-valued counterpart exactly; only the output layer carries twice as many real parameters, which is as close as the fixed output size allows.
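
A small sketch, under the assumptions above (bias-free layers, hidden widths alternating m/2 and m, starting with m/2), that counts real-valued parameters for both architectures; the helper functions and the MNIST-like sizes are our own illustration:

```python
def real_param_count(dims):
    # dims = [n, m_1, ..., m_k, c]; bias-free dense layers,
    # one real parameter per weight
    return sum(a * b for a, b in zip(dims[:-1], dims[1:]))

def complex_param_count(dims):
    # every complex weight holds two real parameters
    return 2 * real_param_count(dims)

n, m, c, k = 784, 64, 10, 4
real_dims = [n] + [m] * k + [c]
cplx_dims = [n] + [m // 2 if i % 2 == 0 else m for i in range(k)] + [c]

print(real_param_count(real_dims))     # 63104
print(complex_param_count(cplx_dims))  # 63744: only the output layer
                                       # differs, by a factor of two
```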

Activation functions for complex-valued neural networks

In any neural network, an important design decision is the choice of non-linearity. By using the same number of parameters in each layer, we can study the impact of the activation function on overall performance. An important theorem to consider when choosing an activation function is Liouville's theorem, which states that any bounded entire function f: ℂ → ℂ (holomorphic over the whole complex plane) must be constant. Therefore, we have to select activation functions that are unbounded and/or not holomorphic everywhere.
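
For example, the complex hyperbolic tangent is holomorphic but has poles at i(π/2 + kπ) and is therefore unbounded, consistent with Liouville's theorem. A quick numerical illustration of ours in NumPy:

```python
import numpy as np

# tanh(iy) = i*tan(y), so |tanh(z)| blows up as z approaches i*pi/2:
for eps in (1e-1, 1e-3, 1e-6):
    z = 1j * (np.pi / 2 - eps)
    print(eps, np.abs(np.tanh(z)))  # grows roughly like 1/eps
```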

To study the performance of a complex-valued model under the hypothesis of linear separability in the complex parameters, we choose the identity function. This lets us identify tasks that may not be linearly separable using m real neurons but may be linearly separable using m complex neurons; one example is the approximation of the XOR function [2]. The hyperbolic tangent is well studied and is defined on both the real and the complex numbers. The rectified linear unit is also well understood and often used in real-valued settings, but has not been considered in the complex-valued setting; we apply it separately to the two parts of a complex number. We also select the magnitude and squared-magnitude functions, which map complex numbers to real numbers.

The activation functions for the complex-valued network are:

  - identity: φ(z) = z
  - hyperbolic tangent: φ(z) = tanh(z)
  - separable rectified linear unit: φ(z) = ReLU(Re(z)) + i·ReLU(Im(z))
  - magnitude: φ(z) = |z|
  - squared magnitude: φ(z) = |z|²
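
A minimal sketch of these five candidates in NumPy, vectorized over complex arrays (the function names are ours):

```python
import numpy as np

def identity(z):           # phi: C -> C
    return z

def ctanh(z):              # complex tanh; defined on C, unbounded near poles
    return np.tanh(z)

def split_relu(z):         # ReLU applied separately to real and imaginary parts
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

def magnitude(z):          # phi: C -> R
    return np.abs(z)

def squared_magnitude(z):  # phi: C -> R
    return np.abs(z) ** 2
```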

Experiments

To compare real-valued and complex-valued multilayer perceptrons (Figure 1), we study them on a variety of classification tasks. In all of the following experiments, the task is to assign each real-valued data point to a single class.

We tested MLPs with k = 0, 2, 4, and 8 hidden layers, fixing the unit width of every layer in the real-valued architecture and alternating 64 and 32 units in the complex-valued architecture (see Section 5). We did not apply a fixed parameter budget here. We tested the models on MNIST digit classification, CIFAR-10 image classification, CIFAR-100 image classification, and Reuters topic classification. Reuters topic classification and MNIST digit classification used 64 units per layer; CIFAR-10 and CIFAR-100 used 128 units per layer.
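
To make the setup concrete, here is a hypothetical PyTorch sketch of such a complex-valued MLP. Each complex weight is stored as two real matrices, the hidden non-linearity is the separable ReLU, and the magnitude maps the complex output to real class scores. This is our reconstruction of the described architecture under those assumptions, not the authors' code, and all names are ours.

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    # complex weights stored as two real matrices; complex bias as two vectors
    def __init__(self, n_in, n_out):
        super().__init__()
        self.wr = nn.Linear(n_in, n_out, bias=False)  # Re(W)
        self.wi = nn.Linear(n_in, n_out, bias=False)  # Im(W)
        self.br = nn.Parameter(torch.zeros(n_out))    # Re(b)
        self.bi = nn.Parameter(torch.zeros(n_out))    # Im(b)

    def forward(self, xr, xi):
        # (xr + i*xi)(Wr + i*Wi) + (br + i*bi)
        return (self.wr(xr) - self.wi(xi) + self.br,
                self.wi(xr) + self.wr(xi) + self.bi)

class ComplexMLP(nn.Module):
    # hidden widths alternate m/2 and m, e.g. hidden=[32, 64, 32, 64] for m=64
    def __init__(self, n_in, hidden, n_out):
        super().__init__()
        dims = [n_in] + hidden + [n_out]
        self.layers = nn.ModuleList(
            ComplexLinear(a, b) for a, b in zip(dims[:-1], dims[1:]))

    def forward(self, x):
        zr, zi = x, torch.zeros_like(x)  # real input: zero imaginary part
        for layer in self.layers[:-1]:
            zr, zi = layer(zr, zi)
            zr, zi = torch.relu(zr), torch.relu(zi)  # separable ReLU
        zr, zi = self.layers[-1](zr, zi)
        return torch.sqrt(zr ** 2 + zi ** 2)  # magnitude: C -> R class scores

model = ComplexMLP(784, [32, 64], 10)  # k = 2 hidden layers, m = 64
scores = model(torch.randn(8, 784))    # batch of 8 flattened images
print(scores.shape)                    # torch.Size([8, 10])
```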

We obtained the following results:

A complex-valued MLP can be used to classify data with short dependencies (e.g. MNIST digit classification) or short texts represented as bags of words (e.g. Reuters topic classification). For the two image classification tasks, CIFAR-10 and CIFAR-100, the results suggest that the complex-valued MLP does not learn any structure in the data. These two tasks require a larger weight matrix in the first layer, and weight initialization remains a significant problem. The best non-linearity in the complex-valued networks is the rectified linear unit (ReLU) applied separately to the imaginary and real parts, as in the real-valued models. The identity and the hyperbolic tangent outperform ReLU in places, especially in the real-valued case, but the results obtained with ReLU are much more stable. Although the activation functions |z|² and |z| are similar, their performance differs significantly across all tasks: the magnitude |z| consistently outperforms the squared magnitude |z|². In these classification benchmarks, the activation function is a deciding factor in the overall performance of a given model; the activation can allow the network to recover from a bad initialization and to make proper use of the available parameters.

As expected, with a fixed number of neurons per layer we observe that accuracy increases with depth for both complex- and real-valued models: as the total number of parameters grows, so does the capacity of the model. An exception is Reuters topic classification, where performance decreases with increasing depth. When the number of neurons per layer is instead chosen according to a given parameter budget (experiment 2, using formulas 17 and 18), performance decreases significantly with increasing model depth. Taken together with the results of experiment 1, this suggests that the width of each layer matters more than the total depth of the network. We observe very large performance differences across the 10 initializations. We hypothesize that weight initialization in complex MLPs becomes much harder with increasing depth, and that their performance is therefore very unstable. We confirmed this by training a complex MLP (k = 2, tanh) on the Reuters classification task over 100 runs instead of 10. The results show behavior similar to the other results: the performance gap shrinks.


Table 1: Test accuracy on the MNIST digit classification task for multilayer perceptrons consisting of k + 2 layers with 64 neurons per layer (alternating 64 and 32 neurons in the complex MLP) and an output layer of c = 10 neurons (experiment 1). Best of ten runs; each run is trained for 100 epochs.

Table 2: Test accuracy on Reuters topic classification for multilayer perceptrons consisting of k + 2 layers with 64 neurons per layer (alternating 64 and 32 neurons in the complex MLP) and an output layer of c = 46 neurons (experiment 1). Best of ten runs; each run is trained for 100 epochs.

Table 3: Test accuracy on the CIFAR-10 image classification task for multilayer perceptrons consisting of k + 2 layers with 128 neurons per layer (alternating 128 and 64 neurons in the complex MLP) and an output layer of c = 10 neurons (experiment 1). Best of ten runs; each run is trained for 100 epochs.

Table 4: Test accuracy on the CIFAR-100 image classification task for multilayer perceptrons consisting of k + 2 layers with 128 neurons per layer (alternating 128 and 64 neurons in the complex MLP) and an output layer of c = 100 neurons (experiment 1). Best of ten runs; each run is trained for 100 epochs.

For many applications in which the data has an interpretation on the complex plane (e.g. signals), complex-valued neural networks have been shown to be superior [15]. All of the tasks selected in our work use real-valued input data. We observe that, for the selected tasks, complex-valued neural networks do not perform as well as expected, and the real-valued architectures outperform their complex versions. At first, this finding seems counter-intuitive, since every real number is simply a special case of a complex number with zero imaginary part; solving a real-valued problem with a complex-valued model gives the model more degrees of freedom to approximate the function. Why, then, do complex-valued models perform worse than real-valued models on these classification tasks? Examining the training process more closely, we find that the imaginary parts of the complex weights consistently follow the real parts of the weights.

By the time the input reaches the classification layer, the imaginary and real parts act identically on the plane; the classification is therefore the average of two identical classifications. If, during the training phase, the mean absolute value of the imaginary parts of the weights follows the mean absolute value of the real parts, then either the imaginary part of the input is distributed exactly like the real part, or the task under consideration simply does not benefit from the complex-valued hypothesis. Furthermore, we observe that complex-valued neural networks are considerably more sensitive to their initialization than real-valued neural networks, and the sensitivity increases with the size of the network. The weight initialization proposed by Trabelsi et al. [26] can reduce this problem but does not solve it; it is a complex counterpart of the variance-scaling initialization of Glorot et al. [11]. Other possible initialization methods include random search algorithms (RSA) [31], which require substantial computation. We ultimately tried to mitigate the problem by running each experiment multiple times with different initializations. The initialization of complex weights, however, remains an important and unsolved problem that needs further research.

Unbounded activation functions can cause numerical instability in the learning process, which can make learning fail (e.g. gradients that are effectively infinite). If the learning process reaches such a point of the function (e.g. a singularity), it is difficult to recover training. Constraining the function, normalizing the weights, or clipping the gradients then becomes unavoidable; with increasing depth and architectural complexity, these options may be impractical due to their computational cost. Alternatively, this can be avoided at the design stage by choosing activation functions that are bounded and complex-differentiable everywhere, but by Liouville's theorem such a function must be constant, so finding usable functions of this kind is hard. Another possibility is to sidestep the problem in practice by applying separate bounded activation functions (the same or different real functions) to the real and imaginary parts. The rectified linear unit is one such function: although not differentiable everywhere, we find that it makes the training process much more stable and improves performance. Despite the mathematical differences, in practice we can transfer many insights from the real domain to the complex domain.

In summary, real-valued models set an upper performance bound on real-valued tasks for complex-valued models of similar capacity, because the real and imaginary parts act identically on the input. A study of information and gradient flow could help identify tasks that benefit from complex-valued neural networks. Considering the existing literature and our findings, we suggest using complex-valued neural networks for classification tasks only if the data lives naturally in the complex domain or can be meaningfully moved onto the complex plane. The network should reflect how the real and imaginary parts of the weights interact with the input data; if this structure is ignored, the model may not exploit its greater degrees of freedom, and the more complicated training process will likely require more initializations and more computation time.
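
As a simple diagnostic along these lines, one could log the mean absolute values of the real and imaginary weight parts at each checkpoint during training and compare the two curves; a hypothetical NumPy helper (naming ours):

```python
import numpy as np

def weight_part_magnitudes(weight_history):
    """weight_history: complex weight arrays sampled across training.
    Returns mean |Re(W)| and mean |Im(W)| per checkpoint. If the imaginary
    curve merely tracks the real one over the whole run, the task likely
    does not benefit from the complex-valued hypothesis."""
    re = np.array([np.mean(np.abs(W.real)) for W in weight_history])
    im = np.array([np.mean(np.abs(W.imag)) for W in weight_history])
    return re, im
```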

Conclusion

This work compares real-valued and complex-valued multilayer perceptrons on reference classification tasks. We found that, even though the complex-valued model allows greater freedom, the performance of complex-valued MLPs on the classification of real-valued data is similar to or worse than that of real-valued models. We recommend the use of complex-valued neural networks if a) the input data has a natural mapping to complex numbers, b) the input data has a noise distribution on the complex plane, or c) complex-valued embeddings can be learned from the real values in the data. We can determine whether a task benefits from the complex-valued hypothesis by comparing the training behavior of the real and imaginary weights (for example, by their mean absolute values). If the imaginary parts do not follow the general behavior of the real parts over the entire training period, then the task will benefit from the complex hypothesis. Other aspects to consider in model design are the activation function, the weight initialization strategy, and the trade-off between performance, model size, and computational cost. In our work, the best-performing activation function was the rectified linear unit applied separately to the real and imaginary components. Many real-valued activation functions can be transferred to the complex domain by using Wirtinger calculus, by preventing or constraining gradient-based methods at specific points, or by applying the function separately to the two real components. The initialization described by Trabelsi et al. [26] can help reduce the initialization problem, but further study is needed. As with many other architectures, the introduction of complex-valued parameters is decided as a trade-off between task-specific performance, model size (i.e., the number of real-valued parameters), and computational cost.

References
