Principles of Graph Convolutional Networks (1) [Motivation for Introducing Graph Neural Networks]

Preface

This post begins a series of tutorials introducing the motivation and basic principles of graph convolutional networks. In my view, a graph neural network is just a modeling tool, so I will mostly explain my personal understanding from the perspective of using this tool. Comments and discussion are welcome!

Motivation for introducing graph neural networks

For fields with native graph data (molecular structure graphs, traffic networks, etc.), studying graph neural networks needs no justification. But why should graph neural networks be introduced in fields without graph data?

Motivation 1: Introducing the relationships between samples

We know that traditional neural networks basically work on a sample-by-sample basis. That is:

Given a batch of input samples $X_{m \times n}$, where $m$ is the batch size and $n$ is the dimension of each input sample, the $i$-th row $X_{i*}$ denotes the $i$-th sample to be processed. Write the neural network as the function $f$; then for each sample the network computes a result $y_i = f(X_{i*})$, where $f$ involves matrix multiplications, nonlinear mappings, and similar operations.
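To make this concrete, here is a minimal sketch of the sample-by-sample computation $y_i = f(X_{i*})$, assuming a single dense layer with a tanh nonlinearity as $f$ (all names and dimensions are illustrative):

```python
import numpy as np

# Illustrative sketch: X is a batch of m samples, each of dimension n.
rng = np.random.default_rng(0)
m, n, h = 4, 8, 3
X = rng.normal(size=(m, n))
W = rng.normal(size=(n, h))   # weights of one dense layer (our assumed f)
b = np.zeros(h)

def f(x):
    """One sample in, one result out: matrix multiply + nonlinearity."""
    return np.tanh(x @ W + b)

# y_i = f(X_i*): every row is processed independently of all other rows.
Y = np.stack([f(X[i]) for i in range(m)])
print(Y.shape)  # (4, 3)
```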

There are many examples of such sample-by-sample data processing: image-by-image recognition in face recognition, sentence-by-sentence translation in machine translation, and domain-by-domain classification in malicious domain detection.

However, as research in various fields has deepened, we have found that in some scenarios there are in fact latent relationships between samples. Sample-by-sample processing implicitly assumes that the samples are independent of each other, and so it discards these potential correlations.

Therefore, we want to quantify the relationships between samples and, at the same time, introduce those relationships into the model explicitly, as sketched below.
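One common way to make the relationships explicit is to collect them in an adjacency matrix and mix neighbour information into each sample before the usual dense layer. The following is a minimal sketch of that idea (the adjacency entries and dimensions here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 4, 8, 3
X = rng.normal(size=(m, n))
W = rng.normal(size=(n, h))

# A[i, j] = 1 if sample i and sample j are related (plus self-loops).
A = np.eye(m)
A[0, 1] = A[1, 0] = 1.0
A[2, 3] = A[3, 2] = 1.0

# Row-normalise so each sample averages over itself and its neighbours.
A_hat = A / A.sum(axis=1, keepdims=True)

# One propagation step: y_i now depends on related samples, not just X_i.
Y = np.tanh(A_hat @ X @ W)
print(Y.shape)  # (4, 3)
```

After the propagation step, each output row depends on the sample's neighbours as well as on the sample itself, which is exactly the correlation that sample-by-sample processing discards.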

Motivation 2: Modeling the latent relationships between the internal elements of a sequence sample

Another motivation is to more fully model the relationship between the internal elements of the sequence data.

For example, in recent years many scholars have used neural networks for machine translation; common models are traditional LSTM-plus-attention architectures or the Transformer. Each sentence is a sequence of words, and the input to these models is a source sentence (in Chinese -> English translation, the source language is Chinese). Once an attention mechanism is added, the model itself can of course learn some latent associations between the different words of a sentence. For example, multi-head self-attention produces correlation scores between words when a sentence attends to itself, as sketched below.
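Here is a minimal single-head sketch of those word-to-word attention scores (toy dimensions; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 16                  # 5 words, 16-dim representations
X = rng.normal(size=(seq_len, d))   # word representations of one sentence
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))

Q, K = X @ Wq, X @ Wk
scores = Q @ K.T / np.sqrt(d)       # raw word-to-word correlation scores

# Softmax over each row: how much word i attends to every word j.
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn = attn / attn.sum(axis=1, keepdims=True)
print(attn.shape)  # (5, 5): a learned, soft word-word relation matrix
```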

The problem with this approach is that we place too much hope in the neural network, assuming that it can learn the relationships between words entirely on its own. Although a neural network has this potential, the space it has to search is very large, and it can take a very long time to get on the right track and discover the latent relationships between words.

Suppose instead that, from expert knowledge, we explicitly know that certain relationships exist between the words in a sentence, and that these relationships are very helpful for improving the model's performance. Then we can simply tell the neural network directly: "in the input I gave you, there are relationships between these words, and those relationships will help your task." The network can then build on this basis directly instead of rediscovering it. For example, the following paper:

@article{marcheggiani2018exploiting,
  title={Exploiting semantics in neural machine translation with graph convolutional networks},
  author={Marcheggiani, Diego and Bastings, Joost and Titov, Ivan},
  journal={arXiv preprint arXiv:1804.08313},
  year={2018}
}

It simply adds the dependency relations of verbs (which words are linked to each verb) to the machine translation model, as sketched below.
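The following is a minimal sketch of that idea, assuming the word-word edges come from expert knowledge such as a dependency parse (this is my own illustration, not the paper's exact architecture):

```python
import numpy as np

# Illustrative: encoder states for a 5-word sentence, 16-dim each.
rng = np.random.default_rng(0)
seq_len, d = 5, 16
H = rng.normal(size=(seq_len, d))
W = rng.normal(size=(d, d))

# Hand-specified relations (assumed expert knowledge): word 2 is a
# verb linked to words 0 and 4, e.g. its subject and object.
A = np.eye(seq_len)
for i, j in [(2, 0), (2, 4)]:
    A[i, j] = A[j, i] = 1.0
A_hat = A / A.sum(axis=1, keepdims=True)   # row-normalised adjacency

# One graph-convolution step over the given edges: the network starts
# from known word-word relations instead of discovering them itself.
H_new = np.tanh(A_hat @ H @ W)
print(H_new.shape)  # (5, 16): these states feed the rest of the model
```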

Setting aside improvements to graph neural networks themselves, the main research question is how the two motivations above manifest in specific scenarios.

With these motivations in place, the following posts will introduce the basic principles of graph neural networks so that we can make better use of this tool.

Original post: https://blog.csdn.net/jmh1996/article/details/109350393