How exactly do graph convolutional networks work? A beginner's guide

GCN is a very powerful neural network architecture for graph data. In fact, it is so powerful that even a randomly initialized two-layer GCN can produce useful feature representations of the nodes in a network. The figure below shows a two-dimensional representation of each node produced by such a GCN. Note that, even without any training, these two-dimensional representations preserve the relative proximity of the nodes in the graph.

More formally, a graph convolutional network (GCN) is a neural network that operates on graph data. Given a graph G = (V, E), a GCN takes as input:

 

  • an N × F⁰ input feature matrix X, where N is the number of nodes in the graph and F⁰ is the number of input features per node;

  • an N × N matrix representation of the graph structure, e.g. the adjacency matrix A of G [1].

 

A hidden layer in the GCN can thus be written as Hⁱ = f(Hⁱ⁻¹, A), where H⁰ = X and f is a propagation rule [1]. Each hidden layer Hⁱ corresponds to an N × Fⁱ feature matrix, each row of which is the feature representation of one node. At each layer, the GCN aggregates these features using the propagation rule f to form the features of the next layer; in this way the features become increasingly abstract at each successive layer. Within this framework, the variants of GCN differ only in the choice of the propagation rule f.
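
To make the recursion concrete, here is a minimal sketch of how the layers compose (the gcn_forward name is introduced here for illustration and is not from the original article):

def gcn_forward(X, A, propagation_rules):
    H = X                          # H⁰ = X
    for f in propagation_rules:    # one rule (with its own weights Wⁱ) per layer
        H = f(H, A)                # Hⁱ = f(Hⁱ⁻¹, A)
    return H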

 

A simple propagation rule

 

One of the simplest possible propagation rules is [1]:

 

f(Hⁱ, A) = σ(AHⁱWⁱ)

 

where Wⁱ is the weight matrix of the i-th layer and σ is a nonlinear activation function (e.g. the ReLU function). The weight matrix has dimensions Fⁱ × Fⁱ⁺¹; in other words, the size of its second dimension determines the number of features at the next layer. If you are familiar with convolutional neural networks, you will notice that because these weights are shared across all nodes of the graph, this operation is similar to a convolution-kernel filtering operation.
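
Concretely, a minimal NumPy sketch of this rule could look as follows. The propagate and relu helper names are introduced here for illustration; the same relu helper is also what the relu(...) calls in the listings further below assume.

import numpy as np

def relu(x):
    # element-wise ReLU, used as the nonlinearity σ
    return np.maximum(x, 0)

def propagate(H, A, W):
    # one application of the rule f(Hⁱ, A) = σ(A Hⁱ Wⁱ)
    return relu(A @ H @ W)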

 

Simplification

 

Next, let us study the propagation rule at its simplest possible level. Let:

 

  • i = 1, so that f is a function applied to the input feature matrix;

  • σ be the identity function;

  • the weights be chosen such that AH⁰W⁰ = AXW⁰ = AX (i.e. W⁰ is the identity).

 

In other words, f(X, A) = AX. This propagation rule is perhaps a little too simple, but we will add the missing pieces later in the article. Note also that AX is now equivalent to the input layer of a multilayer perceptron.

 

A simple graph example

 

We will use the following simple graph as our example:

 

Its adjacency matrix, written with NumPy, looks as follows:

import numpy as np

A = np.matrix([
    [0, 1, 0, 0],
    [0, 0, 1, 1], 
    [0, 1, 0, 0],
    [1, 0, 1, 0]],
    dtype=float
)

 

Next, we need features! We generate two integer features for each node based on its index, which will make it easy to manually verify the matrix computations later in the article.

In [3]: X = np.matrix([
            [i, -i]
            for i in range(A.shape[0])
        ], dtype=float)
        X
Out[3]: matrix([
           [ 0.,  0.],
           [ 1., -1.],
           [ 2., -2.],
           [ 3., -3.]
        ])

 

Applying the propagation rule

 

We have now set up a graph with adjacency matrix A and input features X. Let's see what happens when we apply the propagation rule:

In [6]: A * X
Out[6]: matrix([
            [ 1., -1.],
            [ 5., -5.],
            [ 1., -1.],
            [ 2., -2.]
        ])

The representation of each node (each row) is now a sum of the features of its neighbors! In other words, the graph convolutional layer represents each node as an aggregate of its neighborhood. You can verify the calculation for yourself. Note that in this case, a node n is a neighbor of node v if there is an edge from v to n.
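
As a quick check (a small verification sketch, not part of the original listing): node 1 has outgoing edges to nodes 2 and 3, so its new representation should be the sum of their features, while node 0 only aggregates node 1.

assert np.allclose((A * X)[1], X[2] + X[3])  # node 1 aggregates nodes 2 and 3
assert np.allclose((A * X)[0], X[1])         # node 0 aggregates only node 1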

 

 

Problems

 

You may already have spotted the problems:

 

  • A node's aggregated representation does not include its own features! The representation is an aggregate of the features of its neighbor nodes, so only nodes that have a self-loop will include their own features in the aggregate [1].

  • Nodes with a large degree will have large values in their feature representations, while nodes with a small degree will have small values. This can lead to exploding or vanishing gradients [1, 2], and it is also a problem for stochastic gradient descent, which is typically used to train such networks and is sensitive to the scale (or value range) of each input feature.

 

In the following, we discuss each of these problems separately.

 

Adding self-loops

 

To address the first problem, we can simply add a self-loop to each node [1, 2]. In practice, this is done by adding the identity matrix I to the adjacency matrix A before applying the propagation rule.

In [4]: I = np.matrix(np.eye(A.shape[0]))
        I
Out[4]: matrix([
            [1., 0., 0., 0.],
            [0., 1., 0., 0.],
            [0., 0., 1., 0.],
            [0., 0., 0., 1.]
        ])
In [8]: A_hat = A + I
        A_hat * X
Out[8]: matrix([
            [ 1., -1.],
            [ 6., -6.],
            [ 3., -3.],
            [ 5., -5.]])

 

Since every node is now its own neighbor, each node's own features are included when the features of its neighboring nodes are summed up!
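
Again as a quick check (a small verification sketch, not part of the original listing): node 1's aggregate now also includes its own features.

assert np.allclose((A_hat * X)[1], X[1] + X[2] + X[3])  # neighbors plus the node itself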

 

Normalizing the feature representations

 

The feature representations can be normalized by node degree: we transform the adjacency matrix A by multiplying it with the inverse of the degree matrix D. The simplified propagation rule thus becomes:

 

f(X, A) = D⁻¹AX

 

Let's see what happens. We first compute the degree matrix of the nodes.

In [9]: D = np.array(np.sum(A, axis=0))[0]
        D = np.matrix(np.diag(D))
        D
Out[9]: matrix([
            [1., 0., 0., 0.],
            [0., 2., 0., 0.],
            [0., 0., 2., 0.],
            [0., 0., 0., 1.]
        ])

 

Before applying the propagation rule, let's take a look at what happens to the adjacency matrix after we transform it.

 

Before the transformation

A = np.matrix([
    [0, 1, 0, 0],
    [0, 0, 1, 1], 
    [0, 1, 0, 0],
    [1, 0, 1, 0]],
    dtype=float
)

After the transformation

In [10]: D**-1 * A
Out[10]: matrix([
             [0. , 1. , 0. , 0. ],
             [0. , 0. , 0.5, 0.5],
             [0. , 0.5, 0. , 0. ],
             [0.5, 0. , 0.5, 0. ]
])

Observe that each row of the adjacency matrix has been divided by the degree of the node that the row corresponds to. Next, we apply the propagation rule to the transformed adjacency matrix:

In [11]: D**-1 * A * X
Out[11]: matrix([
             [ 1. , -1. ],
             [ 2.5, -2.5],
             [ 0.5, -0.5],
             [ 2. , -2. ]
         ])

 

Each node's representation now corresponds to the mean of the features of its neighboring nodes. This is because the weights of the (transformed) adjacency matrix correspond to the weights of a weighted sum over the neighbors' features. Again, you can verify this result for yourself.

 

Putting it all together

 

We now combine the self-loop and normalization tricks. In addition, we reintroduce the weights and the activation function that were omitted earlier to simplify the discussion.
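
Spelled out, the rule we are assembling is f(X, A) = σ(D_hat⁻¹ A_hat X W), where A_hat = A + I and D_hat is the degree matrix of A_hat; this is exactly what the listings below compute step by step.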

 

Adding the weights

 

The first thing to do is apply the weights. Note that D_hat here is the degree matrix of A_hat = A + I, i.e. of the matrix A with forced self-loops.
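
D_hat itself is not computed in the listing below; a sketch of its computation, mirroring the way D was computed above (and the way it is computed in the Zachary Karate Club section later), is:

D_hat = np.array(np.sum(A_hat, axis=0))[0]
D_hat = np.matrix(np.diag(D_hat))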

In [45]: W = np.matrix([
             [1, -1],
             [-1, 1]
         ])
         D_hat**-1 * A_hat * X * W
Out[45]: matrix([
            [ 1., -1.],
            [ 4., -4.],
            [ 2., -2.],
            [ 5., -5.]
        ])

If we want to reduce the dimensionality of the output feature representations, we can reduce the size of the weight matrix W:

In [46]: W = np.matrix([
             [1],
             [-1]
         ])
         D_hat**-1 * A_hat * X * W
Out[46]: matrix([
            [1.],
            [4.],
            [2.],
            [5.]
         ])

Adding an activation function

In this article we choose to keep the feature dimensionality the same and apply the ReLU activation function.

In [51]: W = np.matrix([
             [1, -1],
             [-1, 1]
         ])
         relu(D_hat**-1 * A_hat * X * W)
Out[51]: matrix([
            [1., 0.],
            [4., 0.],
            [2., 0.],
            [5., 0.]
         ])

 

And there we have it: a complete hidden layer with an adjacency matrix, input features, weights, and an activation function!
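
To see the earlier layer-stacking idea in action on the toy graph, here is a small sketch; the specific weight values are arbitrary and chosen only for illustration, and it reuses A_hat, D_hat, X and the relu helper from above.

W_1 = np.matrix([
    [1, -1],
    [-1, 1]
])
W_2 = np.matrix([
    [1],
    [-1]
])
H_1 = relu(D_hat**-1 * A_hat * X * W_1)    # first hidden layer, 2 features per node
H_2 = relu(D_hat**-1 * A_hat * H_1 * W_2)  # second hidden layer, 1 feature per node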

 

Application to a real-world graph

 

Finally, we apply the graph convolutional network to a real graph. This section shows how to generate the feature representations mentioned at the beginning of the article.

 

Zachary Karate Club

 

Zachary's Karate Club is a widely used social network in which the nodes represent members of a karate club and the edges represent the relationships between them. While Zachary was studying the club, a conflict arose between the administrator and the instructor, which caused the club to split in two. The figure below shows a graph representation of the network, where nodes are labeled according to which part of the club they joined: "A" and "I" mark the nodes belonging to the administrator's and the instructor's camps, respectively.

 

 

Building the GCN

 

Next, we build the graph convolutional network. We will not actually train it; we will simply initialize it randomly to generate the feature representations we saw at the beginning of this article. We will use networkx, which provides an easily accessible graph representation of Zachary's Karate Club, and then compute the A_hat and D_hat matrices.

 

import numpy as np
from networkx import karate_club_graph, to_numpy_matrix

zkc = karate_club_graph()
order = sorted(list(zkc.nodes()))
A = to_numpy_matrix(zkc, nodelist=order)
I = np.eye(zkc.number_of_nodes())
A_hat = A + I
D_hat = np.array(np.sum(A_hat, axis=0))[0]
D_hat = np.matrix(np.diag(D_hat))

 

Next, we initialize the weights randomly.

W_1 = np.random.normal(
    loc=0, scale=1, size=(zkc.number_of_nodes(), 4))
W_2 = np.random.normal(
    loc=0, size=(W_1.shape[1], 2))

 

Then we stack the GCN layers. Here we use only the identity matrix as the feature representation, i.e. each node is represented as a one-hot encoded categorical variable.

def gcn_layer(A_hat, D_hat, X, W):
    # the combined propagation rule: relu(D_hat⁻¹ A_hat X W)
    return relu(D_hat**-1 * A_hat * X * W)

H_1 = gcn_layer(A_hat, D_hat, I, W_1)
H_2 = gcn_layer(A_hat, D_hat, H_1, W_2)
output = H_2

 

Finally, we extract the feature representations.

feature_representations = {
    node: np.array(output)[node] 
    for node in zkc.nodes()}

 

Look at that! Feature representations like these separate the two communities of Zachary's Karate Club quite well, and we have not even started training the model yet!

It should be noted that, because of the ReLU function, the randomly initialized weights in this example are quite likely to produce representations that are 0 along the x or y axis, so it may take a few random initializations to produce a figure like the one above.
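
One way to inspect these representations is a simple scatter plot. The sketch below is not from the original article; it assumes matplotlib is available and that networkx stores each member's faction in the 'club' node attribute.

import matplotlib.pyplot as plt

colors = [
    'red' if zkc.nodes[node]['club'] == 'Mr. Hi' else 'blue'
    for node in zkc.nodes()
]
xs = [feature_representations[node][0] for node in zkc.nodes()]
ys = [feature_representations[node][1] for node in zkc.nodes()]
plt.scatter(xs, ys, c=colors)
plt.show()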

 

Epilogue

 

This article has given a high-level introduction to graph convolutional networks and shown how the feature representation of a node at each GCN layer is built from an aggregate of its neighborhood. Readers have seen how to build these networks with numpy, and how powerful they are: even randomly initialized GCNs can separate the communities in Zachary's Karate Club network.

 

 

 

 

 

 
