Graph Network Spatial Convolution Explained, Part 4: PGC

Paper: Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

1. The core idea

PGC treats convolution as a sampling function multiplied by a weight function, with the products summed over the neighborhood.
In the paper, the model is applied to skeleton-based human action recognition.

2. Comparison with traditional convolution

For traditional convolution, the operation of a $K \times K$ convolution kernel can be written as the following function:
$$f_{out}(x) = \sum_{h=1}^{K} \sum_{w=1}^{K} f_{in}\big(p(x, h, w)\big) \cdot w(h, w)$$
where $K$ is the size of the convolution kernel (commonly 3 or 5). $p(\cdot)$ is the sampling function, which takes the nodes out of the neighborhood one by one for the convolution calculation. $w(\cdot)$ is the weight function, which gives the convolution parameter applied to each sampled node.
For example, for a $3 \times 3$ convolution kernel, $K = 3$; $p(x, 1, 1)$ is the first point, in the upper-left corner of the neighborhood, and $w(1, 1)$ is the coefficient in the upper-left corner of the convolution kernel.
The entire formula is the inner product of the node features and the convolution kernel parameters.
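To make the "sample, then weight, then sum" reading concrete, here is a minimal sketch in plain Python. The names `conv_at`, `p`, and `weight` are illustrative only, and the sketch assumes an odd kernel size and an interior pixel (no padding).

```python
def conv_at(f_in, kernel, x):
    """Output at pixel x = (row, col), written as sum of sample * weight."""
    K = len(kernel)
    r = K // 2  # kernel radius; assumes K is odd

    def p(x, h, w):
        # Sampling function: h and w run from 1 to K,
        # and (1, 1) is the upper-left point of the neighborhood.
        return (x[0] + h - 1 - r, x[1] + w - 1 - r)

    def weight(h, w):
        # Weight function: (1, 1) is the upper-left kernel coefficient.
        return kernel[h - 1][w - 1]

    total = 0.0
    for h in range(1, K + 1):
        for w in range(1, K + 1):
            row, col = p(x, h, w)
            total += f_in[row][col] * weight(h, w)
    return total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
result = conv_at(image, kernel, (1, 1))  # picks the diagonal: 1 + 5 + 9
```

Here the kernel keeps only the diagonal, so the output at the center pixel is the sum of the three diagonal values.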

3. PGC implementation

PGC applies the convolution idea above to graph-structured data. The main step is to choose a suitable sampling function and a corresponding weight function.

Sampling function

The sampling function takes out the nodes in a neighborhood one by one. The key question is how to construct the neighborhood of a node, that is, where the sampling function draws its samples from.

On graph-structured data, PGC can define the sampling function on the $D$-order neighbors of a node, that is, $B(v_i) = \{v_j \mid d(v_j, v_i) \leq D\}$, where $d(v_j, v_i)$ is the shortest distance from node $v_j$ to node $v_i$.

In the experiments, $D = 1$, so nodes are sampled one by one from the first-order neighborhood. Other neighborhood sizes can also be used.
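The neighborhood $B(v_i)$ can be collected with a breadth-first search that stops at depth $D$. The sketch below assumes the graph is given as an adjacency list. The function name `neighborhood` and the five-node chain are made up for illustration.

```python
from collections import deque


def neighborhood(adj, i, D=1):
    """Return {node: hop distance} for all nodes within D hops of node i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == D:
            continue  # do not expand beyond D hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Toy chain graph 0 - 1 - 2 - 3 - 4 (hypothetical, not a real skeleton).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
first_order = neighborhood(adj, 2, D=1)
second_order = neighborhood(adj, 2, D=2)
```

With $D = 1$ the result contains the center node (distance 0) and its direct neighbors (distance 1).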

Weight function

First, divide the nodes in the sampled neighborhood into $K$ different classes:
$$l_i : B(v_i) \rightarrow \{0, \dots, K-1\}$$

where $l_i(\cdot)$ is the class mapping for node $v_i$, $B(v_i)$ is the set of neighborhood nodes of $v_i$, and $\{0, \dots, K-1\}$ is the set of class labels.

Nodes in the same class share one convolution kernel parameter. Different classes do not share parameters:
$$w(v_i, v_j) = w'(l_i(v_j))$$

That is, the convolution kernel parameter between $v_i$ and $v_j$ equals the kernel parameter of the class that $v_j$ belongs to in the neighborhood of $v_i$.

Classification strategy

(a): An example input skeleton frame. The body joints are drawn as blue dots. The receptive field of a filter with $D = 1$ is drawn as a red dashed circle. The red node is the center node, and the circle is its first-order neighborhood.

(b): Uni-labeling classification
All nodes are treated equally and there is only one class, similar to GNN.

(c): Distance classification
Nodes are classified by order: the center node is order 0 and its adjacent nodes are order 1, and the two orders form different classes.

(d): Spatial configuration classification
Nodes are divided into three classes according to their distance from the center of the human skeleton: neighbors that are closer to the skeleton center than the center node, neighbors that are farther from the skeleton center than the center node, and the center node itself.
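The three strategies (b), (c), and (d) can each be written as a small labeling function. This is a sketch, not the paper's code: the function names, the `hop` and `center_dist` inputs, and the toy values are assumptions. The description above does not say how to handle a neighbor at exactly the same distance from the skeleton center as the center node, so this sketch puts such ties in the "farther" class as an arbitrary choice.

```python
def uni_label(i, j):
    """(b) Uni-labeling: every node in the neighborhood gets class 0."""
    return 0


def distance_label(i, j, hop):
    """(c) Distance: class is the hop distance from center node i to j."""
    return hop[j]


def spatial_label(i, j, center_dist):
    """(d) Spatial configuration: 0 = center node, 1 = closer to the
    skeleton center than i, 2 = farther (ties go here by assumption)."""
    if j == i:
        return 0
    return 1 if center_dist[j] < center_dist[i] else 2


# Toy neighborhood of center node 1 on a chain 0 - 1 - 2 (hypothetical).
root = 1
hop = {0: 1, 1: 0, 2: 1}                 # hop distance from the center node
center_dist = {0: 0.0, 1: 1.0, 2: 2.0}   # distance to the skeleton center

labels_b = {j: uni_label(root, j) for j in hop}
labels_c = {j: distance_label(root, j, hop) for j in hop}
labels_d = {j: spatial_label(root, j, center_dist) for j in hop}
```

The number of classes $K$ is 1 for (b), 2 for (c) with $D = 1$, and 3 for (d).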

Formula

The final convolution formula is as follows:
$$f_{out}(v_i) = \sum_{v_j \in B(v_i)} \frac{1}{Z_i(v_j)} f_{in}(v_j) \cdot w(l_i(v_j))$$
$Z_i(v_j)$ is the number of nodes in the neighborhood of node $v_i$ that belong to the same class as $v_j$. It is used to balance the contribution of each class of nodes.
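As a rough illustration of this normalized sum, the sketch below computes the output for one center node. It assumes scalar node features and one scalar weight per class; the names `pgc_output` and `label`, and all the numbers, are hypothetical.

```python
from collections import Counter


def pgc_output(i, neighbors, f_in, label, w):
    """Sum over the neighborhood of f_in(v_j) * w(l_i(v_j)) / Z_i(v_j)."""
    labels = {j: label(i, j) for j in neighbors}
    Z = Counter(labels.values())  # Z[c] = number of neighbors in class c
    return sum(f_in[j] * w[labels[j]] / Z[labels[j]] for j in neighbors)


def label(i, j):
    # Distance-style labeling for D = 1: center node is class 0, others class 1.
    return 0 if j == i else 1


f_in = [1.0, 2.0, 3.0]   # one scalar feature per node (toy values)
w = [10.0, 1.0]          # one scalar weight per class (toy values)
out = pgc_output(1, [0, 1, 2], f_in, label, w)
# class 0: 2.0 * 10.0 / 1 = 20.0; class 1: (1.0 + 3.0) * 1.0 / 2 = 2.0
```

Because each class is divided by its own size, a class with many nodes does not dominate a class with few.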

Comparison and discussion

1. Compared with the mean sampling of GraphSAGE, PGC defines a more general sampling function.

2. GNN holds that the nodes need to be ordered, while GraphSAGE holds that no ordering is needed. PGC uses a weight function instead: if the nodes are all put into a single class, it is similar to GNN, and if every node is treated as its own class, it is similar to GraphSAGE. This makes PGC the more general formulation.

Origin blog.csdn.net/qq_41214679/article/details/110005997