Graph Convolutional Networks for Text Classification: original code walkthrough [TensorFlow]

project address

https://github.com/yao8839836/text_gcn

Environment configuration

python 3.6

20ng article sample

From: [email protected] (dean.kaflowitz) Subject: Re: about the bible quiz answers Organization: AT&T Distribution: na Lines: 18 In article [email protected], [email protected] (Tammy R Healy) writes: > > > #12) The 2 cheribums are on the Ark of the Covenant. When God said make no > graven image, he was refering to idols, which were created to be worshipped. > The Ark of the Covenant wasn’t wrodhipped and only the high priest could > enter the Holy of Holies where it was kept once a year, on the Day of > Atonement. I am not familiar with, or knowledgeable about the original language, but I believe there is a word for “idol” and that the translator would have used the word “idol” instead of “graven image” had the original said “idol.” So I think you’re wrong here, but then again I could be too. I just suggesting a way to determine whether the interpretation you offer is correct. Dean Kaflowitz

python remove_words.py 20ng

dataset = sys.argv[1]: 20ng
Count word frequencies, filter out low-frequency words and
stopwords, and write all processed articles into 20ng.clean.txt ('word1 word2 word3 …\n word1 word2 …\n…')
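A rough sketch of what remove_words.py does (not the repo's exact code; the file paths, the frequency threshold of 5 and the NLTK stopword list are assumptions):

from collections import Counter
from nltk.corpus import stopwords  # requires the NLTK stopword corpus to be downloaded

stop_words = set(stopwords.words('english'))

with open('20ng.txt') as f:                           # hypothetical path: one raw document per line
    docs = [line.strip().split() for line in f]

word_freq = Counter(w for doc in docs for w in doc)   # word frequency statistics

with open('20ng.clean.txt', 'w') as f:                # 'word1 word2 word3 ...\n' per document
    for doc in docs:
        kept = [w for w in doc if w not in stop_words and word_freq[w] >= 5]
        f.write(' '.join(kept) + '\n')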

python build_graph.py 20ng

doc_train_list[0]:


doc_content_list[0]:


train_ids_str:

“idx1\nidx2\n…”

shuffle_doc_name_str (training docs come first):

“name1\nname2\n…”

shuffle_doc_words_str

(omitted)

word_doc_list

{word1:[1,2,3,4], word2:[2,3,4,5], word3:[100,203,…]}
indicates that word1 appears in documents 1, 2, 3 and 4

word_doc_freq

{word1:4,word2:4,...}
indicates that word1 has appeared in 4 articles

word_id_map

{word1:0,word2:1,word3:2…}
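A rough sketch (not the repo's exact code) of how these three structures can be derived from the cleaned, shuffled documents; shuffle_doc_words_list is assumed to hold one cleaned document string per entry:

word_doc_list = {}
for doc_id, doc_words in enumerate(shuffle_doc_words_list):
    for word in set(doc_words.split()):                  # count each document at most once per word
        word_doc_list.setdefault(word, []).append(doc_id)

word_doc_freq = {word: len(doc_ids) for word, doc_ids in word_doc_list.items()}
vocab = list(word_doc_freq.keys())
word_id_map = {word: i for i, word in enumerate(vocab)}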

vocab_str


label_set


label_list_str


x

row_x (real_train_size x word_embeddings_dim entries):
[0,0,0,0,…0, 1,1,1,1,…1, 2,2,2,2,…2, …]  (each document index is repeated 300 times, i.e. word_embeddings_dim)
col_x (real_train_size x word_embeddings_dim entries):
[0,1,2,3,…299, 0,1,2,3,…299, …]
data_x (real_train_size x word_embeddings_dim entries)
x = sp.csr_matrix((data_x, (row_x, col_x)), shape=(
real_train_size, word_embeddings_dim))
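A toy illustration of the (data, (row, col)) construction, with word_embeddings_dim = 3 instead of 300 and dummy values:

import numpy as np
import scipy.sparse as sp

row_x  = [0, 0, 0, 1, 1, 1]                  # document index, repeated dim times
col_x  = [0, 1, 2, 0, 1, 2]                  # embedding dimension index
data_x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]      # dummy feature values

x = sp.csr_matrix((data_x, (row_x, col_x)), shape=(2, 3))
print(x.toarray())
# [[0.1 0.2 0.3]
#  [0.4 0.5 0.6]]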

Y

[[0,1,0,0,0,…], [1,0,0,0,…],…]

tx

test x

ty

test y

allx(doc+word)

word_vectors:(vocab_size, word_embeddings_dim)
row_allx / col_allx / data_allx are built the same way as above, except that they now cover all training articles plus the whole vocabulary

Further:
row_allx: [0,0,…,1,1,…, train_size-1,train_size-1,…, train_size+vocab_size-1, train_size+vocab_size-1,…]

ally

[[0,1,0,0,0,…], [1,0,0,0,…],…,[0,0,0,0,…]]
labeled training articles | all-zero rows for the (unlabeled) vocabulary words
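A dense sketch of how ally could be assembled (the repo builds it sparsely; the "name\ttrain\tlabel" metadata format and the variable names are assumptions):

import numpy as np

ally = np.zeros((train_size + vocab_size, len(label_list)))
for i in range(train_size):
    meta = shuffle_doc_name_list[i]          # assumed format: "name\ttrain\tlabel"
    label = meta.split('\t')[2]
    ally[i][label_list.index(label)] = 1
# rows train_size .. train_size + vocab_size - 1 (the words) stay all-zero, i.e. unlabeled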

Temporary summary

print(x.shape, y.shape, tx.shape, ty.shape, allx.shape, ally.shape)

(10183, 300) (10183, 20) (7532, 300) (7532, 20) (54071, 300) (54071, 20)

windows

window_size = 20
[[w1,w2,w3,…w14], [w1,w2,…w20],…]
If an article has fewer than window_size words (e.g. only 14), the whole article becomes a single window;
otherwise a window of size 20 slides over the article with step 1, so an article with length words yields length - window_size + 1 windows
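A sketch of the window extraction just described (variable names assumed):

window_size = 20
windows = []
for doc_words in shuffle_doc_words_list:
    words = doc_words.split()
    length = len(words)
    if length <= window_size:
        windows.append(words)                        # short article: the whole article is one window
    else:
        for i in range(length - window_size + 1):    # slide over the article with step 1
            windows.append(words[i: i + window_size])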

word_window_freq

{word1:freq1,word2:freq2,word3:freq3}
word1 appears in freq1 distinct windows

word_pair_count

{'1498,2066':3,'2066,1498':3,...}
Edges are bidirectional (the graph is undirected): the 1498th and the 2066th word of the vocabulary co-occur 3 times across all windows

Note that this count is different from the window frequency above:

  • word_window_freq counts, for each word, the number of windows in which that word appears
  • word_pair_count counts, for each word pair, the total number of co-occurrences across all windows, so a single window can contribute several pairs

PMI(word word)

Consider a word pair where word1 is the i-th word of the vocabulary and word2 is the j-th.
row:
[train_size+i, …]
col:
[train_size+j, …]
weight:
[pmi_i_j]
Something to think about: what fills the other blocks of the adjacency matrix, e.g. the doc-doc (train_size x train_size) part?
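A sketch of the PMI edge-weight computation consistent with the counts above (variable names are assumptions):

from math import log

num_window = len(windows)
for key, count in word_pair_count.items():
    i, j = map(int, key.split(','))              # vocabulary ids of the word pair
    freq_i = word_window_freq[vocab[i]]          # number of windows containing word i
    freq_j = word_window_freq[vocab[j]]          # number of windows containing word j
    pmi = log((count / num_window) /
              (freq_i * freq_j / (num_window * num_window)))
    if pmi <= 0:                                 # only positive PMI becomes a word-word edge
        continue
    row.append(train_size + i)
    col.append(train_size + j)
    weight.append(pmi)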

doc_word_freq

{'doc_id1,word_id1':3,...}
indicates that in the article corresponding to doc_id1, the word corresponding to word_id1 appears 3 times

TF-IDF(doc word)

Continuing the question above:
row:
[train_size, train_size+1, …, train_size+vocab_size-1 (word-word edges) | 0, 1, 2, …, train_size-1 (training doc-word edges) | train_size+vocab_size, …, train_size+vocab_size+test_size-1 (test doc-word edges)]
adj = sp.csr_matrix(
(weight, (row, col)), shape=(node_size, node_size))
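A sketch of the doc-word TF-IDF weights that fill those rows (variable names are assumptions):

from math import log

num_docs = len(shuffle_doc_words_list)
for doc_id in range(num_docs):
    for word in set(shuffle_doc_words_list[doc_id].split()):
        word_id = word_id_map[word]
        freq = doc_word_freq[str(doc_id) + ',' + str(word_id)]   # term frequency in this document
        idf = log(num_docs / word_doc_freq[word])                # inverse document frequency
        # training docs occupy rows 0 .. train_size-1, test docs come after the vocab block
        row.append(doc_id if doc_id < train_size else doc_id + vocab_size)
        col.append(train_size + word_id)
        weight.append(freq * idf)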

python train.py 20ng

load_corpus

  • adj: adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj). This symmetrizes the adjacency matrix: wherever the transposed entry is larger than the original one, the original is replaced by it, which amounts to an element-wise maximum of adj and adj.T
  • features: (train_size(doc) + vocab_size + test_size) x 300
  • y_train: (train_size(doc) + vocab_size + test_size) x label_num, but only rows marked 1 in [1,1,1(real_train_size), 0,0,0,…] carry a label; all other rows are unlabeled
  • y_val:[0,0,0(real_train_size), 1,1,1(val_size), 0,0,0…]
  • y_test:[0,0,0(real_train_size),0,0,0(val_size),0,0,0(vocab_size),1,1,1(test_size)]
  • train_mask / val_mask / test_mask: boolean masks over all nodes selecting the rows described above (see the sketch after this list)
  • train_size
  • test_size
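A sketch of how the boolean masks can be built with a gcn-style sample_mask helper (names assumed):

import numpy as np

def sample_mask(idx, n):
    """Boolean mask of length n that is True at the given indices."""
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

num_nodes = train_size + vocab_size + test_size
train_mask = sample_mask(range(real_train_size), num_nodes)
val_mask = sample_mask(range(real_train_size, train_size), num_nodes)
test_mask = sample_mask(range(train_size + vocab_size, num_nodes), num_nodes)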

features = sp.identity(features.shape[0])

The following code takes the input features=sp.identity(3) as an example.

import scipy.sparse as sp
import numpy as np
def preprocess_features(features):
    """Row-normalize feature matrix and convert to tuple representation."""
    # features = sp.identity(3):
    #   (0, 0)  1.0
    #   (1, 1)  1.0
    #   (2, 2)  1.0
    rowsum = np.array(features.sum(1))  # sum of each row
    print("rowsum", rowsum)
    # rowsum:
    # [[1.]
    #  [1.]
    #  [1.]]
    r_inv = np.power(rowsum, -1).flatten()  # reciprocal of each row sum; multiplying a row by it normalizes the row
    print("np.power(rowsum, -1)", np.power(rowsum, -1))
    # r_inv: [1. 1. 1.]
    r_inv[np.isinf(r_inv)] = 0.  # rows that sum to 0 produce inf; reset those to 0
    r_mat_inv = sp.diags(r_inv)  # diagonal matrix of the reciprocals, used to left-multiply the feature matrix
    print("r_mat_inv", r_mat_inv)
    # r_mat_inv:
    #   (0, 0)  1.0
    #   (1, 1)  1.0
    #   (2, 2)  1.0
    features = r_mat_inv.dot(features)  # matrix multiplication performs the row normalization
    print("feature", features)
    # features (unchanged for the identity input):
    #   (0, 0)  1.0
    #   (1, 1)  1.0
    #   (2, 2)  1.0
    return sparse_to_tuple(features)

def sparse_to_tuple(sparse_mx):
    """Convert sparse matrix to tuple representation."""
    def to_tuple(mx):
        if not sp.isspmatrix_coo(mx):
            mx = mx.tocoo()
        coords = np.vstack((mx.row, mx.col)).transpose()
        values = mx.data
        shape = mx.shape
        return coords, values, shape  # row/col coordinates, the corresponding values, and the matrix shape

    if isinstance(sparse_mx, list):
        for i in range(len(sparse_mx)):
            sparse_mx[i] = to_tuple(sparse_mx[i])
    else:
        sparse_mx = to_tuple(sparse_mx)
    return sparse_mx

So preprocess_features(features) finally returns the (coords, values, shape) tuple of the row-normalized feature matrix.

support = [preprocess_adj(adj)]

def normalize_adj(adj):
    """Symmetrically normalize adjacency matrix."""
    adj = sp.coo_matrix(adj)
    rowsum = np.array(adj.sum(1))
    d_inv_sqrt = np.power(rowsum, -0.5).flatten()
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
    d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()


def preprocess_adj(adj):
    """Preprocessing of adjacency matrix for simple GCN model and conversion to tuple representation."""
    adj_normalized = normalize_adj(adj + sp.eye(adj.shape[0]))
    return sparse_to_tuple(adj_normalized)

What is being computed here is the normalization from the paper: self-loops are added and the result is symmetrically normalized, i.e. A_norm = D^{-1/2} (A + I) D^{-1/2} (worth reading alongside the paper).
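A quick numeric sanity check of that normalization on a tiny two-node graph (not from the repo):

import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[0., 1.],
                            [1., 0.]]))
A_hat = A + sp.eye(2)                          # add self-loops: A + I
d = np.array(A_hat.sum(1)).flatten()           # degrees: [2. 2.]
D_inv_sqrt = sp.diags(np.power(d, -0.5))       # D^{-1/2}
print(D_inv_sqrt.dot(A_hat).dot(D_inv_sqrt).toarray())
# [[0.5 0.5]
#  [0.5 0.5]]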

placeholders

placeholders = {
    'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],
    'features': tf.sparse_placeholder(tf.float32, shape=tf.constant(features[2], dtype=tf.int64)),
    'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),
    'labels_mask': tf.placeholder(tf.int32),
    'dropout': tf.placeholder_with_default(0., shape=()),
    # helper variable for sparse dropout
    'num_features_nonzero': tf.placeholder(tf.int32)
}

The data fed to a sparse_placeholder here is an (indices, values, shape) triple, which is exactly what preprocess_features returns.
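As a sketch, the feed dict for one training step could be assembled like this (mirroring the gcn-style construct_feed_dict helper; the helper itself is not shown in this post):

feed_dict = dict()
feed_dict.update({placeholders['labels']: y_train})
feed_dict.update({placeholders['labels_mask']: train_mask})
feed_dict.update({placeholders['features']: features})          # the (coords, values, shape) tuple
feed_dict.update({placeholders['support'][i]: support[i] for i in range(len(support))})
feed_dict.update({placeholders['num_features_nonzero']: features[1].shape})  # number of non-zero entries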

create_model

Two-layer gcn network
self.layers.append(GraphConvolution) x 2

Then the static graph is built by chaining the layers:

        for layer in self.layers:
            hidden = layer(self.activations[-1])
            self.activations.append(hidden)
        self.outputs = self.activations[-1]

Save variables:

        variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
        self.vars = {var.name: var for var in variables}

loss

# Why is L2 weight decay applied only to the first layer's variables?
 for var in self.layers[0].vars.values():
            self.loss += FLAGS.weight_decay * tf.nn.l2_loss(var)
 # The cross-entropy loss is only computed over the labeled nodes:
 def masked_softmax_cross_entropy(preds, labels, mask):
    """Softmax cross-entropy loss with masking."""
    print(preds)
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
    mask = tf.cast(mask, dtype=tf.float32)
    mask /= tf.reduce_mean(mask)
    loss *= mask
    return tf.reduce_mean(loss)
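A tiny numeric illustration of the masking: dividing the mask by its mean up-weights the labeled nodes, so reduce_mean over all nodes equals the mean loss over just the labeled ones.

import numpy as np

loss = np.array([0.5, 1.5, 9.0, 9.0])   # per-node losses; the last two nodes are unlabeled
mask = np.array([1., 1., 0., 0.])
mask = mask / mask.mean()               # [2. 2. 0. 0.]
print(np.mean(loss * mask))             # 1.0, the mean of the labeled losses [0.5, 1.5]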

acc

opt_op

self.opt_op = self.optimizer.minimize(self.loss)

train (calculation process)

First pass the inputs to the first layer:

self.layers.append(GraphConvolution(input_dim=self.input_dim,
                                            output_dim=FLAGS.hidden1, # 200
                                            placeholders=self.placeholders,
                                            act=tf.nn.relu,
                                            dropout=True,
                                            featureless=True,
                                            sparse_inputs=True,
                                            logging=self.logging))

First perform dropout (sparse dropout)

# dropout
if self.sparse_inputs:
    x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
else:
    x = tf.nn.dropout(x, 1-self.dropout)

def sparse_dropout(x, keep_prob, noise_shape):
    """Dropout for sparse tensors."""
    random_tensor = keep_prob  # probability of keeping an element
    random_tensor += tf.random_uniform(noise_shape)  # add one uniform random value per non-zero feature entry
    dropout_mask = tf.cast(tf.floor(random_tensor), dtype=tf.bool)  # floor: values >= 1 become True (keep), values < 1 become False (drop)
    pre_out = tf.sparse_retain(x, dropout_mask)  # keep only the entries of x where dropout_mask is True
    return pre_out * (1./keep_prob)  # inverted dropout: rescale the kept entries so the expected sum is unchanged

Then the convolution is performed. featureless is True for the first layer. The meaning of featureless seemed strange at first: if it is True, the computation is only the multiplication of the symmetric adjacency matrix with the weight matrix, with no feature matrix involved at all. The reason, it turns out, is that the initial features are an identity matrix, so multiplying by them would change nothing and can simply be skipped:

supports = list()
for i in range(len(self.support)):
    if not self.featureless:
        pre_sup = dot(x, self.vars['weights_' + str(i)],
                      sparse=self.sparse_inputs)
    else:
        pre_sup = self.vars['weights_' + str(i)]
    support = dot(self.support[i], pre_sup, sparse=True)
    supports.append(support)  
output = tf.add_n(supports)  # tf.add_n sums all tensors in the supports list element-wise
def dot(x, y, sparse=False):
    """Wrapper for tf.matmul (sparse vs dense)."""
    if sparse:
        res = tf.sparse_tensor_dense_matmul(x, y)
    else:
        res = tf.matmul(x, y)
    return res

bias & embedding: you can see that the embedding is the (pre-activation) output of the first layer, and self.act(output) is what gets passed to the next layer

# bias
if self.bias:
    output += self.vars['bias']
self.embedding = output  # the pre-activation output of this layer is saved as the embedding
return self.act(output)

Then pass self.act(output) to the second layer:

self.layers.append(GraphConvolution(input_dim=FLAGS.hidden1,
                                    output_dim=self.output_dim,
                                    placeholders=self.placeholders,
                                    act=lambda x: x, 
                                    dropout=True,
                                    logging=self.logging))

Note that featureless=False here.

Some weird errors

AttributeError: module 'tensorflow' has no attribute 'random_uniform'

Solution:
https://blog.csdn.net/weixin_43763859/article/details/104537392

Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11
I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Solution (download cudatoolkit=11.0):
https://blog.csdn.net/qq_28193019/article/details/103146116

Cannot use GPU when output.shape[1] * nnz(a) > 2^31

Solve
https://blog.csdn.net/weixin_35970195/article/details/112585490

What about a GCN with more than two layers? At the moment it seems you have to add the extra GraphConvolution layers by hand, whereas the Chebyshev variant below does not need that.
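For reference, a hedged sketch of what adding a third layer inside _build might look like (FLAGS.hidden2 is an assumed extra flag, not defined in the repo):

self.layers.append(GraphConvolution(input_dim=self.input_dim,
                                    output_dim=FLAGS.hidden1,
                                    placeholders=self.placeholders,
                                    act=tf.nn.relu,
                                    dropout=True,
                                    featureless=True,
                                    sparse_inputs=True,
                                    logging=self.logging))
self.layers.append(GraphConvolution(input_dim=FLAGS.hidden1,
                                    output_dim=FLAGS.hidden2,   # assumed new flag
                                    placeholders=self.placeholders,
                                    act=tf.nn.relu,
                                    dropout=True,
                                    logging=self.logging))
self.layers.append(GraphConvolution(input_dim=FLAGS.hidden2,
                                    output_dim=self.output_dim,
                                    placeholders=self.placeholders,
                                    act=lambda x: x,
                                    dropout=True,
                                    logging=self.logging))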

support = chebyshev_polynomials(adj, FLAGS.max_degree)

reference

TensorFlow docs: tf.sparse_placeholder

Origin blog.csdn.net/jokerxsy/article/details/112076521