Reading a social network dataset from a .mat file

Take the Wikipedia dataset as an example: https://snap.stanford.edu/node2vec/POS.mat

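The download is a single file, POS.mat, which we then need to read. Note that the snippet below loads a local file named ppi.mat, so substitute the path of the file you actually downloaded.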

import scipy.io as scio

mat_path = '../data_procress/ppi.mat'
load_mat = scio.loadmat(mat_path)  # Load MATLAB file.

Printing load_mat gives:

{
    '__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Thu Nov 12 18:54:12 2015',
    '__version__': '1.0',
    '__globals__': [],
    'group': <3890x50 sparse matrix of type '<class 'numpy.float64'>'
        with 6640 stored elements in Compressed Sparse Column format>,
    'network': <3890x3890 sparse matrix of type '<class 'numpy.float64'>'
        with 76584 stored elements in Compressed Sparse Column format>
}
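To see which keys hold real data, it helps to skip the metadata entries whose names are wrapped in double underscores. The following is a minimal sketch on a tiny stand-in file: the file name demo.mat and its 3x3 matrix are made up for illustration and are not part of the real dataset.

```python
import os
import tempfile

import numpy as np
import scipy.io as scio
import scipy.sparse as sp

# Build a tiny stand-in .mat file (hypothetical data, not the real dataset).
adjacency = sp.csc_matrix(np.array([[0.0, 1.0, 0.0],
                                    [1.0, 0.0, 1.0],
                                    [0.0, 1.0, 0.0]]))
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mat")
scio.savemat(demo_path, {"network": adjacency})

load_mat = scio.loadmat(demo_path)

# Keys such as __header__, __version__ and __globals__ are file metadata.
data_keys = [k for k in load_mat if not k.startswith("__")]

# Record the type name and shape of every real variable in the file.
summary = {k: (type(load_mat[k]).__name__, load_mat[k].shape) for k in data_keys}
for name, (type_name, shape) in summary.items():
    print(name, type_name, shape)
```

Running the same loop on the real file should list only group and network, together with the shapes shown in the output above.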

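The dictionary contains a key named network, so we can read it with network = load_mat['network']. The result is of type <class 'scipy.sparse.csc.csc_matrix'>, a compressed sparse column matrix. The SciPy documentation lists this method for it: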

toarray(self[, order, out])   # Return a dense ndarray representation of this matrix.

Calling network = load_mat['network'].toarray() therefore yields an ndarray, which is the adjacency matrix of the network.
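As a small illustration of what toarray() does, the sketch below converts a hand-built 3x3 sparse matrix (made-up values, not the dataset) and compares the number of stored entries with the size of the dense result. For the 3890x3890 matrix above, the dense array has about 15.1 million entries while only 76584 are stored in the sparse form.

```python
import numpy as np
import scipy.sparse as sp

# A hypothetical 3-node path graph stored in CSC format.
rows = np.array([0, 1, 1, 2])
cols = np.array([1, 0, 2, 1])
vals = np.ones(4)
sparse_adj = sp.csc_matrix((vals, (rows, cols)), shape=(3, 3))

# Expand to a dense NumPy array; every zero is now stored explicitly.
dense_adj = sparse_adj.toarray()

# An undirected network should give a symmetric adjacency matrix.
is_symmetric = bool((dense_adj == dense_adj.T).all())

stored = sparse_adj.nnz   # entries kept by the sparse format
total = dense_adj.size    # entries in the dense array
print(type(dense_adj).__name__, dense_adj.shape, stored, total, is_symmetric)
```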
networkx can then build a graph from this matrix.
Complete code:


import scipy.io as scio
import networkx as nx

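def load_ppi_dataset():
    """Load the .mat file and return the network as a networkx graph."""
    mat_path = '../data_procress/ppi.mat'
    load_mat = scio.loadmat(mat_path)  # Load MATLAB file.
    network = load_mat['network'].toarray()  # Sparse matrix -> dense adjacency matrix.
    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it.
    g = nx.from_numpy_array(network)
    return g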

if __name__ == '__main__':
    g = load_ppi_dataset()
    print(list(g.nodes))
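If the dense array is not needed for anything else, one alternative is to hand the sparse matrix to networkx directly. This is only a sketch and not the approach used above: it assumes NetworkX 2.7 or later, where from_scipy_sparse_array is available, and the helper name graph_from_sparse and the 3x3 demo matrix are made up for illustration.

```python
import networkx as nx
import numpy as np
import scipy.sparse as sp


def graph_from_sparse(adjacency):
    """Build an undirected graph from a SciPy sparse adjacency matrix."""
    # Assumes NetworkX >= 2.7; no dense copy of the matrix is created here.
    return nx.from_scipy_sparse_array(adjacency)


# Hypothetical 3-node path graph standing in for load_mat['network'].
demo = sp.csc_matrix(np.array([[0.0, 1.0, 0.0],
                               [1.0, 0.0, 1.0],
                               [0.0, 1.0, 0.0]]))
g = graph_from_sparse(demo)
print(g.number_of_nodes(), g.number_of_edges())
```

With the real file, the call would look like graph_from_sparse(load_mat['network']), leaving out the toarray() step.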

Reposted from blog.csdn.net/qq_26460841/article/details/114645015