For example, take the Wikipedia dataset: https://snap.stanford.edu/node2vec/POS.mat
The download is a single file, POS.mat, which we need to read.
import scipy.io as scio
mat_path = '../data_procress/ppi.mat'
load_mat = scio.loadmat(mat_path) # Load MATLAB file.
Printing load_mat gives:
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Thu Nov 12 18:54:12 2015',
 '__version__': '1.0',
 '__globals__': [],
 'group': <3890x50 sparse matrix of type '<class 'numpy.float64'>'
    with 6640 stored elements in Compressed Sparse Column format>,
 'network': <3890x3890 sparse matrix of type '<class 'numpy.float64'>'
    with 76584 stored elements in Compressed Sparse Column format>}
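As a small sketch of how such a file could be inspected programmatically: scipy.io.loadmat returns a plain dict, and the entries whose names start and end with double underscores are file metadata rather than data. The file used here is a tiny stand-in written with savemat, not the real dataset, and the helper name data_keys is hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.io as scio
import scipy.sparse as sp


def data_keys(mat_dict):
    """Return the keys of a loadmat() result that are not metadata entries."""
    return sorted(
        k for k in mat_dict
        if not (k.startswith('__') and k.endswith('__'))
    )


# Build a tiny stand-in .mat file (assumption: toy data, not the real dataset).
toy_network = sp.csc_matrix(np.array([[0.0, 1.0, 0.0],
                                      [1.0, 0.0, 1.0],
                                      [0.0, 1.0, 0.0]]))
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy.mat')
    scio.savemat(toy_path, {'network': toy_network})
    toy_mat = scio.loadmat(toy_path)

keys = data_keys(toy_mat)
print(keys)
print(toy_mat['network'].shape)
```

Filtering out the metadata keys this way makes it easier to spot the variables that actually hold data, such as group and network in the output above.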
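The dictionary contains the key network, so we can read it with network = load_mat['network'], which returns an object of type <class 'scipy.sparse.csc.csc_matrix'>.
The official documentation lists the following method for this type: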
toarray(self[, order, out]) # Return a dense ndarray representation of this matrix.
Calling network = load_mat['network'].toarray() therefore returns an ndarray, which is the adjacency matrix of the network.
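To illustrate what toarray() does and what it costs, here is a minimal sketch on a toy matrix. The variable names are illustrative assumptions and the 3x3 matrix only stands in for load_mat['network']. The last lines work out the dense size of a 3890x3890 float64 matrix, the shape reported in the output above.

```python
import numpy as np
import scipy.sparse as sp

# Toy sparse adjacency matrix (assumption: stand-in for load_mat['network']).
rows = np.array([0, 1, 1, 2])
cols = np.array([1, 0, 2, 1])
vals = np.ones(4, dtype=np.float64)
sparse_adj = sp.csc_matrix((vals, (rows, cols)), shape=(3, 3))

# toarray() materialises every cell, including the zeros.
dense_adj = sparse_adj.toarray()
print(type(dense_adj).__name__, dense_adj.shape)
print(sparse_adj.nnz, 'stored entries vs', dense_adj.size, 'dense cells')

# Dense size of a 3890x3890 float64 matrix: 8 bytes per cell.
n = 3890
dense_bytes = n * n * np.dtype(np.float64).itemsize
print(dense_bytes)  # 121056800 bytes, roughly 121 MB
```

For a graph of this size the dense form is still manageable, but the cost grows quadratically with the number of nodes.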
networkx can then build a graph from this adjacency matrix.
Full code:
import scipy.io as scio
import networkx as nx


def load_ppi_dataset():
    mat_path = '../data_procress/ppi.mat'
    load_mat = scio.loadmat(mat_path)  # Load MATLAB file.
    network = load_mat['network'].toarray()  # Dense adjacency matrix (ndarray).
    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it.
    g = nx.from_numpy_array(network)
    return g


if __name__ == '__main__':
    g = load_ppi_dataset()
    print(list(g.nodes))
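As a possible alternative, the graph can be built straight from the sparse matrix without converting it to a dense array first. This is only a sketch under assumptions: it relies on nx.from_scipy_sparse_array, which is available in NetworkX 2.7 and later, and it uses a toy matrix in place of load_mat['network'].

```python
import networkx as nx
import numpy as np
import scipy.sparse as sp

# Toy symmetric adjacency matrix (assumption: stand-in for load_mat['network']).
toy_adj = sp.csc_matrix(np.array([[0.0, 1.0, 0.0],
                                  [1.0, 0.0, 1.0],
                                  [0.0, 1.0, 0.0]]))

# Assumption: NetworkX >= 2.7, which provides from_scipy_sparse_array.
toy_graph = nx.from_scipy_sparse_array(toy_adj)

print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
print(list(toy_graph.edges(data='weight')))
```

Skipping the dense conversion keeps memory use proportional to the number of stored entries, which matters more as the network grows.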