PyG ships with many commonly used datasets built in, such as the classic Cora dataset, among others. If you need datasets used in other papers, you can refer to the TUDataset website, which hosts many datasets for graph tasks, such as ENZYMES, available for download.
Often, however, these datasets do not fit our needs, and we have to wrap our own data in PyG's data class ourselves. In PyG, a single graph instance is represented by the torch_geometric.data.Data class.
This class provides several common graph attributes:
- data.x: The node feature matrix, with shape [num_nodes, num_node_features]
- data.edge_index: The edges of the graph in COO format (one column per edge, holding the source and target node indices), with shape [2, num_edges]
- data.edge_attr: The edge feature matrix, with shape [num_edges, num_edge_features]
- data.y: The labels, whose shape depends on the task. For node-level tasks the shape is [num_nodes, *]; for graph-level tasks the shape is [1, *]
- data.pos: The node position matrix, often used for visualization, with shape [num_nodes, num_dimensions]
When we build our own graph dataset, we fill in these attributes from our own data. None of them is mandatory, so we only set the ones our task needs.
Suppose our graph looks as follows; we can create it with the following code.