[Stanford CS224W Notes 2] Feature Engineering of Traditional Graph Machine Learning - Node

Traditional Methods for ML on Graphs

These notes are based on the Chinese-language lectures by Tongji Zihao. If you are interested, you can watch the detailed videos on Bilibili.
Link:
https://github.com/TommyZihao/zihao_course/blob/main/CS224W/1-Intro .md

Graph data mining tasks can be divided into node level, edge (link) level, and whole-graph level.

Graph Machine Learning Basic Tasks

Nodes, edges, subgraphs, and whole graphs can all have features. Possible options:

  • Weight (e.g. , frequency of communication )
  • Ranking (best friend, second best friend …)
  • Type ( friend, relative, co-worker )
  • Sign : Friend vs. Foe, Trust vs. Distrust
  • Properties depending on the structure of the rest of the graph: Number of common friends
  • Multimodal features: image, video, text, audio

A node's own characteristics are called attribute features, for example:
income, education, age, marital status, employer, credit information

This section focuses on structural features, that is, the role a node plays in the graph or in its community: whether it is a bridge, a hub, or a peripheral node.

Feature Design:
Using effective features over graphs is the key to achieving good model performance.
The traditional ML pipeline uses hand-designed features (feature engineering): we design feature vectors by hand and feed them into a machine learning model.
Goal: Make predictions for a set of objects.
Design choices:
Features: d-dimensional vectors
Objects: Nodes, edges, sets of nodes, entire graphs
Objective function: What task are we aiming to solve?

Feature engineering at the node level (Node-Level Tasks)

Process: Input the d-dimensional feature vector of a node and output the probability that the node belongs to a certain class (as shown in the figure). The key is to construct a feature vector of high enough quality to classify the nodes, so that the unknown labels can be inferred from the known part of the graph.
This is a semi-supervised node classification problem: some nodes are labeled, and the labels of the remaining nodes are predicted.

Node-Level Tasks  example

  • Goal : Characterize the structure and position of a node in the network:
    • Node degree (the number of edges a node has; it captures only the quantity of neighbors, not their quality)
    • Node centrality (the importance of the node)
    • Clustering coefficient
    • Graphlets (define subgraph patterns)

Node Degree

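As a minimal sketch of the degree feature, the snippet below counts the neighbors of every node. The five-node graph `EDGES` and the helper names `build_adjacency` and `node_degrees` are my own assumptions for illustration; they do not come from the lecture.

```python
# Hypothetical undirected example graph, given as an edge list (an assumption).
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_degrees(adj):
    """Degree of a node = number of its neighbors (all neighbors count equally)."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


adj = build_adjacency(EDGES)
degrees = node_degrees(adj)
print(degrees)  # C and D have degree 3, B has degree 2, A and E have degree 1
```

Note that C and D get the same degree as long as they have the same number of neighbors, no matter who those neighbors are. That is the limitation the centrality measures try to address.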

Node Centrality

Node degree only counts the neighboring nodes (quantity) without capturing their importance (quality).
Node centrality c_v takes the importance of a node in the graph into account.
There are different ways to model importance:

  • Eigenvector centrality
  • Betweenness centrality
  • Closeness centrality
  • and many other…

Eigenvector centrality

A node is important if it is surrounded by important neighbors: the centrality of node v is the sum of the centralities of its adjacent nodes, scaled by a normalization constant 1/λ:
c_v = (1/λ) Σ_{u ∈ N(v)} c_u
This definition is recursive. Rewriting the recursive equation in matrix form gives
λc = Ac
A: Adjacency matrix (A_uv = 1 if u is a neighbor of v)
c: Centrality vector
λ: Eigenvalue
Solving the recursion is therefore equivalent to finding an eigenvector and eigenvalue of the adjacency matrix. The eigenvector that belongs to the largest eigenvalue λ_max is used as the centrality vector.
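One common way to approximate the leading eigenvector is power iteration: multiply by the adjacency matrix and renormalize until the vector stops changing. The sketch below is only an illustration under assumptions: the example graph and the function names are hypothetical, and it assumes a connected, non-bipartite graph with at least one edge so that the iteration converges.

```python
import math

# Hypothetical undirected example graph (an assumption, not from the lecture).
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration for lambda * c = A c; returns (c, lambda estimate)."""
    nodes = sorted(adj)
    start = 1.0 / math.sqrt(len(nodes))  # unit-length starting vector
    c = {v: start for v in nodes}
    lam = 0.0
    for _ in range(max_iter):
        # (A c)_v is the sum of the centralities of the neighbors of v.
        ac = {v: sum(c[u] for u in adj[v]) for v in nodes}
        # Because c has unit length, ||A c|| approaches the largest eigenvalue.
        lam = math.sqrt(sum(x * x for x in ac.values()))
        new_c = {v: x / lam for v, x in ac.items()}
        converged = max(abs(new_c[v] - c[v]) for v in nodes) < tol
        c = new_c
        if converged:
            break
    return c, lam


adj = build_adjacency(EDGES)
centrality, eigenvalue = eigenvector_centrality(adj)
for node in sorted(centrality):
    print(node, round(centrality[node], 4))
```

In this toy graph, C and D come out as the most central nodes, and B ranks above A and E because both of its neighbors are themselves well connected.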

Betweenness centrality

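It can be used to judge whether a node sits at a bottleneck that traffic must pass through.
A node is important if it lies on many shortest paths between other nodes.
For each pair of other nodes s and t, find the shortest paths between them and count the fraction that pass through v (as shown in the figure):
c_v = Σ_{s ≠ v ≠ t} #(shortest paths between s and t that contain v) / #(shortest paths between s and t)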
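The definition can be sketched directly with breadth-first search: for every source node we record shortest-path lengths and the number of shortest paths, then check for each pair (s, t) which nodes lie on a shortest path between them. This is a straightforward, unoptimized illustration (not the faster Brandes algorithm), it counts each unordered pair once without normalization, and the example graph and function names are assumptions.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected example graph (an assumption, not from the lecture).
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Return (dist, sigma): shortest-path lengths from source and the
    number of distinct shortest paths from source to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj):
    """Unnormalized betweenness, each unordered pair (s, t) counted once."""
    info = {s: bfs_counts(adj, s) for s in adj}
    scores = {v: 0.0 for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v == s or v == t or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path iff the two distances add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                scores[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return scores


adj = build_adjacency(EDGES)
betweenness = betweenness_centrality(adj)
print(betweenness)  # C and D each lie on 3 shortest paths; A, B, E on none
```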

Closeness centrality

A node is important if it has small shortest path lengths to all other nodes.
c_v = 1 / Σ_{u ≠ v} (shortest path length between u and v)
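A small sketch of closeness centrality, again using breadth-first search for the shortest path lengths. The example graph and function names are assumptions, and the code assumes a connected graph (unreachable nodes are simply ignored here, which is a simplification).

```python
from collections import deque

# Hypothetical undirected example graph (an assumption, not from the lecture).
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """c_v = 1 / (sum of shortest path lengths from v to all other nodes)."""
    scores = {}
    for v in adj:
        dist = shortest_path_lengths(adj, v)
        total = sum(d for u, d in dist.items() if u != v)
        scores[v] = 1.0 / total if total > 0 else 0.0
    return scores


adj = build_adjacency(EDGES)
closeness = closeness_centrality(adj)
# For A the distances to B, C, D, E are 2, 1, 2, 3, so c_A = 1/8.
# For D the distances to A, B, C, E are 2, 1, 1, 1, so c_D = 1/5.
print(closeness["A"], closeness["D"])
```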

Clustering Coefficient

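The clustering coefficient measures how connected the neighbors of a node are to each other:
e_v = #(edges among the neighboring nodes of v) / C(k_v, 2), with e_v in [0, 1]
Here k_v is the degree of v, so C(k_v, 2) = k_v (k_v - 1) / 2 is the number of node pairs among its neighbors.
Different nodes have different clustering coefficients; a high value indicates that the neighbors of the node are closely connected to each other.
Equivalently, the clustering coefficient counts the triangles in the ego-network of the node (the network formed by the node and its neighbors).
A triangle is a subgraph pattern that we defined in advance. We can also count other predefined patterns, which leads to graphlets.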
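The clustering coefficient can be sketched as a count over pairs of neighbors. The example graph and function names are assumptions; returning 0 for nodes with fewer than two neighbors is a convention I chose here, since the ratio is undefined in that case.

```python
from itertools import combinations

# Hypothetical undirected example graph (an assumption, not from the lecture).
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    k = len(adj[v])
    if k < 2:
        return 0.0  # convention (assumption): no neighbor pairs, so return 0
    # Every connected pair of neighbors closes one triangle through v.
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


adj = build_adjacency(EDGES)
clustering = {v: clustering_coefficient(adj, v) for v in adj}
# B has neighbors C and D, which are connected, so e_B = 1.
# C has neighbors A, B, D with only the edge B-D among them, so e_C = 1/3.
print(clustering)
```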

Graphlet

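Graphlets are small connected, non-isomorphic induced subgraphs. Graphlets with the same number of nodes can be compared to isomers in chemistry: the same number of nodes, but different structures.
Within a graphlet, different node positions play different roles.
For a given node, we count the graphlets it touches, separately for each node position, and collect the counts in a vector called the Graphlet Degree Vector (GDV).
The GDV describes the local topology around the node. Comparing the GDVs of two nodes gives a distance or similarity between them.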

Graphlet Degree Vector

Analogy

  • Degree counts #(edges) that a node touches.
  • Clustering coefficient counts #(triangles) that a node touches.
  • Graphlet Degree Vector (GDV): graphlet-based features for nodes
    • GDV counts #(graphlets) that a node touches.
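To make the analogy concrete, here is a deliberately simplified sketch of a GDV restricted to graphlets with at most 3 nodes, which gives 4 node positions: endpoint of an edge, end of an induced 3-node path, middle of an induced 3-node path, and corner of a triangle. A full GDV uses larger graphlets and needs a proper enumeration algorithm. The example graph, the function names, and the ordering of the four entries are my assumptions, and the code assumes a simple graph without self-loops.

```python
import math
from itertools import combinations

# Hypothetical undirected example graph (an assumption, not from the lecture).
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def graphlet_degree_vector(adj, v):
    """Counts for graphlets with up to 3 nodes, as
    [edge endpoint, path end, path middle, triangle corner]."""
    nbrs = adj[v]
    edge_endpoint = len(nbrs)  # same as the node degree
    # v - u - w is an induced path with v at the end if w is not adjacent to v.
    path_end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    path_middle = 0
    triangle = 0
    for a, b in combinations(nbrs, 2):
        if b in adj[a]:
            triangle += 1      # a, b, v are pairwise connected
        else:
            path_middle += 1   # a - v - b is an induced path with v in the middle
    return [edge_endpoint, path_end, path_middle, triangle]


def gdv_distance(x, y):
    """Euclidean distance between two GDVs (one possible choice of distance)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


adj = build_adjacency(EDGES)
gdv = {v: graphlet_degree_vector(adj, v) for v in adj}
print(gdv["A"], gdv["C"])
print(gdv_distance(gdv["A"], gdv["E"]))  # A and E have the same local topology
```

In this toy graph, A and E get identical vectors, as do C and D, which matches the intuition that they play the same structural role.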


Origin blog.csdn.net/m0_51377238/article/details/129748725