[Deep Learning] Graph-Based Machine Learning: An Overview

1. Description

        Graph Neural Networks (GNNs) are gaining attention in data science and machine learning, but they are still poorly understood outside expert circles. To grasp this exciting approach, we must start with the broader field of graph machine learning (GML). Many online resources talk about GNNs and GML as if they were interchangeable concepts, or as if GNNs were a panacea that renders other GML methods obsolete. Neither is the case. One of the main purposes of GML is to compress large, sparse graph data structures so that prediction and inference become feasible. GNNs are one way to do this, perhaps the most advanced, but not the only way. Understanding this will lay a better foundation for the subsequent parts of this series, where we will cover specific types of GNNs and related GML methods in more detail.

        In this article, we will:

  • Briefly review graph data structures
  • Cover GML tasks and the types of problems they solve
  • Investigate the concept of compression and its importance in motivating different GML methods, including GNNs

2. What is a graph?

        If you're reading this, you probably already have some knowledge of graph data structures. If not, I recommend reading this resource on property graphs or this resource on graph database concepts. I'll give a very brief recap here:

        A graph consists of nodes connected by relationships. There are several different ways to model graph data. For simplicity, I'll use the property graph model, which has three main components (a small code sketch follows the figure below):

  1. Nodes representing entities (sometimes called vertices),
  2. Relationships (sometimes called edges or links) representing associations or interactions between nodes, and
  3. Properties representing attributes of nodes or relationships.

Image source: author
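To make these components concrete, here is a minimal sketch of a property graph built with the networkx library. The node labels, relationship type, and property names are hypothetical, purely for illustration; a graph database such as Neo4j would model the same components natively.

```python
import networkx as nx

# A tiny property graph: nodes are entities, relationships connect them,
# and both carry properties as key-value attributes.
g = nx.DiGraph()

# Nodes (entities) with properties
g.add_node("alice", label="User", age=34)
g.add_node("p42", label="Product", category="books", price=12.99)

# Relationship (edge) with properties
g.add_edge("alice", "p42", type="PURCHASED", quantity=1, date="2023-05-01")

print(g.nodes["alice"])         # {'label': 'User', 'age': 34}
print(g.edges["alice", "p42"])  # {'type': 'PURCHASED', 'quantity': 1, ...}
```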

3. What is Graph Machine Learning (GML)?

        At its core, graph machine learning (GML) is the application of machine learning to graphs, specialized for predictive and prescriptive tasks. GML has many use cases in supply chain, fraud detection, recommendation, customer 360, drug discovery, and more.

        One of the best ways to understand GML is through the different types of ML tasks it can accomplish. I break these down into supervised and unsupervised learning below.

3.1 Supervised GML tasks

        The following figure outlines the three most common GML tasks in supervised learning:

Image source: author

To expand further:

  1. Node attribute prediction: Predict discrete or continuous node attributes. One can think of node attribute prediction as predicting adjectives about things, such as whether an account on a financial services platform should be classified as fraudulent or how to categorize products in an online retail store.
  2. Link prediction: Predict whether a relationship exists between two nodes, and potentially some properties of that relationship. Link prediction is helpful for applications like entity resolution, where we want to predict whether two nodes reflect the same underlying entity; recommender systems, where we want to predict what a user will want to buy or interact with next; and bioinformatics, where we want to predict protein and drug interactions. In each case, we are concerned with predicting associations, similarities, or potential actions or interactions between entities (a minimal link-scoring sketch follows this list).
  3. Graph property prediction: Predict discrete or continuous properties of graphs or subgraphs. Graph property prediction is useful in domains where you want to model each entity as its own graph for prediction purposes, rather than as a node in a larger graph representing the full dataset. Use cases include materials science, bioinformatics, and drug discovery, where a single graph can represent the molecule or protein you want to make a prediction about.
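As a rough illustration of link prediction, the sketch below scores candidate node pairs with a simple topological heuristic (the Jaccard coefficient of shared neighbors) using networkx. The graph and node names are hypothetical, and a real system would typically feed such scores, or learned features, into a downstream classifier rather than use them directly.

```python
import networkx as nx

# Toy interaction graph; an edge means two entities have interacted.
g = nx.Graph()
g.add_edges_from([
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("carol", "dave"),
])

# Score currently unconnected pairs: more shared neighbors -> more likely future link.
candidates = [("alice", "dave"), ("bob", "dave")]
for u, v, score in nx.jaccard_coefficient(g, candidates):
    print(f"{u} -- {v}: {score:.2f}")
```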

3.2 Unsupervised GML tasks

        Here are the four most common GML tasks for unsupervised learning:

Image source: author

Elaborating on these further:

  1. Representation learning: Dimensionality reduction while preserving important signals is a central theme of GML applications. Graph representation learning does this explicitly by generating low-dimensional features from graph structures, often for downstream exploratory data analysis (EDA) and ML.
  2. Community detection (relational clustering): Community detection is a clustering technique used to identify groups of densely interconnected nodes in a graph. Community detection has various practical applications in anomaly detection, fraud and investigative analysis, social network analysis, and biology.
  3. Similarity: Similarity in GML refers to finding and measuring similar pairs of nodes in a graph. Similarity applies to many use cases, including recommendation, entity resolution, and anomaly and fraud detection. Common similarity techniques include node similarity algorithms, topological link prediction, and K-Nearest Neighbors (KNN).
  4. Centrality and pathfinding: I group these together because they tend to be associated less with ML tasks and more with analytics metrics. However, they still technically fit here, so for completeness I'll cover them. Centrality finds important or influential nodes in a graph. Centrality is ubiquitous in many use cases, including fraud and anomaly detection, recommendation, supply chain, logistics, and infrastructure. Pathfinding is used to find the lowest-cost routes in a graph or to evaluate the quality and availability of routes. Pathfinding can benefit many use cases that deal with physical systems, such as logistics, supply chain, transportation, and infrastructure.

4. How Compression Became Key to GML

        I came across an interesting blog post by Matt Ranger that explains this well: one of the most important goals of GML, and to a large extent of NLP, is to compress large sparse data structures while maintaining important signals for prediction and inference.

        Consider a graph represented by an adjacency matrix: a square matrix where each row and column represents a node. The cell at the intersection of row A and column B is 1 if there is a relationship from node A to node B, and 0 otherwise. Below are illustrations of some small graphs and their adjacency matrices.

Image source: author

        Note that many of the cells in the adjacency matrices above are 0. If you scale this up to large graphs, especially those found in real applications, the proportion of zeros increases and the adjacency matrix becomes mostly zeros.
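To see this numerically, here is a small sketch that builds random graphs with a roughly constant average degree and measures how the fraction of zero cells in the adjacency matrix grows with graph size. The graph sizes and degree are arbitrary, chosen only to illustrate the trend, and the sparse-matrix conversion assumes a recent networkx/scipy.

```python
import networkx as nx

# Keep average degree roughly constant (~10) while the node count grows.
for n in [100, 1_000, 10_000, 100_000]:
    g = nx.gnm_random_graph(n, m=5 * n, seed=42)  # n nodes, 5n edges
    a = nx.to_scipy_sparse_array(g)               # adjacency as a sparse matrix
    zero_fraction = 1.0 - a.nnz / (n * n)
    print(f"{n:>7} nodes: {zero_fraction:.5%} of adjacency cells are zero")
```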

        Illustrative example of Last.fm, created using recommendation graph visuals from Large Graph Visualization Tools and Methods and matrix images from Beck, Fabian, et al., "Identifying modularity patterns through visual comparison of multiple hierarchies."

        This happens because mean degree centrality (the average number of relationships per node) grows more slowly, or not at all, as these graphs grow. In social networks, this is demonstrated by concepts such as Dunbar's number, the cognitive limit on the number of people with whom one can maintain stable social relationships. You can see the same pattern in other types of graphs, such as financial transaction graphs or user purchase graphs for recommender systems. As these graphs grow, the number of potential unique transactions or purchases a person could engage in grows far faster than their capacity to actually make them. If there are six products on a site, it is feasible for a user to buy half of them, but not so much if there are hundreds of thousands. As a result, you end up with very large and very sparse data structures.

        If you could use these sparse data structures directly for machine learning, you wouldn't need GNNs or any GML at all; you could simply plug them into traditional ML models as features. However, this isn't workable: it doesn't scale, and beyond that it leads to mathematical problems around convergence and estimation that make ML models poorly specified and infeasible. Thus, the fundamental key to GML is compressing these data structures; arguably, this is the whole point of GML.

5. How to complete the compression? — Graph machine learning methods

        At the highest level, there are three categories of GML methods for achieving this compression.

Image source: author

Classical Graph Algorithms

        Classical graph algorithms include PageRank, Louvain, and Dijkstra's shortest path. They can be used independently for unsupervised community detection, similarity, centrality, or pathfinding. The results of classical algorithms can also be used as features for traditional downstream models, such as linear and logistic regression, random forests, or neural networks, to perform GML tasks.
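As a rough sketch of how a few of these algorithms might be run, the example below uses networkx on its small built-in karate club graph. Louvain community detection assumes networkx 2.8 or newer; at scale you would more likely run these in a graph platform such as Neo4j Graph Data Science.

```python
import networkx as nx

g = nx.karate_club_graph()  # small built-in social network

# Centrality: PageRank scores node influence.
pagerank = nx.pagerank(g)
top_nodes = sorted(pagerank, key=pagerank.get, reverse=True)[:3]
print("Most influential nodes:", top_nodes)

# Community detection: Louvain groups densely interconnected nodes.
communities = nx.community.louvain_communities(g, seed=7)
print("Number of communities:", len(communities))

# Pathfinding: Dijkstra finds a lowest-cost route between two nodes.
print("Shortest path 0 -> 33:", nx.dijkstra_path(g, 0, 33))
```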

        Classical graph algorithms tend to be simple, easy to learn, and relatively interpretable and explainable. However, they may require more manual work and subject matter expertise (SME) than other GML methods. This makes classical graph algorithms the first choice in experimentation and development to help understand what works best on your graph. They can also do fine in production for simpler problems, but more complex use cases may require an upgrade to another GML approach.

Non-GNN graph embedding

        Graph embedding is a form of representation learning. Some graph embedding techniques leverage GNN architectures, while others do not. The latter group, non-GNN embeddings, is the focus of this approach. Instead of neural message passing, these embedding techniques rely on matrix factorization/decomposition, random projections, random walks, or hashing functions. Examples include Node2vec (random-walk based), FastRP (random projection and matrix operations), and HashGNN (hash-function based).

        Graph embedding involves generating numeric or binary feature vectors to represent nodes, relationships, paths, or entire graphs. Node embedding is the most fundamental and commonly used of these. The basic idea is to generate a vector for each node such that the similarity between vectors (e.g. their dot product) approximates the similarity between nodes in the graph. Below is an illustrative example on the small Zachary's karate club network. Note how the adjacency matrix is compressed into a 2-dimensional embedding vector for each node, and how these vectors cluster together in a way that reflects the graph's community structure. Most real-world embeddings will have more than two dimensions (often 128 or higher) to represent larger real-world graphs with millions or billions of nodes, but the basic intuition is the same.

Image source: author

The same logic applies to relationship, path, and whole-graph embeddings: similarity between embedding vectors should approximate similarity in the graph structure. This enables compression while preserving important signals, making embeddings useful for a variety of downstream ML tasks.
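To make this intuition concrete, here is a minimal, FastRP-style sketch that compresses the karate club adjacency matrix into low-dimensional node vectors by multiplying a degree-normalized adjacency matrix with a random projection matrix. This is a simplified illustration of the random-projection idea, not the exact FastRP algorithm as implemented in any particular library; in practice you would use a library implementation instead.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
g = nx.karate_club_graph()
a = nx.to_numpy_array(g)  # 34 x 34 adjacency matrix

# Degree-normalize so that high-degree nodes don't dominate.
deg = a.sum(axis=1, keepdims=True)
a_norm = a / np.clip(deg, 1, None)

# Random projection: compress each node's neighborhood into d dimensions.
d = 2
r = rng.normal(size=(a.shape[0], d)) / np.sqrt(d)

# Combine 1-hop and 2-hop neighborhood information (FastRP-like weighting).
emb = a_norm @ r + 0.5 * (a_norm @ (a_norm @ r))

# Nodes close together in the graph should end up with similar vectors.
print(emb[:5])
```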

Non-GNN embeddings can benefit from reduced manual effort and less required SME compared to classical graph algorithms. While non-GNN embeddings often require hyperparameter tuning to get right, they tend to be easier to automate and generalize across different graphs. Furthermore, some non-GNN embeddings, such as FastRP and HashGNN, scale well to large graphs on commodity hardware since they do not require model training. This can be a huge benefit compared to GNN-based methods.

However, non-GNN embeddings also have some trade-offs. Due to the broader mathematical operations involved, they are less interpretable and explainable than classical graph algorithms. They are also generally transductive, although recent improvements in Neo4j Graph Data Science allow some of them to behave effectively inductively in certain applications. We will cover the transductive and inductive settings in more depth later in this series; the distinction concerns the ability to predict on new, unseen data and is an important consideration for GML.

Graph Neural Network (GNN)

Schematic diagram of graph network

        A GNN is a neural network model that takes graph data as input, transforms it into intermediate embeddings, and feeds those embeddings into a final layer aligned with the prediction task. This prediction task can be supervised (node attribute prediction, link prediction, graph property prediction) or unsupervised (clustering, similarity, or simply the final output embedding for representation learning). Therefore, unlike classical algorithms and non-GNN embeddings, which pass their results as features to downstream ML models (particularly for supervised tasks), GNNs are fully end-to-end, graph-native solutions.
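As an illustration of this end-to-end pattern, here is a minimal sketch of a two-layer GCN for node attribute prediction using PyTorch Geometric (assuming it is installed). The hyperparameters are arbitrary, and the built-in karate club dataset stands in for a real graph.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import KarateClub
from torch_geometric.nn import GCNConv

dataset = KarateClub()
data = dataset[0]  # x: node features, edge_index: relationships, y: labels

class GCN(torch.nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, hidden)  # graph -> intermediate embedding
        self.conv2 = GCNConv(hidden, dataset.num_classes)   # embedding -> prediction layer

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train end-to-end: embeddings and the prediction layer are learned together.
for epoch in range(100):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```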

        GNNs have various benefits that come with being a complete end-to-end solution. Notably, the intermediate embeddings are learned during training and, in theory, automatically infer the most important information from the graph. Because a model is trained, state-of-the-art GNNs are also inductive.

        GNNs also have some weaknesses, including high complexity, difficulty scaling, and low explainability and interpretability. GNNs can also be limited in depth due to oversmoothing and other mathematical issues.

        I'll talk more about GNNs in my next blog post, GNNs: What They Are and Why They Matter. In the meantime, if you want to start learning about graph machine learning, check out Neo4j Graph Data Science. Data scientists and engineers can find getting-started technical documentation here.

6. To sum up

        The biggest takeaways from this article:

  • Graph machine learning (GML) is a broad field with many use cases, encompassing several different supervised and unsupervised ML tasks.
  • One of the main purposes of GML is to compress large sparse graph structures while maintaining important signals for prediction and inference.
  • GNNs are one of several GML methods for achieving this compression.
