Geometric Deep Learning Technology

Overview of Geometric Deep Learning
This overview of geometric deep learning draws on the paper Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges.
The researchers have also built a website on this topic: https://geometricdeeplearning.com.
Geometric deep learning attempts to unify a large class of machine learning problems from the perspectives of symmetry and invariance. It therefore refers not to a particular algorithm, but to an effort to identify what many algorithms have in common and to give an overview of them.
The current state of deep learning (representation learning) is reminiscent of the state of geometry in the nineteenth century: on the one hand, deep learning has revolutionized data science over the past decade, making possible many tasks previously thought impossible, whether in computer vision, speech recognition, natural language translation, or playing Go. On the other hand, there is now a zoo of different neural network architectures for different types of data, but few unifying principles.
As a result, it is difficult to understand the relationships between different methods.
The aim is to find the commonalities among algorithms and use them as a framework, a way of thinking, to inspire the design of future architectures.
One geometric prior is scale separation: the signal admits a coarse-grained (scaled-down) approximation.
The coarse-grained signal f' can be obtained from f through the coarse-graining operator P.
Another geometric prior is invariance: shifting the digit "3" within the image (moving a point u to g(u)) does not change the content of the picture; it still represents the digit 3.
Legend: A geometric invariance.
From these geometric properties one obtains the blueprint of geometric deep learning, which can be recognized in most popular deep neural architectures for representation learning: a typical design consists of a sequence of equivariant layers (such as the convolutional layers in CNNs), followed by an invariant global pooling layer that aggregates everything into a single output.
Legend: The blueprint of geometric deep learning.
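As a concrete illustration of this equivariant-layers-plus-invariant-pooling blueprint, here is a minimal numpy sketch (not any specific architecture from the paper; the layer sizes and weights are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def equivariant_layer(H, W_self, W_agg):
    """Permutation-equivariant layer in the Deep Sets style: every element is
    updated from its own features plus a global aggregate, so permuting the
    rows of H permutes the output rows in exactly the same way."""
    agg = H.mean(axis=0, keepdims=True)           # permutation-invariant aggregate
    return np.tanh(H @ W_self + agg @ W_agg)      # the same update for every element

def invariant_pooling(H):
    """Global pooling: summing over elements removes any dependence on ordering."""
    return H.sum(axis=0)

# Toy input: 5 elements with 4 features each (hypothetical data).
H = rng.normal(size=(5, 4))
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))

out = invariant_pooling(equivariant_layer(H, W1, W2))

# Permuting the input elements leaves the final output unchanged.
perm = rng.permutation(5)
out_perm = invariant_pooling(equivariant_layer(H[perm], W1, W2))
print(np.allclose(out, out_perm))  # True
```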
With this blueprint in hand, the characteristics can be summarized as the "5G" of geometric deep learning: grids, groups & homogeneous spaces with global symmetry, graphs, geodesics & metrics on manifolds, and gauges (frames for tangent or feature spaces).
Legend: The five "G"s of geometric deep learning.
The authors show that familiar architectures such as convolutional neural networks (CNNs), graph neural networks (GNNs), and recurrent neural networks (RNNs) can all be placed within this framework.
The purpose is to survey existing deep learning architectures, illustrate their commonalities, and inspire subsequent research.
Legend: The first author's 2017 MSP paper.
The first author had already proposed geometric deep learning in a 2017 paper in IEEE Signal Processing Magazine (MSP), Geometric Deep Learning: Going Beyond Euclidean Data, so this is not an entirely new concept.
That paper discusses some geometric properties of deep learning algorithms, such as translation invariance, but it does not address the interpretability of deep learning results.
How can GNNs built on the WL test and the message-passing mechanism break through their performance bottlenecks? Let's see what Michael Bronstein, the standard-bearer of geometric deep learning and a professor at the University of Oxford, has to say.
Compiled by OGAI.
Graphs provide a natural abstraction for complex systems of relationships and interactions. Research fields such as social networks, high-energy physics, and chemistry all involve interacting objects, whether people, particles, or atoms. In these settings graph-structured data has become increasingly important, and related methods have achieved a string of early successes; a series of industrial applications has made graph deep learning one of the hottest research topics in machine learning.
Legend: The relationships and interactions of complex systems abstracted as graphs: chemical bonds between the atoms of a molecule in a "molecular graph", relationships and interactions between users in a "social network", and connections between users and products in a "recommender system".
Physics-inspired continuous learning models on graphs can overcome the limitations of traditional GNNs. Message passing has been the dominant paradigm in graph deep learning for many years, enabling graph neural networks (GNNs) to achieve great success in applications ranging from particle physics to protein design.
From a theoretical point of view, message passing has an established link to the Weisfeiler-Lehman (WL) hierarchy, which can be used to analyze the expressive power of GNNs. However, in Michael Bronstein's view, the current node- and edge-centric way of thinking in graph deep learning brings insurmountable limitations that hinder the future development of the field.
On the other hand, in a recent review of geometric deep learning, Bronstein proposes physics-inspired continuous learning models that open up a range of new tools from differential geometry, algebraic topology, and differential equations; so far, such tools have seen little use in graph machine learning.
AI Technology Review has organized and compiled Bronstein's latest thinking as follows:
1. How graph neural networks work
The input of a GNN is a graph with node and edge features, and it computes a function that depends on both the features and the graph structure. Message-passing GNNs (MPNNs) propagate features on the graph by exchanging information between adjacent nodes. A typical MPNN architecture consists of several propagation layers, each of which updates every node based on an aggregation of its neighbors' features. Depending on the aggregation function, MPNNs can be divided into convolutional (a linear combination of neighbor features whose weights depend only on the graph structure), attentional (a linear combination whose weights depend on both the graph structure and the features), and message passing (a general nonlinear function). Message-passing GNNs are the most general, and the first two flavours can be seen as special cases of them.
Legend: The three flavours of GNNs (convolutional, attentional, and generalized nonlinear message passing) are all forms of message passing.
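To make the three flavours concrete, here is a minimal numpy sketch of the convolutional and attentional aggregations (the generic message-passing flavour would replace the linear combination with an arbitrary learnable function of both endpoints' features). The weight matrices, the bilinear attention score, and the toy triangle graph are hypothetical choices, not a specific published architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def conv_layer(A, X, W):
    """'Convolutional' flavour: neighbour features are combined linearly with
    weights that depend only on the graph structure (degree-normalised adjacency)."""
    A_hat = A + np.eye(len(A))                     # add self-loops
    d = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))       # symmetric normalisation
    return np.tanh(A_norm @ X @ W)

def attention_layer(A, X, W, att):
    """'Attentional' flavour: the combination weights also depend on the features."""
    H = X @ W
    scores = H @ att @ H.T                         # hypothetical bilinear score
    scores = np.where(A + np.eye(len(A)) > 0, scores, -1e9)  # only neighbours count
    alpha = softmax(scores)                        # learned, feature-dependent weights
    return np.tanh(alpha @ H)

# Toy usage on a hypothetical 3-node triangle graph.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
X, W = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
print(conv_layer(A, X, W).shape, attention_layer(A, X, W, np.eye(2)).shape)
```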
The parameters of the propagation layers are learned from downstream tasks. Typical use cases include node embedding (each node is represented as a point in a vector space, and the connectivity of the original graph is recovered from the distances between points; such tasks are called "link prediction"), node-level classification or regression (e.g., inferring attributes of social network users), and graph-level prediction obtained by further aggregating node features (e.g., predicting chemical properties of molecular graphs).
2. Inadequacies of message-passing GNNs
GNNs have achieved impressive success on multiple fronts, and recent related research has considerable breadth and depth. However, the mainstream paradigm of graph deep learning is to propagate node information along the edges of a constructed graph by means of message passing. According to Michael Bronstein, it is precisely this node- and edge-centric way of thinking that presents a major hurdle to further progress in the field.
Analogies with the WL test have limited power. Choosing an appropriate local aggregation function, such as summation, makes message passing equivalent to the WL graph isomorphism test, so that graph neural networks can discover certain graph structures based on how information propagates on the graph. Through this important connection to graph theory, researchers have derived a variety of theoretical results on the expressive power of GNNs, determining whether certain functions on a graph can be computed by message passing. However, this type of analysis usually says nothing about the efficiency of the representation (i.e., how many layers are needed to compute a given function), nor about the generalization ability of the GNN.
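For reference, here is a toy sketch of the 1-WL colour-refinement procedure behind the test; the example graphs (a 6-cycle versus two triangles) are a standard illustration of its blind spots and stand in for the decalin/bicyclopentyl pair discussed below:

```python
from collections import Counter

def wl_colors(adj, iters=3):
    """1-WL (colour refinement) on an adjacency-list graph. Nodes start with the
    same colour; each step hashes a node's colour together with the multiset of
    its neighbours' colours. The final colour histogram is the graph fingerprint."""
    colors = {v: 0 for v in adj}
    for _ in range(iters):
        new = {}
        for v in adj:
            signature = (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            new[v] = hash(signature)
        colors = new
    return Counter(colors.values())

# A 6-cycle and two disjoint triangles are 2-regular, so 1-WL sees the same
# fingerprint for both and cannot tell them apart.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_colors(cycle6) == wl_colors(two_triangles))  # True
```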
Legend: The WL test is like walking through a maze without a map, trying to understand its structure. Positional encoding provides a map of the maze, and rewiring provides a ladder over the walls.
The WL algorithm sometimes fails to detect even simple structures such as triangles, much to the disappointment of practitioners applying message-passing neural networks to molecular graphs. In organic chemistry, for example, ring structures are very common and important for the properties of molecules (aromatic rings such as those in naphthalene are called aromatic because such compounds often have strong odors).
Legend: Decalin (left) and bicyclopentyl (right) have different structures but cannot be distinguished by the WL test.
In recent years, several methods have been proposed to build more expressive GNN models: higher-order isomorphism tests from the WL hierarchy (at the cost of higher computational and memory complexity and a loss of locality), applying the WL test to collections of subgraphs, and positional or structural encodings that "color" the nodes of the graph so as to break the symmetries that confuse the WL algorithm. Positional encoding is now a ubiquitous technique in Transformer models and is also widely used in GNNs. Although many positional encoding methods exist, the specific choice depends on the target application and requires some experience from the user.
Legend: Examples of positional encodings: random features, Laplacian eigenvectors (analogous to the sinusoidal encodings in Transformers), and structural features (counts of triangles and rectangles).
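A minimal sketch of the second option, Laplacian-eigenvector positional encoding, assuming a dense numpy adjacency matrix and a toy path graph:

```python
import numpy as np

def laplacian_positional_encoding(A, k):
    """Use the k nontrivial eigenvectors of the normalised graph Laplacian with
    the smallest eigenvalues as node coordinates, a common positional-encoding
    choice analogous to the sinusoidal encodings of Transformers."""
    d = A.sum(1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                     # skip the trivial (lowest) eigenvector

# Toy example: a path graph on 5 nodes (hypothetical input).
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1
print(laplacian_positional_encoding(A, k=2).shape)  # (5, 2)
```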
"Graph reconnection" breaks through the theoretical basis of GNN. An important and subtle difference between GNNs and Convolutional Neural Networks (CNNs) is that the graph is both part of the input and part of the computational structure. Traditional GNNs use the input graph structure to propagate information, and in this way obtain representations that reflect both the graph structure and the features on the graph. Due to certain structural features ("bottlenecks"), some graphs have poor performance in information dissemination, resulting in information from too many nodes being compressed into one node's powerful capabilities, known as "overcompression".
Modern GNN implementations deal with this phenomenon by coupling the input graph with the computational graph (or optimizing the input graph for computational purposes), a technique called "graph rewiring". Reconnection can take the form of neighborhood sampling, virtual nodes, connectivity diffusion or evolution, or node and edge dropout mechanisms. Transformers and attention-based GNNs like GAT effectively learn new graphs by assigning different weights to each edge, which can be understood as a "soft" reconnection. Finally, latent graph learning methods fall into this category and can build task-specific graphs that are updated in each layer (with positional encoding in the initial state, an initial graph, or sometimes no graph at all). Few modern GNN models propagate information on the original input graph.
Legend: Various graph rewiring techniques used in GNNs: the original graph, neighborhood sampling (e.g., GraphSAGE), attention mechanisms (e.g., GAT), and connectivity diffusion (e.g., DIGL).
The WL test characterizes graphs by how information spreads on them. Rewiring breaks this theoretical connection and runs into a common problem in machine learning: the models analyzed in theory are not the ones used in practice.
Sometimes the graph has too little "geometry". GNNs are one instance of the broader blueprint of geometric deep learning, a group-theoretic framework for designing deep learning architectures based on the symmetries of the domain underlying the data. Since graphs have no canonical ordering of their nodes, the relevant symmetry in the graph setting is the permutation of nodes. MPNNs, which act locally on the graph, must therefore rely on permutation-invariant feature aggregation functions. There is no notion of "direction" on a graph, and information propagation is isotropic. This is markedly different from learning on continuous domains or grids, and it is one of the shortcomings of GNNs, since isotropic filters are considered to be of limited use.
Legend: A grid is a discrete manifold with a local Euclidean structure; its neighboring nodes are ordered up to rotation, which gives a notion of "direction". Graphs are less structured, and neighbors are defined only up to permutation.
Sometimes the graph has too much "geometry". Issues of distance also arise when building node embeddings, in which the distances between node representations in some space capture the connectivity of the graph: roughly speaking, nodes that are close in the embedding space should be connected by an edge in the graph. In recommender systems, graph embeddings are used in this way to create associations (edges) between the entities represented by the nodes.
The quality of a graph embedding and its ability to express the graph structure depend largely on the geometry of the embedding space and its compatibility with the geometry of the graph. Euclidean space plays an important role in representation learning: it is the simplest and most convenient representation space, but for many natural graphs it is not ideal. One reason is that the volume of a Euclidean metric ball grows only polynomially with its radius (and exponentially with dimension), whereas the volume of neighborhoods in many real-world graphs grows exponentially. As a result, the embedding becomes "overcrowded" and is forced into a high-dimensional space, incurring higher computational and memory costs.
An alternative that has recently become popular is to use negatively curved (hyperbolic) spaces, whose exponential volume growth is more compatible with graphs. Hyperbolic geometry generally yields lower embedding dimensions and more compact node representations. However, graphs tend to be heterogeneous (some parts look like trees, others like clumps, with very different volume growth), whereas hyperbolic embedding spaces are homogeneous (every point has the same geometry).
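For illustration, a small numpy sketch of the geodesic distance in the Poincaré ball model of hyperbolic space, the kind of space such embeddings live in (the sample points are arbitrary):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball model of hyperbolic space.
    Volume grows exponentially with radius here, which matches tree-like graphs
    better than Euclidean space and allows much lower embedding dimensions."""
    sq_u = np.sum(u * u)
    sq_v = np.sum(v * v)
    sq_diff = np.sum((u - v) ** 2)
    x = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v) + eps)
    return np.arccosh(x)

# Points near the boundary of the ball are "exponentially far" from the origin.
a = np.array([0.0, 0.0])
b = np.array([0.9, 0.0])
c = np.array([0.99, 0.0])
print(poincare_distance(a, b), poincare_distance(a, c))
```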
Even with a non-Euclidean embedding space, the metric structure of a general graph usually cannot be represented exactly, so graph embeddings are inevitably approximate. Worse, since embeddings are constructed with a link-prediction criterion in mind, the distortion of higher-order structures (triangles, rectangles, etc.) can be uncontrollably large. Such structures play an important role in applications such as social and biological networks, where they capture more complex non-pairwise interactions and motifs.
Legend: Graph motifs are higher-order structures; they can be observed in graphs that model many biological phenomena.
GNN performance suffers when the structure of the data is incompatible with the structure of the underlying graph. Many graph learning datasets and benchmarks implicitly assume that the data is homophilic (i.e., that the features or labels of adjacent nodes are similar, or smooth). In this case, even simple low-pass filtering on the graph (e.g., averaging over neighbors) works well. Early benchmarks (e.g., Cora) used graphs with a high degree of homophily, which made the evaluation of GNNs too easy.
Legend: Homophilic and heterophilic datasets. In a homophilic graph, the structure of the node features or labels is compatible with the graph (i.e., a node is similar to its neighbors).
However, many models perform disappointingly on heterophilic data, where more refined aggregation is needed. Two typical failure modes occur: (1) the model avoids using neighbor information altogether (the GNN degenerates into a node-wise multilayer perceptron), or (2) "over-smoothing" sets in, i.e., node representations become progressively smoother through the GNN layers and eventually "collapse" to a single point. Over-smoothing arises even on homophilic datasets and is a more fundamental defect of some MPNNs, making deep graph learning models difficult to build.
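The degree of homophily is easy to quantify; a common choice is the fraction of edges whose endpoints share a label, sketched below on a hypothetical toy graph:

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label. Values near 1 indicate a
    homophilic graph (e.g. Cora); values near 0 indicate heterophily, where naive
    neighbourhood averaging mixes dissimilar nodes."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

# Hypothetical toy graph: 4 nodes, two classes.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
labels = {0: "A", 1: "A", 2: "B", 3: "B"}
print(edge_homophily(edges, labels))  # 0.5
```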
It is often difficult to understand what a GNN learns, and GNNs tend to be hard-to-interpret black-box models. While the definition of interpretability remains largely vague, in most cases there is genuinely no real understanding of what GNNs learn. Some recent works attempt to alleviate this deficit by explaining GNN-based models in terms of compact subgraph structures and subsets of node features that play a key role in the GNN's predictions. The graphs learned by latent graph learning architectures can also be seen as a form of "explanation".
Constraining the generic message-passing function helps exclude unreasonable outputs and ensures that what the GNN learns makes sense, so that the GNN can be better understood in domain-specific applications. Doing so endows message passing with additional "internal" symmetries of the data, allowing a better understanding of the underlying problem. For example, E(3)-equivariant message passing, which correctly handles atomic coordinates in molecular graphs, has recently contributed to the success of protein structure prediction architectures such as AlphaFold and RoseTTAFold.
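The sketch below is a drastically simplified, numpy-only layer loosely in the spirit of E(n)-equivariant message passing (Satorras et al.): messages are built from E(3)-invariant quantities (squared distances and scalar features), and coordinates are updated along relative position vectors, so rotating the input rotates the output. The scalar weights, the toy 4-node graph, and the single scalar feature per node are assumptions for illustration:

```python
import numpy as np

def egnn_style_layer(x, h, edges, w_msg, w_pos):
    """Simplified E(3)-equivariant update: the message depends only on invariants,
    while coordinates move along relative position vectors (an equivariant update).
    w_msg and w_pos are hypothetical scalar weights."""
    h_new, x_new = h.copy(), x.copy()
    for i, j in edges:
        d2 = np.sum((x[i] - x[j]) ** 2)            # E(3)-invariant scalar
        m = np.tanh(w_msg * (h[i] + h[j] + d2))    # invariant message
        h_new[i] += m
        x_new[i] += w_pos * m * (x[i] - x[j])      # equivariant coordinate update
    return x_new, h_new

# Toy check of equivariance under a random rotation (translations work similarly).
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3)); h = rng.normal(size=4)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))       # random orthogonal matrix
x1, h1 = egnn_style_layer(x @ Q.T, h, edges, 0.5, 0.1)
x2, h2 = egnn_style_layer(x, h, edges, 0.5, 0.1)
print(np.allclose(x1, x2 @ Q.T), np.allclose(h1, h2))  # True True
```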
In the paper "Discovering symbolic models from deep learning with inductive biases", Miles Cranmer, Kyle Cranmer, and co-authors replace the message-passing functions learned on many-body dynamical systems with symbolic formulas, in effect "learning the equations of physics". Other researchers try to connect GNNs with causal inference, building graphs that explain the causal relationships between variables. Overall, this research direction is still in its infancy.
Legend: Different "interpretable" GNN models: graph explainers, latent graph learning, and equivariant message passing.
Most GNN implementations are hardware-agnostic. Most GNNs today rely on GPU implementations and assume by default that the data fits in memory. This is often wishful thinking when dealing with large-scale graphs such as biological and social networks. In this setting it is crucial to understand the limitations of the underlying hardware, such as the different bandwidths and latencies of the memory hierarchy, and to use that hardware advantageously. In general, the cost of passing a message between two nodes held in the same physical memory and between two nodes on different chips can differ by an order of magnitude. Making GNNs friendly to existing hardware is an important and often overlooked problem. Given the time and effort required to design new chips and the speed at which machine learning advances, developing new graph-centric hardware is an even greater challenge.

3. A New Blueprint for Graph Learning - "Continuous" Models
"Continuous" learning models are an emerging and promising alternative to discrete GNNs. "Continuous Learning Inspired by Physical Systems" opens up a range of new tools from the fields of differential geometry, algebraic topology, and differential equations that have so far been unexplored in graph machine learning.
Reimagine GNNs as continuous physical processes. Instead of passing multiple layers of messages on a graph, one can consider physical processes that take place on a domain (which can be a continuous domain such as a manifold, and turn it into a discrete graph) in a continuous time dimension. The state of the process at a point in space and time replaces the latent feature of a node in the graph generated by a layer of GNN. The process is controlled by a set of parameters (representing properties of the underlying physical system) that replace the learnable weights of the message passing layer.
A large number of different physical processes can be constructed from classical and quantum systems. The researchers demonstrated in a series of papers that many existing GNNs may be related to the diffusion process, which may be the most natural way of spreading information. There may also be more exotic approaches (such as coupled oscillatory systems) that may have certain advantages.
Legend: Graph dynamics of a coupled oscillator system.
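As a toy illustration of such oscillatory dynamics on a graph (a simplified sketch only loosely inspired by graph-coupled oscillator networks, with arbitrary step size and damping):

```python
import numpy as np

def oscillator_dynamics(A, X0, steps=200, dt=0.05, damping=0.1):
    """A toy second-order ('oscillatory') system on a graph: node states X are
    accelerated towards their neighbours through the graph Laplacian, with a
    little damping. Unlike pure diffusion, the states oscillate instead of
    collapsing to a constant."""
    L = np.diag(A.sum(1)) - A                      # combinatorial Laplacian
    X, V = X0.copy(), np.zeros_like(X0)
    for _ in range(steps):
        accel = -L @ X - damping * V               # coupled, damped oscillators
        V = V + dt * accel
        X = X + dt * V                             # semi-implicit Euler step
    return X

# Hypothetical 4-node cycle with one scalar state per node.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], float)
print(oscillator_dynamics(A, np.array([1.0, 0.0, -1.0, 0.0])))
```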
Continuous systems can be discretized in time and space. Spatial discretization connects nearby points of a continuous domain into a graph, and that graph can itself vary over time and space. This learning paradigm departs from the traditional WL framework, which is strictly tied to the assumption of a fixed input graph. More importantly, the idea of spatial discretization inspires a series of new tools that, at least in principle, can address important problems beyond the reach of existing graph-theoretic techniques.
Legend: Different discretizations of the two-dimensional Laplacian.
Learning as an optimal control problem. The space of all possible states of the process at a given time can be viewed as the "hypothesis class" of representable functions. Learning can then be cast as an optimal control problem: can the process be steered, by choosing a trajectory in the parameter space, toward some desired state? Representational power corresponds to whether a given function can be reached via an appropriate trajectory in parameter space (reachability), efficiency relates to the time required to reach a given state, and generalization relates to the stability of the process.
Legend: Learning viewed as a control problem. In the airplane analogy for a physical system, the xyz coordinates (the state of the system) are controlled by the elevators, ailerons, and rudder (the parameter space).
GNNs can be derived by discretizing differential equations. The behavior of physical systems is often governed by differential equations whose solutions give the state of the system. In some cases the solution has a closed form, but in the more general case one must rely on numerical solutions based on an appropriate discretization. More than a century of research in numerical analysis has produced a variety of iterative solvers, which suggest possible new architectures for deep learning on graphs.
Attention mechanisms in GNNs can be interpreted as discretized diffusion partial differential equations with learnable diffusion coefficients, solved with an explicit numerical scheme in which each solver iteration corresponds to one GNN layer. There is currently no GNN architecture that corresponds directly to more sophisticated solvers (e.g., with adaptive step sizes or multi-step schemes), and research in this direction may lead to new architectures. Implicit schemes, on the other hand, require solving a linear system at each iteration, which can be interpreted as a "multi-hop" filter. Moreover, numerical methods come with stability and convergence guarantees that specify the conditions under which they work, as well as explanations for failure cases.
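To make the solver-iteration-equals-layer correspondence concrete, here is a minimal numpy sketch of explicit Euler steps for graph heat diffusion dX/dt = -LX (a fixed, non-learnable diffusivity rather than the learnable attention-based one discussed above; the graph, step size, and features are arbitrary assumptions). It also prints the Dirichlet energy, which decreases monotonically along this flow, anticipating the gradient-flow view below:

```python
import numpy as np

def heat_diffusion(A, X0, steps=10, tau=0.2):
    """Explicit-Euler discretisation of the graph heat equation dX/dt = -L X.
    Each solver iteration plays the role of one GNN layer; the Dirichlet energy
    X^T L X (a measure of non-smoothness) decreases monotonically along the flow
    as long as the step size tau is small enough for stability."""
    L = np.diag(A.sum(1)) - A                      # combinatorial Laplacian
    X = X0.copy()
    for t in range(steps):
        energy = float(X.T @ L @ X)
        print(f"step {t}: Dirichlet energy = {energy:.4f}")
        X = X - tau * (L @ X)                      # one explicit Euler step / "layer"
    return X

# Hypothetical path graph on 4 nodes with one scalar feature per node.
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1
heat_diffusion(A, np.array([3.0, 0.0, -1.0, 2.0]))
```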
Numerical solvers should be hardware-friendly. Iterative solvers predate digital computers, and since the advent of digital computers it has been necessary to understand the underlying hardware in order to use them efficiently. Large-scale problems in scientific computing often have to be solved on computer clusters, where such considerations become critical.
Performing deep learning on graphs in this "continuous" manner makes it possible to discretize the underlying differential equations in ways that are compatible with analog hardware. A great deal of work from the supercomputing community (such as domain decomposition techniques) could be reused here. Graph rewiring and adaptive iterative solvers can take the memory hierarchy into account, e.g., performing information-passing steps rarely between nodes held in different physical locations and more frequently between nodes in the same physical memory.
Interpreting evolution equations as gradient flows of physically meaningful energy functionals helps in understanding the learning model. Many physical systems have an associated energy functional (sometimes together with symmetries or conservation laws), and the differential equation governing the system's dynamics is the gradient flow that minimizes it. For example, the diffusion equation minimizes the Dirichlet energy, and its non-Euclidean version (the Beltrami flow) minimizes the Polyakov functional; this gives an intuitive understanding of the learning model. Via the principle of least action, some energy functionals instead lead to hyperbolic equations (such as the wave equation), whose solutions are oscillatory and quite different from typical GNN dynamics.
Analyzing the limiting behavior of these flows provides insight into model performance that is difficult to obtain otherwise. For example, in the paper "Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs", Bronstein and co-authors show that traditional GNNs inevitably over-smooth and can separate classes only under the assumption of homophily, whereas equipping the graph with additional structure yields better separation power. In the paper "Graph-Coupled Oscillator Networks", they show that oscillatory systems avoid over-smoothing in the limit. Such results explain why certain undesirable phenomena occur in some GNN architectures and how architectures can be designed to avoid them; moreover, linking the limiting behavior of the flow to separation reveals bounds on the expressive power of the model.
Graphs can be endowed with richer structure. As noted earlier, the geometry of a graph can be "too little" (unable to capture more complex phenomena such as non-pairwise relations) or "too much" (i.e., hard to represent in a homogeneous space). The first problem can be addressed by enriching the graph with additional structure. For example, molecules contain rings, which chemists regard as single entities rather than as collections of atoms and bonds (nodes and edges).
Bronstein and co-authors showed that graphs can be "lifted" into higher-dimensional topological objects, namely simplicial and cellular complexes. More sophisticated message-passing mechanisms can then be designed so that information propagates not only between nodes, as in GNNs, but also between structures such as rings. With a suitably constructed lifting, these models are provably more expressive than the traditional WL test.
Legend: A graph "lifted" to a cellular complex, with cellular message passing.
In the paper "Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs", Bronstein and co-authors show that assigning vector spaces and linear maps to nodes and edges equips the graph with an additional geometric structure, a "cellular sheaf". Traditional GNNs implicitly assume that the graph carries a trivial sheaf structure, which is reflected in the properties of the associated diffusion equation and in the structure of the graph Laplacian. Compared with traditional GNNs, using nontrivial sheaves yields richer diffusion processes with more favorable asymptotic behavior: for an appropriate choice of sheaf structure, the diffusion can separate multiple classes in the limit, even in heterophilic settings.
Geometrically, the sheaf structure is analogous to a connection, a notion from differential geometry that describes the parallel transport of vectors on a manifold. Learning the sheaf can be viewed as evolving the geometry of the graph in a way that depends on the downstream task. Bronstein and co-authors show that restricting the structure group of the sheaf (for example, to the special orthogonal group, so that node feature vectors can only be rotated) leads to some interesting findings.
Legend: A cellular sheaf on a graph attaches a vector space to each node and edge, together with linear restriction maps between them. It can be thought of as endowing the graph with geometry, the restriction maps playing a role similar to a connection.
Discrete analogues of curvature are another example of graph geometry; curvature is the standard tool differential geometry uses to describe the local properties of manifolds. In the paper "Understanding over-squashing and bottlenecks on graphs via curvature", Bronstein and co-authors show that negative graph Ricci curvature creates bottlenecks in the information flow on the graph, leading to over-squashing in GNNs. Discrete Ricci curvature accounts for higher-order structures (triangles and rectangles), which matter in many applications. This geometry is "too much" for traditional graph embeddings, since graphs are heterogeneous (highly curved in places), while the spaces commonly used for embedding, even non-Euclidean ones, are homogeneous (of constant curvature).
In the paper "Heterogeneous manifolds for curvature-aware graph embedding", Bronstein and co-authors construct heterogeneous embedding spaces with controllable Ricci curvature. Choosing a curvature that matches that of the graph improves the representation not only of neighborhood (distance) structure but also of higher-order structures such as triangles and rectangles. These spaces are built as products of homogeneous, rotationally symmetric manifolds and can be optimized efficiently with standard Riemannian gradient descent.
Legend: (Left) Space forms (sphere, plane, and hyperboloid) of constant positive, zero, and negative Ricci curvature; below them, graph analogues with the corresponding discrete Forman curvature (clumps, grids, and trees). (Middle) A product manifold (a cylinder can be thought of as the product of a circle and a line). (Right) A heterogeneous manifold with variable curvature and its graph analogue.
Positional encodings can be regarded as part of the domain. Thinking of a graph as the discretization of a continuous manifold, the positional coordinates and the feature coordinates of a node can be viewed as different dimensions of the same space. The graph can then be used to represent a discrete analogue of the Riemannian metric induced by such an embedding, and the harmonic energy associated with the embedding is a non-Euclidean extension of the Dirichlet energy, known in string theory as the Polyakov functional. The gradient flow of this energy is a diffusion-type equation that evolves both the positional and the feature coordinates. Building the graph from the node positions is a form of task-specific graph rewiring that changes across the iterations (layers) of the diffusion.
Legend: Evolution of the positional and feature components of the Cora graph under Beltrami flow with rewiring.
Domain evolution is an alternative to graph rewiring. Diffusion equations can be applied to the graph connectivity as a preprocessing step, aiming to improve information flow and avoid over-squashing. Klicpera et al. proposed such a graph diffusion scheme based on personalized PageRank. The paper "Understanding over-squashing and bottlenecks on graphs via curvature" analyzes this process, points out its pitfalls in heterophilic settings, and proposes an alternative graph rewiring process inspired by Ricci flow, which reduces the graph bottlenecks caused by negative curvature. The Ricci flow is a geometric evolution equation for manifolds that closely resembles a diffusion equation applied to the Riemannian metric, and it is a popular and powerful technique in differential geometry (figuring in the famous proof of the Poincaré conjecture). More broadly, instead of treating graph rewiring as a preprocessing step, one can consider a coupled system of evolution processes: one evolving the features, the other evolving the domain.
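A minimal numpy sketch of diffusion-based rewiring in the spirit of the personalized-PageRank approach mentioned above (the teleport probability, threshold, and toy path graph are hypothetical choices; real implementations use sparse solvers and other transition matrices):

```python
import numpy as np

def ppr_rewiring(A, alpha=0.15, eps=0.05):
    """Diffusion-based rewiring sketch: compute a dense personalized-PageRank
    matrix S = alpha * (I - (1 - alpha) * A_rw)^(-1) and keep only entries above
    a threshold as the new, 'rewired' edge set. alpha and eps are hypothetical
    hyperparameters."""
    n = len(A)
    d = A.sum(1)
    A_rw = A / np.where(d > 0, d, 1)[:, None]       # row-stochastic random-walk matrix
    S = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * A_rw)
    A_new = (S > eps).astype(float)
    np.fill_diagonal(A_new, 0)                      # drop self-loops
    return A_new

# Hypothetical 5-node path graph: diffusion adds shortcuts across the bottleneck.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1
print(ppr_rewiring(A))
```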
Legend: (Top) A dumbbell-shaped Riemannian manifold whose bottleneck has negative curvature becomes rounder after curvature-based metric evolution, and the bottleneck becomes less pronounced. (Bottom) An analogous curvature-based graph rewiring process reduces bottlenecks and makes the graph friendlier to message passing.

4. Conclusions
How far this new theoretical framework can go, and whether it can solve the currently open problems in the field, remains an open question.
Will these methods really be used in practice? A key question for practitioners is whether these approaches lead to new and better architectures, or remain theoretical tools detached from practical application. Michael Bronstein believes this line of research will be practical: theoretical results obtained with topological and geometric tools will inform better choices for existing GNN architectures, for instance how to constrain the message-passing function and when to apply particular constraints.
Does this go beyond message passing? Broadly speaking, any computation on a digital computer is a form of message passing. In the strict GNN sense, however, message passing is a computational notion in which information is sent from one node to another, an inherently discrete process. The physical models described here, by contrast, share information among nodes continuously (e.g., in a graph-coupled oscillator system, the dynamics of a node depend on those of its neighbors at every point in time). When the differential equations describing the system are discretized and solved numerically, the resulting iterations are indeed implemented by message passing.
One can also imagine implementing these physical systems directly, or using other computing paradigms (e.g., analog electronics or photonics). Mathematically, the solution of the underlying differential equation can sometimes be given in closed form: for example, the solution of the isotropic diffusion equation is a convolution with a Gaussian kernel. In that case, the influence of the neighbors is absorbed into the structure of the kernel and no explicit message passing occurs.
Legend: Backpropagation-based deep learning applied in real physical systems.
Reference links:
https://geometricdeeplearning.com
https://towardsdatascience.com/graph-neural-networks-beyond-weisfeiler-lehman-and-vanilla-message-passing-bc8605fa59a
https://mp.weixin.qq.com/s/_bGQ0PFUYpa_DR12H6YJUw
https://www.jianshu.com/p/615b2649f49b
