Sister Nan's Technology Talk: Those Things about Graph Computing | JD Cloud Technical Team

I don't know if everyone is in their usual work

Have you ever heard of the term " graph computing "?

But everyone must have heard the words "intelligence" and "artificial intelligence" in various work reports and technology sharing.

 

And the graph calculation we are going to talk about today

It is the hot frontier darling in the field of artificial intelligence in recent years

It is also the "big killer" commonly used in our risk control and anti-fraud

 

Before understanding graph computing

First of all, we must understand what is a " graph "

 

what we say today

In fact, it is a data structure used to represent the relationship between objects

Has strong abstraction and flexibility

It has strong expressive ability in terms of structure and semantics

It is precisely because of the rich expressiveness of the graph structure

There are many examples in real life that can be represented as "graphs"

Such as social networks , road networks , financial transactions , etc.

R & D or algorithm-related small partners know

Our commonly used machine learning and deep learning algorithms

Most of them are used to process some regular, orderly, or structured data

Such as matrix, picture, text, sequence, etc.

And the processed data are assumed to be independent and identically distributed

However, the nodes on the graph are naturally connected

This means that the nodes are not independent

At this point, the graph computing we are going to mention today comes

Its core is precisely to model data as a graph structure

And solve how to convert the problem solution into a computational problem on the graph structure

When the algorithm task involves association analysis between multiple individuals

Graph computing often enables problems to be expressed naturally as a series of operations and calculations on the graph structure

 

However, graph computing needs to solve various problems

It is difficult to solve all problems with one set of computing models

Next, we will come to the system site

Those things about graph computing

---  ---

For example, by virtue of whether the edge has direction

Graphs can be divided into directed graphs and undirected graphs

Whether the edge has weight

Graphs can also be divided into weighted graphs and unweighted graphs.

Whether the vertices and edges in the graph have multiple types

Graphs can be divided into isomorphic graphs and heterogeneous graphs

Also, whether the graph structure and graph information change over time

Graphs can be divided into static graphs and dynamic graphs

" Degree " and " Neighborhood "

are two important concepts involving graph nodes

The "degree" of a node refers to the number of nodes connected to it

If it is a directed graph, it will also distinguish between " in-degree " and " out-degree "

A node's "neighbors" are other nodes connected to it

About the representation of graphs

There are still a few basic concepts that have to be mentioned

One is the " adjacency matrix "

Used to quantitatively represent the edge relationship between nodes

There are also " node features " and " edge features "

Unique numerical properties used to characterize nodes and edges

No matter how complex the graph algorithm model

are based on these basic concepts

Ask one of the most basic questions about graphs - the problem of node representation

It is how to base on the information and attributes of the above graph

Quantified representation of nodes or edges in a graph

In CV and NLP tasks

We will design CNN and RNN modules

To model the information represented by image pixels and text characters

The same idea is also used in graph representation learning

With a reasonable node vector representation

We can then explore various downstream tasks

For example, to classify nodes

Find those nodes that have special behavior or properties

or community division

Find out the set of nodes with the strongest aggregation and the highest similarity

In addition, various downstream tasks such as link prediction and subgraph partitioning can also be performed.

What do you want to do with graph computing

totally depends on your actual needs

see here

congratulations

You have already started graph calculation

---   ---

Graph computing is not a new algorithm

If we trace its history

Euler is considered one of the greatest mathematicians in human history

For his description of the problem with the Seven Arch Bridge in Königsberg

The discipline of graph theory then emerged

In a park in Königsberg

There are seven bridges connecting the two islands in the Pregel River to the banks of the river

Euler studied and proved this problem in 1736

He attributed the problem to the "one stroke" problem

And prove that the one-stroke move is impossible

during his research

Abstract the land and bridge in the problem into points and edges respectively

and form a simple topological graph

introduces basic concepts about graphs

After that, there was an early application of graph theory - area rendering (coloring)

With the advent of the age of great navigation from the 15th to the 17th centuries

and the rise of the nation-state concept after the French Revolution

Countries around the world are starting to create higher-resolution maps

And how to use the least number of colors in the drawing to ensure that two adjacent areas (country, state, province)

The problem of distinguishing with different colors is a classic graph theory problem

In the middle of the 19th century, mathematicians proved the problem of "five-color map" by hand calculation

And until a full century later, in 1976,

It was only with the help of computer computing power that the feasibility of the "four-color map" was initially proved.

And after optimizing through graph calculation

Replaced the five-color map with a four-color map

The above map coloring problem is a typical NP-complete problem in mathematics

such as navigation, resource scheduling, search and recommendation engines

However, the big data framework and solutions corresponding to these scenarios

In the beginning

There is no real use of native graph storage and computing modes

In other words, people are still using columnar databases

or even document databases to solve graph theory problems

Inefficient and low-dimensional tools are used to forcefully solve complex and high-dimensional problems

Then its user experience may be poor or the input-output ratio is extremely bad

In recent years, with the development of the Internet

Knowledge graph gradually penetrates into the hearts of the people

The development of graph computing and graph databases has only begun to receive renewed attention

In the past half a century, many graph computing algorithms have come out

Including the well-known Dijkstra algorithm that appeared in 1956

Research solves the shortest path problem for graphs

Various more complex community discovery algorithms have emerged as the times require

Used to detect associations between communities, customer groups, and suspects

It is to represent each vertex in the graph as a low-dimensional vector

And make the vector able to save as much structure and content information of the graph as possible

and can be used as features for subsequent learning tasks

Such as node classification, link prediction, etc.

These works are aimed at different types of data such as isomorphic graphs, heterogeneous graphs, attribute graphs, and dynamic graphs.

Various proposals have been proposed

Including classic algorithms  DeepWalk , LINE , Node2Vec

The basic idea of ​​these algorithms is to generate data based on random walks

Then optimize the parameters by training

generate probabilistic models

Extend classic neural network models such as RNN, CNN, etc. to graph data

Unlike graph representation learning, which tries to learn the vector of each point

The purpose of the graph neural network is actually to learn the aggregation function

All points can use local information to calculate their own representation through the same function

Even if the graph structure changes, or even a completely new graph

can also use the original function to calculate meaningful results

Regarding the graph neural network, a series of classic algorithms have also been born

--- ★★★ ---

Finally, let’s talk about the practical application of graph computing

At present, many large Internet companies and financial technology companies

In fact, it is inseparable from graph computing technology

 

PageRank invented by Google founder Larry Page at the end of the 20th century

This is a large-scale page, link ranking algorithm

It can be said that the core technology of early Google is a shallow concurrent graph computing technology

There is also Facebook, the core of its technical framework is its Social Graph

That is, friends associate friends and associate friends

As a result, Facebook has established a strong social network

Facebook open source a lot of things

But this core graph computing engine and architecture has never been open source

If you can recall the world financial crisis that broke out in 2007-2008

Lehman Brothers goes bankrupt

Goldman Sachs was able to get out

The real reason behind it is the application of a powerful graph database system - SecDB

And for all the technology-driven new Internet companies

Such as Paypal, eBay and many of our domestic financial and e-commerce companies

Graph computing is not uncommon

Graph's core competencies can help them reveal the interrelationships of data

the last ten years

With the widespread application of artificial intelligence technology represented by deep learning

Graph learning has gradually become a hot topic

Breakthroughs have also been made in causality, explainability

Now, graph learning has also been extended further

Such as advertising , financial risk control , intelligent transportation , medical care , smart city and other fields

Finally, let’s talk about some examples of graph computing applications in financial anti-fraud

In fields involving money transactions such as finance and e-commerce

There is always no shortage of black products active in it for illegal profit

Such as plucking wool, swiping orders, cashing out, false transactions, etc.

Compared with the occasional arbitrage behavior of individual users themselves

Those black production gangs that gather and operate crimes in various business scenarios with gangs as units

Their actions will cause greater and more serious economic losses to the platform

And graph computing is a good recipe for identifying gang cases

By using thousands of users, merchants, equipment, network environments, etc. as nodes

Use information such as usage and transactions as associated information to establish edges

A heterogeneous map with a very wide coverage can be formed

Combined with different application backgrounds to determine the recognition target

And select graph computing models, samples, labels, etc.

One graph training with supervised learning

Finally, in the inference stage, the probability output of the risk level of the nodes or edges in the graph is output

Then some friends will say

There are too few high-quality sample labels for risk control scenarios

Not a big problem, there are also many graphical models that can be used for unsupervised learning

For example, community discovery does not require any tag information

The most closely related node set can be clustered

in our experience

It is one of the best graph algorithms to identify gangsters

Another example is the popular self-supervised learning and comparative learning in recent years.

Applied to the field of graph computing, unsupervised pre-training can be performed on the graph

Starting from the nature of graph structure and graph attributes

Learn vectors with good representational power for graph nodes

Can be used in various downstream risk control intelligent models

--- ★★★★ ---

With the recent explosion of AIGC large models out of the circle

Artificial intelligence has ushered in a new wave

Compared to generative language and vision models

Graph computing is indeed a bit colder

But Sister Nan believes that a good meal is not afraid of being late

The days without radiance are all preparations for radiance

Maybe one day, graph computing will also usher in its own hot search

 

----Written at the end----

  The conception, creativity, overall structure, and later modification of the pictures in this article, all copyrights belong to Jingdong Nanjie, and the materials are generated from Midjourney and Nanjie's original prompt words. It is not easy for Sister Nan to produce pictures, and they are not perfect. Please do not use them for other occasions and purposes without permission

  The pictures in this article are only for illustration and illustration, and contain some exaggerated and humorous elements, please do not take the same place~ Any similarity is purely coincidental~ No offense is intended~

  The text of this article is compiled based on the following references:

[1]. Ma Yao, Tang Jiliang. Graph Deep Learning [M]. Electronic Industry Press.

[2]. Zhang Changshui, Tang Jie, Qiu Xipeng[M]. Introduction to Graph Neural Networks[M]. People's Posts and Telecommunications Press.

[3]. Zhihu. A Brief History of Graph Computing Development [EB/OL].
https://zhuanlan.zhihu.com/p/562893366

[4]. Baidu. The foundation of big data - the development of graph computing [EB/OL].
https://baijiahao.baidu.com/s?id=1743913772591545506&wfr=spider&for=p

Author: Jingdong Technology Ding Nan

Content source: JD Cloud developer community (please do not reproduce without authorization)

{{o.name}}
{{m.name}}

Guess you like

Origin my.oschina.net/u/4090830/blog/8795975