Sister Nan's technical talk: those things about graph computing

⭐️ Hello everyone, nice to meet you for the first time~ I'm Sister Nan~

⭐️ Inspired by some excellent public accounts recently, I want to try a new way of sharing technology: explaining technical topics on my public account in a relaxed, humorous tone~

⭐️ This article was originally posted on the company's internal technical forum. Encouraged by many colleagues, I then shared it on my WeChat Moments and tried a few things I had never done before~

⭐️ This article is a first attempt to test the waters. If it is well received, Sister Nan will work hard to continue this series of popular-science technical articles. On the public account I may also explore topics beyond technology, and I will try to build a series with personal character and original IP~ I hope you will all follow along~

--- ---

I wonder whether, in your everyday brick-moving work life,

you have ever heard the term "graph computing"?

Even if not, in work reports and popular-science articles you have surely come across

words like "intelligent" and "artificial intelligence"

The graph computing we are going to talk about today

is one of the hottest frontier topics in artificial intelligence in recent years

It is also a commonly used "secret weapon" in risk control and anti-fraud

Before we can understand graph computing,

we first need to understand what a "graph" is

The graph we are talking about today

is a data structure used to represent relationships between objects

It is highly abstract and flexible,

and very expressive in both structure and semantics

Precisely because the graph structure is so expressive,

many things in real life can be represented as "graphs",

such as social networks, road networks and financial transactions

Friends who work in R&D or on algorithms know that

our commonly used machine learning and deep learning algorithms

mostly process regular, ordered, structured data,

such as matrices, images, text and sequences

The samples being processed are also assumed to be independent and identically distributed

However, the nodes on a graph are naturally connected,

which means the nodes are not independent

This is where the graph computing we are talking about today comes in

Its core is to model data as a graph structure

and to turn the original problem into a computational problem on that graph structure

When an algorithm task involves association analysis among multiple individuals,

graph computing often lets the problem be expressed naturally

as a series of operations and computations on the graph structure

However, graph computing has to tackle many different kinds of problems,

and it is hard for a single computing model to solve them all

Next, let's go through, systematically,

those things about graph computing

--- ★ ---

First, depending on whether edges have a direction,

graphs can be divided into directed graphs and undirected graphs

Depending on whether edges carry weights,

graphs can also be divided into weighted graphs and unweighted graphs

Depending on whether the vertices and edges in the graph have multiple types,

graphs can be divided into homogeneous graphs and heterogeneous graphs

And depending on whether the graph structure and graph information change over time,

graphs can be divided into static graphs and dynamic graphs

"Degree" and "neighbors"

are two important concepts about graph nodes

The "degree" of a node is the number of edges attached to it

In a directed graph, we further distinguish "in-degree" from "out-degree"

A node's "neighbors" are the other nodes connected to it
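To make degree and neighbors concrete, here is a minimal sketch in plain Python. The edge list and the helper name `build_directed_graph` are made up for illustration; they are not from any particular library.

```python
from collections import defaultdict

def build_directed_graph(edges):
    """Return (successors, predecessors) maps for a directed edge list."""
    successors = defaultdict(set)    # node -> nodes it points to
    predecessors = defaultdict(set)  # node -> nodes pointing to it
    for src, dst in edges:
        successors[src].add(dst)
        predecessors[dst].add(src)
    return successors, predecessors

# Hypothetical "who transfers money to whom" edges, for illustration only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
successors, predecessors = build_directed_graph(edges)

out_degree = len(successors["A"])    # number of edges leaving A
in_degree = len(predecessors["A"])   # number of edges entering A
neighbors = successors["A"] | predecessors["A"]  # neighbors, ignoring direction

print(out_degree, in_degree, sorted(neighbors))
```

Since this toy graph has no parallel edges, counting neighbors and counting edges give the same degree; on a multigraph you would count edges instead.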

About the representation of graphs,

there are a few more basic concepts that have to be mentioned

One is the "adjacency matrix",

which records numerically which nodes are joined by an edge

There are also "node features" and "edge features":

numerical attributes that characterize individual nodes and edges

No matter how complex a graph algorithm or model is,

it is built on these basic concepts
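As a rough sketch of these three ingredients, the snippet below builds an adjacency matrix for a small undirected weighted graph using nested lists, and stores node and edge features in dictionaries. All node names, weights and feature values are invented for the example.

```python
# Four hypothetical nodes and three weighted, undirected edges.
nodes = ["A", "B", "C", "D"]
index = {name: i for i, name in enumerate(nodes)}
weighted_edges = [("A", "B", 1.0), ("A", "C", 2.5), ("C", "D", 0.5)]

# adjacency[i][j] holds the weight of the edge between node i and node j (0.0 = no edge).
n = len(nodes)
adjacency = [[0.0] * n for _ in range(n)]
for u, v, weight in weighted_edges:
    adjacency[index[u]][index[v]] = weight
    adjacency[index[v]][index[u]] = weight  # undirected, so the matrix is symmetric

# Node features: one numeric vector per node (made-up values).
node_features = {"A": [0.2, 1.0], "B": [0.9, 0.0], "C": [0.4, 0.4], "D": [0.1, 0.7]}

# Edge features: one numeric vector per edge (made-up values).
edge_features = {("A", "B"): [3.0], ("A", "C"): [1.0], ("C", "D"): [7.0]}

# The degree of a node can be read off its row: count the non-zero entries.
degree_of_a = sum(1 for w in adjacency[index["A"]] if w != 0.0)
print(degree_of_a)
```

For large sparse graphs an adjacency list or a sparse matrix is usually the better choice; the dense matrix here is only for readability.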

Now for one of the most basic questions about graphs: the node representation problem

That is, how to use the graph information and attributes above

to produce a numerical representation of the nodes or edges in a graph

In CV and NLP tasks,

we design CNN and RNN modules

to model the information carried by image pixels and text characters

The same idea is used in graph representation learning

Once we have a reasonable vector representation for nodes,

we can explore all kinds of downstream tasks

For example, node classification:

finding the nodes that have special behavior or properties

Or community detection:

finding the sets of nodes that are most tightly clustered and most similar

Link prediction, subgraph partitioning and other downstream tasks are also possible

What you do with graph computing

depends entirely on your actual needs

If you have read this far,

congratulations

You have already taken your first step into graph computing

--- ★ ★ ---

Graph computing is not a new invention

If we trace its history,

we arrive at Euler, considered one of the greatest mathematicians in human history

It was from his description of the Seven Bridges of Königsberg problem

that the discipline of graph theory emerged

In a park in Königsberg,

seven bridges connected the two islands in the Pregel River to each other and to the river banks

The question was whether one could take a walk that crosses every bridge exactly once

Euler studied this problem and settled it in 1736

He reduced it to a "one-stroke drawing" problem

and proved that such a one-stroke walk is impossible

In the course of his research,

he abstracted the land masses and bridges in the problem into points and edges,

formed a simple topological graph,

and introduced the basic concepts of graphs
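Euler's argument can be replayed in a few lines. The sketch below assumes the bridge layout usually given for the puzzle (the labels `island_a`, `island_b`, `north` and `south` are mine), and uses the standard criterion: in a connected graph, a walk that uses every edge exactly once exists only if zero or two nodes have odd degree.

```python
from collections import Counter

# The seven bridges as edges between four land masses (a multigraph).
bridges = [
    ("north", "island_a"), ("north", "island_a"),
    ("south", "island_a"), ("south", "island_a"),
    ("north", "island_b"), ("south", "island_b"),
    ("island_a", "island_b"),
]

def odd_degree_nodes(edge_list):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def has_one_stroke_walk(edge_list):
    """Degree test only; assumes the graph is connected."""
    return len(odd_degree_nodes(edge_list)) in (0, 2)

odd_nodes = odd_degree_nodes(bridges)
print(odd_nodes, has_one_stroke_walk(bridges))
```

All four land masses end up with an odd number of bridges, so no single walk can cross each bridge exactly once.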

After that, an early application of graph theory appeared:

map coloring

With the Age of Discovery from the 15th to the 17th centuries

and the rise of the nation-state concept after the French Revolution,

countries around the world began to draw more precise maps

How to use the fewest colors when drawing a map,

while making sure that any two adjacent regions (countries, states, provinces)

are shown in different colors,

is a classic graph theory problem

At the end of the 19th century,

mathematicians proved the "five-color map" theorem by hand

It was not until 1976, nearly a century later,

that the "four-color map" theorem was first proved, with the help of computer computing power

With that result,

the five-color map was replaced by the four-color map
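To show what "coloring a map" looks like as a graph computation, here is a small greedy coloring sketch. It is a simple heuristic, not the four-color proof, and it is not guaranteed to use the minimum number of colors. The five-region map is invented for illustration.

```python
def greedy_coloring(adjacency):
    """Assign each region the smallest color index not used by its colored neighbors."""
    colors = {}
    # Color the most-connected regions first; a common heuristic, not an optimal rule.
    order = sorted(adjacency, key=lambda region: len(adjacency[region]), reverse=True)
    for region in order:
        used = {colors[nb] for nb in adjacency[region] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[region] = color
    return colors

# Hypothetical map: region A borders everyone, B..E form a ring around it.
regions = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C", "E"],
    "C": ["A", "B", "D"],
    "D": ["A", "C", "E"],
    "E": ["A", "B", "D"],
}
colors = greedy_coloring(regions)
print(colors)
```

Each region becomes a node, each shared border becomes an edge, and a valid coloring is simply one where no edge joins two nodes of the same color.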

In its general form, the graph coloring problem behind map coloring is a classic NP-complete problem

Graph problems like this show up in many scenarios,

such as navigation, resource scheduling, search and recommendation engines

However, the big data frameworks and solutions built for these scenarios

did not, in the beginning,

really use native graph storage and computing

In other words, people were still using columnar databases

or even document databases to solve graph problems

When inefficient, low-dimensional tools are forced onto complex, high-dimensional problems,

the user experience is likely to be poor and the return on investment extremely low

In recent years, with the development of the Internet,

knowledge graphs have gradually become widely known,

and graph computing and graph databases have begun to receive renewed attention

Over the past half century and more, many graph algorithms have appeared

They include the well-known Dijkstra algorithm, which dates back to 1956

and solves the shortest path problem on graphs
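For readers who have not seen it, a compact version of Dijkstra's algorithm using Python's standard `heapq` module might look like the following. The tiny road network and its distances are invented, and the sketch assumes all edge weights are non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distance from source to every reachable node (non-negative weights)."""
    dist = {source: 0}
    heap = [(0, source)]  # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to this node was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical road network: node -> list of (neighbor, distance).
roads = {
    "home": [("mall", 4), ("park", 1)],
    "park": [("mall", 2), ("office", 6)],
    "mall": [("office", 3)],
    "office": [],
}
distances = dijkstra(roads, "home")
print(distances)
```

Here the detour through the park makes the mall reachable in 3 instead of 4, which in turn shortens the trip to the office.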

More sophisticated community detection algorithms also emerged as the times required

They are used to uncover associations among communities, customer groups and suspects

Then came graph representation learning

Its goal is to represent each vertex in the graph as a low-dimensional vector

that preserves as much of the graph's structure and content information as possible

and can be used as features for subsequent learning tasks,

such as node classification and link prediction

This line of work targets different types of data: homogeneous graphs, heterogeneous graphs, attributed graphs and dynamic graphs

Many methods have been proposed,

including the classic algorithms DeepWalk, LINE and Node2Vec

The basic idea of the random-walk methods among them, such as DeepWalk and Node2Vec, is to generate node sequences by random walks,

then optimize the parameters by training on those sequences

to obtain a probabilistic model
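The "generate data by random walks" step can be sketched as below. This only covers the uniform walk generation in the spirit of DeepWalk; the training step (feeding the walks to a skip-gram style model) is left out, and the function names and the toy graph are my own assumptions.

```python
import random

def random_walk(adjacency, start, length, rng):
    """Walk up to `length` nodes, moving to a uniformly random neighbor each step."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(adjacency, walks_per_node, length, seed=0):
    """Start several walks from every node; the result plays the role of a text corpus."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    walks = []
    for _ in range(walks_per_node):
        for node in adjacency:
            walks.append(random_walk(adjacency, node, length, rng))
    return walks

# Hypothetical undirected graph stored as adjacency lists.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
walks = generate_walks(toy_graph, walks_per_node=2, length=5)
print(walks[0])
```

Each walk is treated like a sentence and each node like a word, so nodes that often appear close together in walks end up with similar vectors after training.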

Graph neural networks, in turn, extend classic neural network models such as RNN and CNN to graph data

Unlike graph representation learning, which tries to learn a vector for each individual node,

the purpose of a graph neural network is to learn an aggregation function

Every node uses its local information to compute its own representation through the same function

Even if the graph structure changes, or a completely new graph comes along,

the original function can still compute meaningful results
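The "same function for every node" idea can be illustrated with a fixed mean aggregation. A real graph neural network would add learned weights and a nonlinearity around this step; the sketch below deliberately leaves those out, and all graphs and feature values are invented.

```python
def mean_aggregate(adjacency, features):
    """One round of message passing: average a node's vector with its neighbors' vectors."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[nb] for nb in neighbors]
        dim = len(features[node])
        updated[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return updated

# A small path graph a - b - c with 2-dimensional node features.
graph_1 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features_1 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hidden_1 = mean_aggregate(graph_1, features_1)

# The very same function works on a graph it has never seen.
graph_2 = {"x": ["y"], "y": ["x"]}
features_2 = {"x": [2.0, 0.0], "y": [0.0, 2.0]}
hidden_2 = mean_aggregate(graph_2, features_2)

print(hidden_1["a"], hidden_2["x"])
```

Because the function only looks at a node and its neighbors, nothing in it is tied to a particular graph, which is why it carries over to new or changed graphs.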

A series of classic graph neural network algorithms has also emerged

--- ★★★ ---

Finally, let's talk about the practical applications of graph computing

Today, many large Internet companies and fintech companies

are inseparable from graph computing technology

Take PageRank, invented by Google co-founder Larry Page at the end of the 20th century

It is an algorithm for ranking web pages and links at massive scale

You could say that the core technology of early Google was a shallow, concurrent graph computing technology
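As a rough idea of how PageRank scores pages, here is a textbook-style power iteration in plain Python. The three-page web, the damping factor of 0.85 and the iteration count are illustrative choices, and the sketch assumes every link target also appears as a key in `links`. It is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively spread each page's score across the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical three-page web: page -> pages it links to.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Pages that receive links from many well-ranked pages accumulate a higher score, which is the intuition behind ranking by link structure.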

Then there is Facebook, whose technical framework is built around its Social Graph

That is, friends connect to friends, who in turn connect to their friends

On this basis, Facebook has built a powerful social network

Facebook has open-sourced a lot of things,

but this core graph computing engine and architecture has never been open-sourced

If you recall the global financial crisis that broke out in 2007-2008,

Lehman Brothers went bankrupt,

while Goldman Sachs managed to come through

An important reason behind this, it is said, was its use of a powerful graph database system: SecDB

And among the technology-driven new Internet companies,

such as PayPal, eBay and many of our domestic financial and e-commerce companies,

graph computing is nothing unusual

The core strength of graphs is that they help these companies reveal how their data is interrelated

Over the last ten years,

with the widespread application of artificial intelligence technology represented by deep learning,

graph learning has gradually become a hot topic

Progress has also been made on causality and explainability

Today, graph learning has spread to more fields,

such as advertising, financial risk control, intelligent transportation, healthcare and smart cities

To close, let's look at some examples of graph computing in financial anti-fraud

In fields that involve money transactions, such as finance and e-commerce,

there is never a shortage of black-market operators seeking illegal profit,

through promotion abuse ("wool pulling"), order brushing, illicit cash-out, fake transactions and so on

Compared with the occasional arbitrage of individual users,

black-market gangs that operate in an organized way across business scenarios

cause far greater and more serious economic losses to the platform

And graph computing is a good remedy for identifying gang cases

By taking huge numbers of accounts, merchants, devices, network environments and so on as nodes,

and key information such as registrations and transactions as the edges that link them,

we can form a heterogeneous graph with very wide coverage
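A minimal sketch of such a heterogeneous graph is shown below. Nodes are `(type, id)` pairs and edges carry a type taken from the event that produced them. The event records, field names and the helper `build_hetero_graph` are all hypothetical; a real risk control system would have many more node and edge types.

```python
from collections import defaultdict

def build_hetero_graph(events):
    """Link each account to the device and IP it used, labeling edges by action type."""
    adjacency = defaultdict(set)  # node -> set of (edge_type, other_node)
    for event in events:
        account = ("account", event["account"])
        for node_type in ("device", "ip"):
            other = (node_type, event[node_type])
            adjacency[account].add((event["action"], other))
            adjacency[other].add((event["action"], account))
    return adjacency

# Made-up event log, for illustration only.
events = [
    {"account": "acct_1", "device": "dev_A", "ip": "ip_X", "action": "register"},
    {"account": "acct_2", "device": "dev_A", "ip": "ip_Y", "action": "register"},
    {"account": "acct_2", "device": "dev_A", "ip": "ip_Y", "action": "trade"},
]
graph = build_hetero_graph(events)

# Which accounts have touched device dev_A, through any kind of edge?
accounts_on_dev_a = sorted(
    {node[1] for _, node in graph[("device", "dev_A")] if node[0] == "account"}
)
print(accounts_on_dev_a)
```

Keeping the node type and edge type explicit is what makes the graph heterogeneous: an account sharing a device with another account is a different signal from two accounts sharing an IP.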

Depending on the application background, we determine the recognition target

and choose the graph model, samples, labels and so on

The graph model is then trained with supervised learning

Finally, at inference time, it outputs a risk probability for each node or edge in the graph

At this point some friends will say:

high-quality labeled samples are too scarce in risk control scenarios

Not a big problem: there are also many graph models that support unsupervised learning

For example, community detection does not need any label information

and can cluster together the sets of nodes that are most tightly connected

In our experience,

it is one of the best graph algorithms for identifying gangs
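As a label-free illustration, the sketch below groups accounts that are linked through shared devices or IPs. To be clear, this uses connected components via union-find, the crudest stand-in for community detection; production systems typically use modularity-based or label propagation methods. The link data and the `acct_` naming convention are invented.

```python
def find(parent, node):
    """Find the group representative of a node, shortening the path as we go."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def group_accounts(links):
    """Merge accounts that share any resource, then list the resulting account groups."""
    parent = {}
    for account, resource in links:
        parent.setdefault(account, account)
        parent.setdefault(resource, resource)
        root_a, root_b = find(parent, account), find(parent, resource)
        if root_a != root_b:
            parent[root_a] = root_b  # union the two groups
    groups = {}
    for node in parent:
        if node.startswith("acct_"):  # report accounts only, not devices or IPs
            groups.setdefault(find(parent, node), []).append(node)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical (account, shared resource) links; no labels are needed.
links = [
    ("acct_1", "dev_A"), ("acct_2", "dev_A"),
    ("acct_2", "ip_X"), ("acct_3", "ip_X"),
    ("acct_4", "dev_B"),
]
gangs = group_accounts(links)
print(gangs)
```

The first three accounts are chained together through a shared device and a shared IP, so they land in one group, while the fourth stays alone. A group that is unusually large or dense is a natural candidate for manual review.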

Another example is self-supervised learning and contrastive learning, both popular in recent years

Applied to graph computing, they make unsupervised pre-training on the graph possible

Starting from the nature of the graph's structure and attributes,

they learn node vectors with good representational power

that can be used in all kinds of downstream intelligent risk control models

--- ★★★★ ---

With AIGC large models recently breaking into the mainstream,

artificial intelligence has ushered in a new wave

Compared with generative language and vision models,

graph computing is indeed a little out of the spotlight

But Sister Nan believes that a good meal is worth the wait

The days without radiance are all preparations for radiance

Maybe one day graph computing will also have its own trending moment

---- Written at the end ----

★ This article took more than 40 days from start to finish, and it is finally done, realizing an idea Sister Nan had a few months ago. Because of the sheer amount of work, I thought about giving up many times along the way, but fortunately, with Xiao Jiang's support, I persisted and finished this imperfect first chapter. If time allows, Sister Nan will keep writing and try to build a series with a strong personal style. Sister Nan has worked in risk control for three years and two months. This article is also a tribute to those three years of algorithm work, to mentor Chen, who brought her into the industry, and to the lovely risk control colleagues she has worked alongside~

★ This is only a first article to test the waters, and it may contain imperfections or imprecise statements. Thank you for your understanding, and thank you very much for reading patiently to the end~

★ The pictures in this article are for illustration only and contain some exaggerated, humorous elements. Please do not read yourself into them~ Any resemblance is purely coincidental~ No offense is intended~

★ Making the pictures was not easy for Sister Nan, imperfect as they are. Please do not reuse the original pictures from this article elsewhere or for other purposes without permission.

★ The text of this article is compiled based on the following references:

[1]. Ma Yao, Tang Jiliang. Graph Deep Learning [M]. Electronic Industry Press.

[2]. Zhang Changshui, Tang Jie, Qiu Xipeng. Introduction to Graph Neural Networks [M]. People's Posts and Telecommunications Press.

[3]. Zhihu. A Brief History of Graph Computing Development [EB/OL]. https://zhuanlan.zhihu.com/p/562893366

[4]. Baidu. The foundation of big data - the development of graph computing [EB/OL]. https://baijiahao.baidu.com/s?id=1743913772591545506&wfr=spider&for=pc

This article is shared from the WeChat public account - JD Cloud Developers (JDT_Developers).
