Nebula Graph in Practice at ZhongAn Insurance

This article was first published on the Nebula Graph Community public account

Internet finance lending differs from traditional credit business: it responds faster, handles data at a larger scale, and carries higher risk. ZhongAn Insurance's main business is credit guarantee insurance. To serve that business, the big data team built a risk control system to handle decision-making for Internet lending. This article describes how ZhongAn Insurance came to select Nebula Graph and how Nebula Graph is applied to specific business scenarios to help solve risk control problems.

Business background

Unlike traditional bank credit, where reviewing an application takes ten days to half a month, the first feature of Internet financial lending is that application review is very fast. A user may submit a credit application on a mobile phone one second, and the system returns the result the next. The second feature is that the authenticity of the data is hard to guarantee: the annual income, family relationships, and contacts that users fill in may all be false. Together these two features have given rise to an industry, the Internet black market of organized fraud. In plain terms, black-market fraud is users exploiting lending platforms to "fleece" them. Because online lending is highly concealed, once a fraudulent account has been used it is hard to trace it back to a specific person over the Internet. And because loans are approved so quickly, black-market accounts can easily obtain more money. For these reasons, risk control in Internet finance needs to screen for fraud scenarios systematically.

So how can online black-market fraud be identified? By linking users to different entities such as devices, GPS locations, and mobile phone numbers, applying community detection, checking whether individuals in a community carry fraud risk, and investigating fraud cases, lending risk control can be carried out effectively. At present, ZhongAn Insurance's risk control is built on Nebula Graph.

Why choose Nebula Graph

At the start of technology selection, ZhongAn Insurance's team surveyed the graph database market and first shortlisted JanusGraph and OrientDB.

Let's start with JanusGraph. It had one big advantage inside the technical team of ZhongAn's financial business: team members were familiar with it and many engineers had used it, which lowered the development and onboarding cost to some extent. Anyone who has used JanusGraph knows that it is a distributed graph database whose storage and indexing depend on open-source components such as HBase (storage) and Elasticsearch (indexing). One business line in the company had used JanusGraph before, backed by the online HBase storage service, but that business is relatively independent and has no strong dependencies on other core businesses. As the saying goes, "Different countries have different national conditions. Once the same mechanism is relocated to different countries, there may be problems of acclimatization." The basic data of ZhongAn Insurance's risk control business is currently stored in HBase. If the risk control system used JanusGraph, fully importing tens of billions of graph records into HBase would put load on the HBase cluster, increase query latency spikes, and affect other business lines. JanusGraph is also slow at large-scale writes. So, although JanusGraph had a low onboarding cost, its strong dependence on other components and its poor import performance led us to rule it out.

While researching graph database products, we found that OrientDB ranks high on DB-Engines and has a complete feature set. In performance testing, OrientDB worked well on small data sets, but once the mock data exceeded 100 million records, the server side reported errors frequently. After the official OrientDB documentation yielded no answer, ZhongAn Insurance submitted an issue to OrientDB's official GitHub repository, but the response was slow. While submitting it, we also found that a community user had reported the same frequent server-side errors on large data sets two years earlier, and that issue was still open. As for large-scale write performance, vertex writes were acceptably fast, but edge writes reached only 1-2k QPS. At that speed, building the graph data model would take days, which was unacceptable. In short, although OrientDB ranks high and has a complete feature set, we did not choose it because of frequent server errors on large data sets, slow responses to community issues, and poor large-scale write speed.

Nebula Graph entered the selection because, when ZhongAn Insurance asked practitioners at other companies (JD.com, Ctrip…) which graph databases they used, they unanimously recommended Nebula Graph. It therefore became one of the candidates. In actual testing, we found that Nebula Graph's large-scale write speed is very fast: test data in the production environment reached 100,000+ QPS. In addition, Nebula Graph's storage and indexing rely on a local RocksDB library rather than on other big data components, which fits our business needs. For the big data ecosystem, Nebula Graph supports the mainstream Spark (nebula-spark-connector) and Flink (nebula-flink-connector). Nebula Graph also gave us a better experience in community responsiveness and timeliness of feedback.

A few more words on community support. Throughout the research, we found that graph databases have had a much shorter history than mature SQL databases such as MySQL and Oracle. As a result, when you run into a problem with a graph database product, search engines turn up little information. As with OrientDB's frequent errors described above, if the community does not give timely technical feedback, users may spend a lot of time reading source code to debug, labor costs rise sharply, and cost-effectiveness becomes extremely low.

Nebula Graph has given ZhongAn Insurance a very good experience in community support and feedback. As a user since the earliest 1.0 release, ZhongAn Insurance submitted many usage questions and bugs to Nebula Graph, and the Nebula R&D engineers replied and fixed bugs promptly. When we deployed 2.0 and hit production deployment problems, they again provided timely technical support. Compared with other graph database vendors, this is highly commendable, and it is the fundamental reason we chose Nebula Graph as the graph database supporting ZhongAn Insurance's business.

Financial Risk Control Business Practice

The following figure shows the architecture of ZhongAn Insurance's risk control system based on Nebula Graph, which integrates data processing and cleaning, computing, and graph service applications.

As shown in the figure above, the bottom layer consists of the business databases. Different kinds of business relationship data live in different business databases, including user accessories, devices, GPS, IP, and other information.

The layer above is the data processing and cleaning layer for the graph database, made up of an offline data warehouse and a real-time data warehouse. The offline data warehouse pulls data back from the business databases every day (T+1) through DataX and stores it in ODPS; Spark then reads the data and writes it to Nebula Graph. For the real-time data warehouse, BLCS, ZhongAn Insurance's internal monitoring component, writes data to Kafka; the data is then cleaned and processed by the real-time data warehouse built on FlinkSQL and finally written to Nebula Graph in real time through Flink. To ensure data consistency, the real-time data warehouse runs a data verification every day. If inconsistencies are found, offline data is used to fill in the missing data.
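
The daily consistency check can be pictured with a minimal Python sketch. The source does not say how the verification is implemented, so everything here is an assumption for illustration: edges are modeled as hypothetical (source, type, destination) tuples, and the function name `find_missing_edges` is invented.

```python
def find_missing_edges(offline_edges, realtime_edges):
    """Return edges present in the offline (T+1) snapshot but absent
    from the real-time copy, i.e. the edges that need to be backfilled."""
    return sorted(set(offline_edges) - set(realtime_edges))


# Hypothetical sample data: (source vertex, edge type, destination vertex).
offline = [
    ("u1", "applies", "p1"),
    ("u2", "applies", "p1"),
    ("p1", "contact", "u3"),
]
realtime = [
    ("u1", "applies", "p1"),
    ("p1", "contact", "u3"),
]

missing = find_missing_edges(offline, realtime)
print(missing)
```

In a real pipeline the comparison would more likely run as a distributed job over both stores; the set difference only shows the idea of using offline data to fill gaps.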

Above the data cleaning and processing layer is the storage and computing layer. The storage layer is, naturally, Nebula Graph. For computing, the Spark Connector component provided by Nebula Graph reads data from the graph database into the Spark platform, where the prediction models run on GraphX, and the results are finally written back to Nebula Graph.

Finally, through ZhongAn Insurance's microservice system, the storage and computing layer is connected to the upper-layer graph applications, which provide graph services such as graph exploration, risk control features, case investigation, and prediction models.

Relationship graph

Here is a brief walkthrough of a relationship graph from ZhongAn Insurance's internal graph community exploration. The relationship graph in the figure above illustrates how ZhongAn uses the graph database to identify fraud scenarios and to build risk control features.

There are 2 types of nodes in the above figure:

  • Person (blue nodes)
  • Mobile phone (green nodes)

There are 3 types of relationships:

  • Person-[Application]->Mobile phone
  • Mobile phone-[Contact]->Person
  • Person-[Bind card]->Mobile phone

At first glance, the figure shows 2 obvious dense hotspots: the mobile phone number at each hotspot has been filled in by fifty or sixty people as the number of their family contact. Common sense says that in contemporary China, where most families have a single child, it is hard for 50 or 60 people to list the same mobile phone number as their family contact at the same time, even counting collateral relatives. The people associated with this mobile phone number may therefore be members of a fraud gang. The gang may know that one part of the loan scoring system scores the family contact's mobile phone number, and hopes to raise its credit scores by linking to a mobile phone number with a high credit score.
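
The hotspot pattern described above amounts to counting how many distinct people list the same phone number as a family contact. The following is a minimal Python sketch of that count, not ZhongAn's implementation: the function name, the (person, phone) pair format, the sample phone numbers, and the threshold of 50 are all assumptions chosen to mirror the "fifty or sixty" in the text.

```python
from collections import defaultdict


def find_hotspot_phones(contact_pairs, threshold):
    """Map each phone number to the number of distinct people who listed it
    as a family contact, keeping only phones at or above the threshold."""
    people_per_phone = defaultdict(set)
    for person, phone in contact_pairs:
        people_per_phone[phone].add(person)
    return {
        phone: len(people)
        for phone, people in people_per_phone.items()
        if len(people) >= threshold
    }


# Hypothetical data: 55 users share one contact number, 2 users share another.
pairs = [(f"user{i}", "138-0000-0001") for i in range(55)]
pairs += [("userA", "139-0000-0002"), ("userB", "139-0000-0002")]

hotspots = find_hotspot_phones(pairs, threshold=50)
print(hotspots)
```

In a graph database the same question is a degree query on the phone vertex; the dictionary version only makes the counting rule explicit.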

Based on these characteristics, we can query the size of a user's community and whether the user belongs to a suspected fraud community to make a preliminary risk control judgment. Note that being in an abnormal relationship network does not by itself mean that a user is a fraudster; membership in an abnormal community is not a sufficient condition for that conclusion. The user may not be a fraudster at all, but an immediate relative may be involved with an intermediary agency or gang fraud, in which case normal and abnormal users end up in the same relationship network.

Next, we need to dig deeper into how "close" a user is to the anomaly centers by exploring the path distances between them. Using Nebula Graph's built-in path finding, we analyze the degree of dispersion (whether the user is close to an abnormal point or at the edge of the community) to judge whether the user is suspected of fraud.
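
To make the idea of path distance concrete, here is a small Python sketch that measures how many hops separate a user from an anomaly center with a breadth-first search. It is a plain-Python stand-in for the database's path finding, not the query ZhongAn runs; the helper names, the undirected treatment of edges, and the sample vertices are assumptions.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) edge pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def hop_distance(adjacency, source, target):
    """Return the minimum number of hops from source to target,
    or None when the two vertices are not connected."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical graph: "hot_phone" is the anomaly center.
adjacency = build_adjacency([
    ("hot_phone", "u1"),
    ("u1", "phone_x"),
    ("phone_x", "u2"),
    ("u3", "phone_y"),
])

near = hop_distance(adjacency, "u1", "hot_phone")      # directly attached
edge = hop_distance(adjacency, "u2", "hot_phone")      # at the community edge
outside = hop_distance(adjacency, "u3", "hot_phone")   # different community
print(near, edge, outside)
```

A small distance suggests the user sits next to the abnormal point, while a larger one places the user at the edge of the community; how those distances translate into a risk decision is left to the risk control rules.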

The mobile phone number example here is meant to help you understand how ZhongAn uses Nebula Graph to identify fraud scenarios. ZhongAn Insurance also maintains relationship graphs for devices, IPs, and so on, which are not covered here.

Graph Model Prediction

This part introduces the prediction models shown in the following figure:

  • Connected Component (pre-loan)
  • Label Propagation
  • Degree Statistical

The relationship graph introduced in the previous section is computed with the Connected Component algorithm, which is mainly used before lending, during the user's credit application.
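
As an illustration of what a connected-component computation yields, the sketch below groups vertices into communities with a union-find structure and reports the community size for each vertex. The article states that the production models run on GraphX; this single-machine Python version is only a stand-in, and the function name and sample edges are hypothetical.

```python
from collections import Counter


def community_sizes(edges):
    """Return {vertex: size of its connected component} for an
    undirected graph given as (a, b) edge pairs."""
    parent = {}

    def find(x):
        # Locate the component root, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    nodes = list(parent)
    size_by_root = Counter(find(node) for node in nodes)
    return {node: size_by_root[find(node)] for node in nodes}


# Hypothetical data: three users attached to phone p1, one user attached to p2.
sizes = community_sizes([
    ("u1", "p1"),
    ("u2", "p1"),
    ("u3", "p1"),
    ("u4", "p2"),
])
print(sizes["u1"], sizes["u4"])
```

The per-vertex community size is the kind of value a pre-loan check could look up when an application arrives.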

Next is Label Propagation. Unlike Connected Component, Label Propagation is used more during the loan. It starts from a certain point Y and spreads to its related points. For example, suppose a user in the borrower list is severely overdue; this person is point Y, marked with an overdue label. Combined with the established risk control rules, we check which of the points reached from point Y show similar overdue behavior, in order to judge whether those points belong to a seriously overdue community. This is how the label propagation algorithm is used during the loan.
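
The spread from point Y can be sketched in Python as follows. This is a simplified, seed-based propagation limited to a fixed number of hops, not the full iterative Label Propagation algorithm and not ZhongAn's rule set: the hop limit, the `looks_overdue` predicate, the 30-day cutoff, and the sample data are all assumptions made for illustration.

```python
from collections import deque


def propagate_label(adjacency, seed, max_hops, looks_overdue):
    """Walk outward from the labeled seed up to max_hops and return the
    reached vertices (excluding the seed) that satisfy the rule predicate."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return {n for n in hops if n != seed and looks_overdue(n)}


# Hypothetical graph: Y is the severely overdue user; p1 and p2 are phones.
adjacency = {
    "Y": ["p1"],
    "p1": ["Y", "u2", "u3"],
    "u2": ["p1"],
    "u3": ["p1", "p2"],
    "p2": ["u3", "u4"],
    "u4": ["p2"],
}
days_overdue = {"u2": 45, "u3": 0, "u4": 90}


def looks_overdue(vertex):
    # Assumed rule: 30 or more days overdue counts as similar behavior.
    return days_overdue.get(vertex, 0) >= 30


flagged_near = propagate_label(adjacency, "Y", 2, looks_overdue)
flagged_far = propagate_label(adjacency, "Y", 4, looks_overdue)
print(sorted(flagged_near), sorted(flagged_far))
```

Widening the hop limit pulls in more distant vertices, so the limit itself is a risk control choice rather than a property of the algorithm.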

The last algorithm is Degree Statistical, which computes the degrees of relationship across the whole graph and is mainly used by risk control personnel. When building risk control features, they may propose dozens or hundreds of graph features, and historical data is needed to verify which features can truly identify fraudulent or severely overdue users. Doing this verification with deep queries in the traditional data warehouse through ODPS is very inefficient in execution, in time spent, and in the SQL that has to be written. By contrast, reading vertex data from Nebula Graph into GraphX to compute the relationship degrees of the whole graph, then writing the 7-degree or 10-degree relationships back to ODPS as rows for risk control personnel, helps them formulate risk control rules and complete risk control tasks faster.
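
One way to picture a degree-based feature row is to count, for a given vertex, how many new vertices appear at each degree of separation. The Python sketch below does this for a tiny graph; it is an illustrative stand-in for the GraphX job, and the function name, the row layout, and the sample graph are assumptions.

```python
def k_degree_counts(adjacency, start, max_degree):
    """Return a list whose i-th entry is the number of vertices first
    reached at degree i + 1 from the start vertex."""
    counts = []
    visited = {start}
    frontier = {start}
    for _ in range(max_degree):
        next_frontier = set()
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        counts.append(len(next_frontier))
        frontier = next_frontier
    return counts


# Hypothetical graph: a - b, b - c, b - d, d - e.
adjacency = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}

counts = k_degree_counts(adjacency, "a", 3)
row = ("a", *counts)  # one flat row per vertex, ready to be written out
print(row)
```

Each row has a fixed number of columns, so rows like this can be stored in a tabular warehouse and checked against historical outcomes with ordinary SQL.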

Future outlook

Version planning

At the time of this talk, ZhongAn Insurance was running Nebula Graph 2.0.1. The upcoming Nebula v2.5.0 will add a watermark feature that prevents queries that hit dense hotspots from taking up too much memory and dragging down the storage process. ZhongAn Insurance will deploy v2.5.0 in the test environment for verification and, once it passes, gradually switch its business lines to v2.5.0.

More application scenarios

In the future, Nebula Graph may be applied to managing the relationships between tables and fields in the data warehouse and the task relationships of the scheduling platform. Engineers in ZhongAn Insurance's basic platform department are starting to use Nebula Graph to replace the existing traditional implementations.

This article is compiled from ZhongAn Insurance's talk on graph practice. You can watch the video for more details: https://www.bilibili.com/video/BV1dS4y1K7BH?spm_id_from=333.999.0.0



Origin my.oschina.net/u/4169309/blog/5504801