Application of e-commerce and social data in big data risk control

Abstract: As inclusive finance expands and competition in consumer finance intensifies, risk control for "credit white households" (applicants with little or no credit history) has become particularly important. How can such applicants be rated quickly and effectively? Drawing on its own practical experience, Wolong Big Data shares some of the value that e-commerce and social data can bring to risk control.






1. Data coverage of e-commerce social data:


Wolong has run data matching tests with many different types of financial institutions. The following figure shows the overall matching rate of Internet behavior data for each type of institution.

[Figure: overall matching rate of Internet behavior data by type of financial institution]

It can be seen that:

Traditional rural commercial banks mainly serve offline customers, so the matching rate of online data is very low. Using e-commerce and social data for credit evaluation is basically unfeasible for them, but using big data for customer acquisition is a direction worth attention.

For large joint-stock banks and consumer finance companies, and especially online loan platforms, the data matching rate can reach 50% or more, which gives big data risk control analysis great potential.



2. Anti-fraud application

E-commerce data anti-fraud
On a certain major e-commerce platform, everything from luxury villas to iron nails and toothpicks is on sale, along with all kinds of offline services. Apart from drugs, there is almost nothing that cannot be sold. It is this feature that gives us a lot of room for analysis.

The following is a batch of typical cases we obtained:

[Figures: typical cases]


By tracking the Internet behavior of a batch of users, we found some very interesting characteristics. The modeling and analysis process is shown in the figure:

[Figure: modeling and analysis process]


For the batch of keywords found in this process, we performed term weight analysis. The resulting clusters are shown in the figure below:

[Figure: keyword clusters from term weight analysis]
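The term weight analysis mentioned above can be illustrated with TF-IDF, one common weighting scheme. The source does not say which weighting Wolong used, so the choice of TF-IDF, the function name `tfidf_weights`, and the sample keywords are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per document (each doc is a list of keywords)."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical keyword lists per user, invented for illustration only.
users = [
    ["lawsuit", "asset dispute", "phone case"],
    ["new loan channel", "cash", "phone case"],
    ["weekly card", "equipment number", "phone case"],
]
weights = tfidf_weights(users)
```

A keyword shared by every user (here "phone case") gets weight zero, while keywords specific to one group stand out, which is what makes the weights usable as input to a clustering step.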


Based on an analysis of millions of Internet behavior records from nearly 100,000 overdue and fraudulent users, these users can be divided into three customer groups according to their keywords:

1. Deadbeat ("laolai") group: typically the habitual defaulters who give banks headaches. Users with asset disputes are associated with keywords such as legal disputes.

2. Multi-platform borrowing: these users are associated with keywords such as Xinkouzi, cash, Jingdong Baitiao, Ant Huabei, and Suning Finance. They exploit the promotions of various new platforms and borrow from one lender to repay another.

3. Black-market intermediaries: these users are associated with keywords such as weekly cards, spare parts, and equipment numbers. Our tracking of these intermediaries shows that the black market has formed an extremely well-hidden and highly automated industrial chain.

Using this batch of keywords, combined with business knowledge and machine learning, we found thousands of abnormal keywords and hundreds of thousands of black-market products, which are linked to millions of abnormal users. We also found that the e-commerce platform itself is trying to suppress these abnormal products: some of the products we analyzed disappear from the platform from time to time, so this batch of abnormal data is basically unique to Wolong. Analysis shows that many of these records are absent from traditional multi-platform borrowing and online loan blacklist databases, so they can serve as a supplement to those blacklists. They have also received good feedback in tests with several partner companies.

Social data anti-fraud
Social data is another interesting topic. Besides users who directly follow loan-related and black-market topics, we used social analysis tools such as graph databases and the PageRank algorithm to find a group of users engaged in order brushing and post brushing (placing fake orders and mass-posting).

The specific process is as follows:

[Figure: social data analysis process]
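As a rough illustration of the PageRank step named above, here is a minimal power-iteration sketch over a follow graph. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the sample edges are assumptions for illustration. The source does not describe Wolong's actual graph or parameters.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank. edges: list of (follower, followed) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical follow graph: a, b, and d follow c, and c follows a.
follows = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
scores = pagerank(follows)
```

In this toy graph the most-followed account `c` receives the highest score, and accounts that nobody follows (`b`, `d`) receive the lowest.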


The most interesting finding is the overlap of numbers: the currently published blacklists overlap heavily with the users on our graylist of social order brushing and post brushing. The conclusion is that the black market makes the most of everything. Widespread real-name registration has made number resources scarce, so black-market platforms reuse each number for maximum value, and this reuse in turn provides clues for our big-data anti-fraud work.

3. Risk control modeling application of e-commerce social data
Credit evaluation has always been the top priority in the financial field. Before introducing how Wolong applies e-commerce and social data to credit evaluation, we first cover some basic knowledge.

Model evaluation dimension


[Figure: model evaluation dimensions]


Model features


[Figure: model features]


Examples of features based on business experience include the proportion of branded products, the proportion of active comments, and the distribution of a user's shopping categories. Generally, the wider the distribution of shopping categories, the stronger the user's online consumption and the lower the likelihood that the account is used for order brushing.
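The features described above could be computed along the following lines. This is only a sketch: the source does not say how "width" of the category distribution is measured, so the use of Shannon entropy, the function name `category_features`, and the sample orders are assumptions.

```python
import math
from collections import Counter

def category_features(orders):
    """Summarize one user's orders, given as (category, is_branded) tuples."""
    if not orders:
        return {"n_categories": 0, "category_entropy": 0.0, "brand_ratio": 0.0}
    total = len(orders)
    counts = Counter(category for category, _ in orders)
    # Shannon entropy: higher when purchases spread evenly over many categories.
    entropy = sum(-(c / total) * math.log(c / total) for c in counts.values())
    branded = sum(1 for _, is_branded in orders if is_branded)
    return {
        "n_categories": len(counts),
        "category_entropy": entropy,
        "brand_ratio": branded / total,
    }

# Hypothetical users, invented for illustration only.
broad_user = [("snacks", True), ("books", False),
              ("phone accessories", True), ("clothing", False)]
narrow_user = [("phone accessories", False)] * 4

broad = category_features(broad_user)
narrow = category_features(narrow_user)
```

A user who buys across many categories gets a higher entropy than one whose orders all fall in a single category, which matches the intuition that narrow, repetitive purchasing is more typical of order brushing.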

A classic machine learning example is using PageRank to calculate the influence of Weibo users. Generally, the larger the PageRank value, the higher the influence and the less likely the user is to default. Another example is the label diffusion method, which starts from the blacklist library and calculates a graylist probability weight feature for each related user. The IV (Information Value) of these features is generally above 0.1. The figure below shows the relationship between PageRank score segment and default rate for large and small loans.

[Figure: PageRank score segment vs. default rate for large and small loans]


A higher PageRank score does not always mean lower risk. It depends on the loan product: for large loans (more than 50,000), the higher the score, the greater the likelihood of overdue default, while for small loans (50,000 and below) the opposite holds.
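The label diffusion idea mentioned earlier (deriving a graylist weight for each user from a blacklist) can be sketched as a simple label propagation over an undirected graph. The source does not specify the algorithm, so the update rule, the `damping` value, the function name `propagate_labels`, and the sample graph are all assumptions for illustration.

```python
def propagate_labels(edges, blacklist, iterations=20, damping=0.5):
    """Spread blacklist labels over an undirected graph.

    Returns a score in [0, 1] per node: 1.0 for blacklisted nodes and a
    smaller graylist weight for nodes connected to them.
    """
    blacklist = set(blacklist)
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    score = {n: (1.0 if n in blacklist else 0.0) for n in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in blacklist:
                updated[node] = 1.0  # known labels stay fixed
            else:
                # Damped average of the neighbors' current scores.
                updated[node] = damping * sum(score[m] for m in nbrs) / len(nbrs)
        score = updated
    return score

# Hypothetical contact chain: x is blacklisted, u1 to u3 are unlabeled.
edges = [("x", "u1"), ("u1", "u2"), ("u2", "u3")]
gray = propagate_labels(edges, blacklist=["x"])
```

Users closer to a blacklisted node end up with a higher weight, and the weight decays with distance, so it can be used as a continuous feature rather than a hard label.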

Combining business experience and machine learning methods, and breaking features down by shopping category, we constructed more than 30,000 indicators in total. The following figure shows our general process for selecting indicators:

[Figure: indicator selection process]


The figure below shows the missing rate of the features in the sample. A large share of feature values are missing, which is characteristic of Internet data and currently the biggest challenge. We filter out particularly sparse features using a threshold.

[Figure: missing rate of features in the sample]
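Filtering sparse features by a missing-rate threshold can be sketched as follows. The threshold value, the function names, and the sample feature names are hypothetical. The source does not state the threshold Wolong used.

```python
def missing_rates(rows, feature_names):
    """Fraction of rows in which each feature is absent or None."""
    return {
        name: sum(1 for row in rows if row.get(name) is None) / len(rows)
        for name in feature_names
    }

def keep_dense_features(rows, feature_names, max_missing_rate=0.8):
    """Keep only features whose missing rate does not exceed the threshold."""
    rates = missing_rates(rows, feature_names)
    return [name for name in feature_names if rates[name] <= max_missing_rate]

# Hypothetical sample: one dict of feature values per user.
rows = [
    {"snack_orders": 3, "weibo_pagerank": 0.02},
    {"snack_orders": 1},
    {"snack_orders": None, "luxury_orders": 1},
    {"snack_orders": 5},
    {},
]
features = ["snack_orders", "weibo_pagerank", "luxury_orders"]
rates = missing_rates(rows, features)
kept = keep_dense_features(rows, features, max_missing_rate=0.5)
```

With a threshold of 0.5, only `snack_orders` (40% missing) survives, while the two features that are missing for 80% of users are dropped.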


The following figure shows the distribution of IV values for the 50 features we selected. They are weaker than features such as bank credit card data (in our tests, features built from bank credit card transaction records usually reach an IV of about 0.4), but they are still rare, good feature variables.

[Figure: distribution of IV values for the 50 selected features]


Category features such as digital accessories, mobile phone accessories, snacks, and men's and women's underwear have higher IV values. Such categories, which are not visible to outsiders, distinguish a person's consumption level well.
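For reference, the IV (Information Value) of a binned feature is the sum over bins of (share of good customers minus share of bad customers) times the log of their ratio. The sketch below computes it from per-bin counts. The function name `information_value` and the sample counts are made up for illustration and are not Wolong's data.

```python
import math

def information_value(bins):
    """IV of a binned feature. bins: list of (n_good, n_bad) counts per bin.

    Assumes every bin holds at least one good and one bad customer,
    so that no logarithm of zero occurs.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        p_good = good / total_good
        p_bad = bad / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv

# Hypothetical feature split into three bins (low, medium, high spend).
bins = [(400, 20), (300, 30), (300, 50)]
iv = information_value(bins)

# A feature whose bins have identical good and bad shares carries no information.
flat_iv = information_value([(500, 50), (500, 50)])
```

In practice, bins with zero good or zero bad customers need smoothing or merging before this formula is applied.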

Model algorithms
Traditional scorecards generally use logistic regression, because such models are highly interpretable and make communication with, and supervision by, higher authorities easier. We instead use a decision tree model, which is only moderately interpretable but performs better.

Model architecture diagram

[Figure: model architecture]


KS value

[Figure: KS value]


Using e-commerce and social data alone, the model's KS value reaches 0.28. After adding the basic user information and the asset and authorization information from a traditional loan application form, the final model's KS reaches 0.36.
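For reference, the KS (Kolmogorov-Smirnov) value is the maximum gap between the cumulative score distributions of bad and good customers. The sketch below computes it from scores and labels. The function name `ks_statistic` and the sample data are hypothetical and unrelated to the 0.28 and 0.36 figures above.

```python
def ks_statistic(scores, labels):
    """Max gap between cumulative score distributions of bad and good customers.

    labels: 1 for a bad (defaulting) customer, 0 for a good one.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad customer")
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Consume all customers sharing the same score before comparing.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks

# Hypothetical risk scores (higher means riskier) and observed outcomes.
perfect = ks_statistic([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
mixed = ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
useless = ks_statistic([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])
```

A KS of 1.0 means the scores separate good and bad customers perfectly, while 0.0 means they do not separate them at all.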



4. Summary of experience
1. E-commerce and social data suit groups with active online behavior, especially users in specific consumption scenarios such as 3C digital products, medical aesthetics, education, and other installment-purchase fields. For traditional offline customers with sparse online behavior, using e-commerce and social data for credit evaluation is basically unfeasible.

2. More features is not always better: too many low-value features reduce the overall effect of the model. The more features there are, the harder the model is to interpret, so screening for high-quality features is essential.

3. E-commerce and social data work well for credit evaluation modeling, but not well enough to be used on their own. They need to be combined with other data to maximize their value.

4. E-commerce and social data apply more directly to anti-fraud than to credit assessment. Customers with the abnormal shopping records and sensitive behaviors identified by Wolong come out 4.7 times higher than normal customers.

