Bank anti-fraud analysis based on big data [Reprint; can serve as a reference when building a risk-control system]

Reprinted from Yue Ye (Growth Hacker): https://www.cnblogs.com/yueyebigdata/p/5893454.html

(I am afraid my browser bookmark will go missing one day, so I copied the post here; reading the original on his blog is recommended.)

0, Big data background.

My first contact with big data was the story of "beer and diapers."

It is a marketing case from Wal-Mart. Every weekend, sales of both beer and diapers ran high. Analysis showed why: weekends had televised games, and the men drank while they watched; their neglected wives went out shopping or to complain to their girlfriends, so child care naturally fell to the men. The men therefore picked up beer when they went out to buy diapers. Once the supermarket shelved beer and diapers together, sales rose. There are other cases as well, such as Google predicting the spread of influenza, the Los Angeles Police Department forecasting crime, weather forecasting, and the prediction of ticket price fluctuations, all of which fall under big data. These cases show that the core value of big data lies in association and prediction.

So bank anti-fraud is also a prediction problem, and using big data to predict fraud is a sound idea. Of course, from what I have found, many companies have already started this kind of business.

1, Machine learning for bank anti-fraud: a summary.

Unsupervised algorithms mainly mine transactions for outlier patterns. The techniques are based on distance, density, depth, probability, and so on, but the underlying aim is the same: compute the distance (similarity) between points in order to determine which points are relatively isolated. The advantage of outlier mining is that it is sensitive to any unusual pattern; the disadvantages are that it is noisy, has a high false-positive rate, and cannot tell which type of fraud it has found.
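To make the distance-based idea concrete, here is a minimal sketch. It scores each point by its mean distance to its k nearest neighbours, which is only one of many possible outlier scores; the function name `knn_outlier_scores`, the two features, and the sample values are all invented for illustration and do not come from the original post.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical transactions: (amount in thousands, hour of day / 24).
transactions = [(1.0, 0.50), (1.2, 0.55), (0.9, 0.45), (1.1, 0.52), (9.5, 0.10)]
scores = knn_outlier_scores(transactions, k=2)

# The point with the largest score is the most isolated one.
most_isolated = max(range(len(scores)), key=scores.__getitem__)
print(most_isolated, [round(s, 2) for s in scores])
```

A high score only says that a point is unusual. As noted above, it does not say which kind of fraud it is, or whether it is fraud at all.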

Supervised algorithms are mainly classification trees (CART), ANN, and RBR/CBR techniques. They require extracting key features from real cases, then training and testing a model. The advantage of supervised algorithms is that they model known cases, are clearly targeted, and are efficient; the disadvantage is that they cannot identify unknown types of fraud: without retraining, a new pattern will catch the model off guard.
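As a rough illustration of the supervised side, the sketch below finds the single best split on one feature by Gini impurity, which is the basic step a CART tree repeats recursively. It is not a full CART implementation, and the amounts and labels are made-up training cases.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # skip the max so both sides are non-empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical labelled cases: transaction amount and whether it was fraud.
amounts = [120, 80, 200, 150, 5000, 7200, 6400, 90]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 0]

threshold, impurity = best_split(amounts, is_fraud)

def predict(amount):
    """Classify a new amount with the learned one-level rule."""
    return int(amount > threshold)

print(threshold, impurity, predict(3000))
```

The learned rule only reflects the cases it was trained on, which is the weakness described above: a fraud pattern that does not show up in the training amounts will not be caught.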
At present, Alipay's risk policy engine performs well. Some time ago they published an article promoting the engine's so-called "6-dimension integrated intelligent judgment", and it spread widely on WeChat Moments, a sign of its influence. The international leader is PayPal, which is said to have started using artificial intelligence to judge risk and whose risk policies are more fully developed.
Algorithm developers can be divided into algorithm engineers and algorithm scientists. For engineers, the algorithms are readily available; the key is how closely they are integrated with the company's business processes. Integrated well, even the simplest clustering algorithm can be enormously powerful; integrated badly, digging into the internals of an algorithm is clever but useless. Scientists, by contrast, need to push the boundaries of an algorithm's mathematical possibilities and efficiency; they emphasize the general significance and performance of a particular algorithm and usually do not consider the particular circumstances of a particular company. Hinton is a typical example: he almost reinvented the ANN.
Most of us can only do algorithm engineering: tuning parameters and features, combining existing algorithms in parallel or in series, data preprocessing, and so on. Only a small number of people with great talent, favorable research conditions, and the ability to endure loneliness can do the research and development of an algorithm scientist, and that is a tedious and risky road.
Last but not least, for engineers: business intuition is essential for anti-fraud algorithms to really work in your business. Only with good business insight, meaning that simple descriptive statistics of the data are enough for you to roughly estimate the appropriate detection process and steps, can you accurately pick the most suitable model from the many complex ones. Without good business insight and intuition, and without a natural sensitivity to the shape of the data, your use of any algorithm is likely either to stay at a very crude level (neither tuning features and parameters nor combining algorithms) or to get lost in the voluminous literature. In short, business insight and sensitivity to data are the most important qualities of a data scientist, and they take ten thousand hours of deliberate practice to acquire.

So I personally feel that when no labeled data is available, unsupervised algorithms are the better way to solve this problem.

     

2, Anti-fraud information gathering.

(1) Internet anti-fraud today is inseparable from text mining, where the most important thing is recognizing semantics. The threshold for a breakthrough is not low, however, and the investment is relatively large; raising accuracy beyond a certain level is extremely difficult. Second in importance is image data mining, including OCR text recognition in images and the detection of pornographic pictures, which runs into almost the same problems as text mining. Machine learning and data mining are the big move: when inappropriate content is hard to identify in other ways, machine learning is the best way, and in essence it is a combination of many rules over many dimensions. Its disadvantages are that it is slow to take effect, costly to maintain, and needs a large collection of samples, but once it reaches a certain level, machine learning is the best anti-fraud tool. All of the above, however, are "techniques" that deal with the problem passively. What anti-fraud really needs is the "Way": breaking through at the level of the product model and building a credit system so that, at the source, the cost of committing fraud is higher than its proceeds, which would finally put an end to fraud. This is the goal that everyone in the anti-fraud field is working toward.

 

(2) Anti-fraud companies and financial models.

Application fraud: GBG DecTech, "meant it credit", Tongdun, Sesame Credit, Bairong Financial, and others each have their own anti-fraud service systems. Typical rules include whether the applicant hits a blacklist, whether the identity information is consistent, and whether the same device, IP address, or mobile phone number is associated with a large number of applications.
Transaction fraud: Falcon, PRM, and similar systems. Typical rules cover frequent transactions within a short period, large transactions, counterfeit cards, and other fraudulent behavior.
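The application-fraud rules above can be sketched as simple checks. This is only an illustration under assumed names: the field names, the `BLACKLIST` contents, and the threshold of three applications per device are hypothetical and do not describe how any of the vendors named above work.

```python
from collections import Counter

BLACKLIST = {"id-0007"}      # hypothetical blacklisted ID numbers
MAX_APPS_PER_DEVICE = 3      # assumed threshold, not from the source

def application_flags(app, all_apps):
    """Return the names of the rules that one application triggers."""
    flags = []
    if app["id_number"] in BLACKLIST:
        flags.append("blacklist_hit")
    if app["name_on_id"] != app["name_on_card"]:
        flags.append("identity_mismatch")
    per_device = Counter(a["device_id"] for a in all_apps)
    if per_device[app["device_id"]] > MAX_APPS_PER_DEVICE:
        flags.append("device_reused")
    return flags

# Four applications from one device, plus one blacklisted applicant.
apps = [
    {"id_number": f"id-{i:04d}", "name_on_id": "A", "name_on_card": "A",
     "device_id": "dev-1"}
    for i in range(4)
] + [{"id_number": "id-0007", "name_on_id": "B", "name_on_card": "C",
      "device_id": "dev-2"}]

for app in apps:
    print(app["id_number"], application_flags(app, apps))
```

The same counting pattern would apply to IP addresses and phone numbers; only the key changes.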

    

(3) Mining user information in depth.

1. The front end: information verification

Quite simply, identity verification and real-name authentication determine whether the applicant is the person they claim to be and whether the personal information filled in is true: the mobile number belongs to the applicant, the frequently used contact details are real (the usual address can be derived by cross-checking delivery addresses in e-commerce and shopping data), and so on.

2. During credit review: cross-validation and supplementing of information

From payment data, consumption data, financial data, social data, mobile device data, and operational data, advanced big data and machine learning algorithms can dig deep into a user's basic identity information, income and expenditure, hobbies, personal influence, and social relations, and run correlation analysis across them.

 

3, Internet finance anti-fraud based on big data ---- neural networks

 

Big data has been hyped for some years, and today many friends in the industry react to the term with disdain, which stems from data, and originally mathematical statistics, having been oversold. Still, the development of the financial industry is in essence also the development of information technology. I have always believed in the power of science and technology, and I believe that technology can keep improving finance.

Internet finance is currently a hot field. Put simply, I believe the nature of Internet finance is still finance. For the asset side of P2P sites in particular, the essence remains microfinance, and traditional credit risk management applies to Internet finance to some extent. In recent decades, with the development of computer technology and the continuous progress of data mining and machine learning, new anti-fraud and credit scoring techniques have kept advancing. In this article I briefly explain the techniques currently being put into production environments, which also serves as a short summary of my recent work and study.

Microfinance risk management is essentially about controlling risk actively and in advance: predicting and preventing the risks that may arise. To meet business needs, we use large amounts of data to build models that measure risk and avoid overdue loans as far as possible. Typically, data mining is used to model a borrower's credit standing, earning capacity, and liabilities into a comprehensive measure, which then determines the credit limit and a reasonable risk price so that risk and return reach a balance.

Obviously, for Internet finance companies doing credit business (P2P companies generally cannot obtain borrowers of the quality that banks can), high-end customers are out of reach as the business grows. The target group shifts toward the general population and even penetrates high-risk groups, which inevitably leads to uneven customer quality, and credit risk and fraud risk rise rapidly.

If we rely on traditional credit techniques, we can only farm the existing market intensively. If we can integrate new data sources (in particular a person's online records, including social activity, transaction behavior, and spending habits), we can reduce risk effectively on the one hand and, on the other, gain unexpected effects (beer and diapers) in acquiring new customers. Data mining therefore plays an important role in the current era of data explosion; it has become a powerful competitive weapon in the industry and an important means of reducing overdue loans and bad debt.

That is simple to say, but any technological advance is achieved through repeated trial and error. In general, whatever the kind of fraud, in the final analysis it is carried out through a fraudulent application. The essence of an anti-fraud strategy is to use data mining and modeling techniques to predict the probability of fraud, giving enterprises a scientific basis for discovering and rejecting fraudulent transactions.

An excellent scoring model must be based on statistical analysis techniques and be able to assess risk accurately and in real time. It strengthens its ability to adapt to new fraud patterns by updating the model internally, analyzes the characteristic behavior patterns of all types of people, and uses advanced statistical techniques and in-depth data mining to keep revising the risk decision model, so that approval, disbursement management, collection, and other processes are managed scientifically and effectively and risk is kept within reasonable limits.

As far as I know, one common rating methodology for bank credit risk is the scoring method. Based on long industry experience, a number of risk indicators are chosen from many candidates, each indicator is given an appropriate weight, and each grade of an indicator is assigned a specific score. The data of the object being rated is then fed into the credit rating system and scored indicator by indicator.
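A minimal sketch of the scoring method, assuming three made-up indicators with hand-picked weights. The indicator names, the weights, and the 0-100 scale are illustrative only; a real scorecard would define them from industry experience as described above.

```python
# Hypothetical indicators and expert-assigned weights (must sum to 1).
WEIGHTS = {"income_stability": 0.40, "debt_ratio": 0.35, "credit_history": 0.25}

def scorecard(indicator_scores, weights=WEIGHTS):
    """Weighted sum of per-indicator scores, each on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * indicator_scores[name] for name in weights)

# One applicant, already graded on each indicator.
applicant = {"income_stability": 80, "debt_ratio": 60, "credit_history": 90}
total = scorecard(applicant)
print(round(total, 2))
```

Every number in `WEIGHTS` is a human choice, so two experts could rate the same applicant differently from identical data.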

In fact, designing the indicator system is itself very complex. When scoring, there is no standard basis for setting the weight of each indicator or the score of each grade; the weights rely on experience and are mixed with human judgment, so subjective factors are prominent. A rating method whose weights are set by subjective opinion is questionable in terms of scientific rigor and objectivity; rating results produced under subjective influence increase risk and can cause unnecessary losses.

Here I introduce another anti-fraud scoring model, one based on machine learning: the neural network model. Scoring models based on neural networks have a special place in data mining today, because the model can keep learning and gradually grow. This article roughly describes how neural-network data mining is applied to micro-credit data and explores whether such a credit risk model is suitable for Internet finance.

A neural network is an information processing technique that imitates the human brain: a mathematical model of information processing whose structure resembles the brain's synaptic connections, and which is quite similar to a dynamic game in game theory. It is adaptive and self-organizing, robust and fault-tolerant in application, and capable of parallel processing and self-learning. With strong information integration capabilities, it can handle both quantitative and qualitative information and coordinate the relationships among many inputs, so it can be applied successfully to complex, non-linear, and uncertain objects across many kinds of information processing.

The entire figure illustrates a credit scoring model.

A neuron can have any number n of inputs.

We denote the inputs as: x1, x2, x3, x4, x5, ..., xn

The n corresponding weights can be written as: w1, w2, w3, w4, w5, ..., wn

In short, the excitation is the sum of the products of all input values and their corresponding weights.

Therefore, it can now be written as: y = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + ... + wnxn

After the neural network weights a series of inputs, the output it produces is the fraud risk score.
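The weighted sum above is easy to write down directly. In the sketch below, `neuron_output` follows the formula as given; the `sigmoid` step that squashes the sum into the range (0, 1) is an added assumption so that the result reads like a score, and the input and weight values are invented.

```python
import math

def neuron_output(inputs, weights):
    """Weighted sum y = w1*x1 + w2*x2 + ... + wn*xn."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(y):
    """Assumed squashing step: map y into (0, 1) so it reads as a score."""
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical normalised features and weights for one application.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]

y = neuron_output(x, w)
print(y, round(sigmoid(y), 4))
```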

The network learns in the environment of a real data set and improves through a continuous learning process in which its connection weights are adjusted iteratively. With each additional round of learning, the network understands the real data set a little better. In other words, learning is a process during which the neural network's parameters adjust automatically to changes in their environment.

When the error estimated over the whole training set reaches its minimum, the model is established: the fit becomes the neural network model, and the classification decision rules are implicit in it. New records are converted as needed into the input form the model expects, and the model then yields the class each belongs to and the corresponding probability.

Early on, neural networks had drawbacks: a complex structure, long training time, and poor interpretability, so they were not considered a promising technique for classification in data mining. However, neural networks have a low error rate and tolerate noisy data, and training algorithms have been optimized continuously; in particular, many network pruning algorithms and rule extraction algorithms have been proposed and improved. As a result, neural networks are increasingly accepted and recognized as a classification technique in data mining.

Of course, a neural network credit rating model improves on existing rating methods rather than completely replacing them. A complete rating system must avoid both the trap of subjectivity and the pitfalls of statistics. The neural network model is therefore meant to give reviewers a quantitative basis for their considered judgment, rather than having them rely on personal experience alone.

In fact, I believe that because of its black-box nature, a neural network sacrifices model interpretability to some extent; compared with logistic regression and decision trees, its explanatory power is not particularly strong, and it has some other defects.
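The weight-adjustment process can be sketched with a single neuron trained by gradient descent. This is a deliberately tiny stand-in for a real network: the bias term, the sigmoid output, the learning rate, and the four training samples are all assumptions made for illustration.

```python
import math

def score(x, w, b):
    """Sigmoid of the weighted sum plus a bias term."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def mean_error(samples, labels, w, b):
    """Mean absolute difference between scores and labels."""
    return sum(abs(score(x, w, b) - t) for x, t in zip(samples, labels)) / len(labels)

def train(samples, labels, lr=0.5, epochs=2000):
    """Adjust the weights after every sample to reduce the log loss."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(samples, labels):
            err = score(x, w, b) - t   # gradient of the log loss w.r.t. the sum
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Hypothetical normalised features; label 1 means confirmed fraud.
samples = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (0.8, 0.9)]
labels = [0, 0, 1, 1]

error_before = mean_error(samples, labels, [0.0, 0.0], 0.0)
w, b = train(samples, labels)
error_after = mean_error(samples, labels, w, b)
print(error_before, round(error_after, 4), round(score((0.85, 0.95), w, b), 4))
```

The last printed value is the probability-like score for a new record, which corresponds to the classification and probability mentioned above.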

 

4, Advanced: how bank anti-fraud is designed.

Among the many recent incidents of funds being stolen from online accounts, the vast majority are concentrated in Internet finance companies. More than forty thousand yuan of my own wealth-management funds in one such wealth-management app were stolen in full, which was distressing and hard to resolve (I took no part in the transactions). The lack of security in Internet finance products is obvious. Such incidents have also led the public to demand withdrawals, leaving a number of Internet finance companies on the brink of death from a run.

In the banking sector, by contrast, as long as the customer takes no part (does not sign, does not leak the password, does not agree to have the phone SIM card copied), no bank account can be robbed; even bank insiders colluding with outsiders cannot move a penny of a customer's money. In the recent 4 billion fraud case in the industry, the scheme was carefully staged, yet it was still intercepted by the bank's anti-fraud detection. For the sake of your money and mine, and so that the good Internet finance companies survive, today we discuss how bank anti-fraud is designed.

 

1.jpg

Ten years ago I published security articles such as "Taking over an entire server room bare-handed" and "Fighting driver-level viruses bare-handed" in the magazines Hacker Defense and Hacker X-Files. That was the most turbulent period for PC security: any user who could click a mouse could download a few tools, call himself a hacker, and make some mischief. It was not until Zhou Hongyi, using his anti-rogue-software guard and recruiting many top experts from both the "red" and "black" camps, that network security topics gradually calmed down.

In the following years, the calm network environment made people feel secure enough. Then came the real-name system: major websites introduced real-name registration and social networking. The black-hat experts who had stayed outside that recruitment gradually turned their attention to the major sites and converted user data obtained by "Tuoku" (downloading the user database) into money. These data contain a large amount of real personal information, which can serve as a social-engineering basis for guessing a user's other information. User information that cannot be turned into money directly is resold on the black market to telecom fraudsters, who analyze it record by record and design targeted fraud schemes, backed by a full set of supporting facilities such as an "official website" and an "official 400 telephone number". With a simple line such as "your son has been in a car accident", "congratulations, you have won the lottery", or "come to my office", they can fool many people, because they know everything about you: name, address, ID number, where you studied, where you work, who your boss is, what you have bought, where you have been and with whom, and even the same information about your family and friends. When you go online to verify whether the other party is a fraudster, Baidu will tell you that what they say is true (the fraudsters have paid in advance to promote their fraudulent information on Baidu, such as the company telephone number).

Fraudsters act with two aims: the first is to get your money directly; the second, failing that, is to get your savings account. There are four categories of means: requesting a transfer, signing at the counter, obtaining the password, and copying the phone number. A fraud that uses two or more of these means to reach its aim is called a social engineering scam, and the existence of such fraud is the root cause of the insecurity of Internet finance. Over many years of contest with fraudsters, the major banks have worked out a complete scheme for identifying real users and real transactions. This is what the major Internet companies, especially Internet finance companies, now want most, because many of them still rely on the low-level approach of identifying a user with externally available user information and do not check the authenticity of a transaction at all. A fraudster can pick a company with his eyes closed and still get a good return, which is why fraudsters are now so rampant and live so comfortably. It is not the users' IQ that caught us off guard; it is our system design that caught the users off guard.

2.jpg

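Let me start with two real cases. The first happened to me. The more than forty thousand yuan of wealth-management funds that I kept with a certain Internet finance company, a certain "Ye", were all stolen within one hour at ten o'clock on a Friday night. Someone in another city logged in to my account on a new phone, changed the login password and the payment password, replaced my bound bank card, and bound three additional bank cards belonging to other people. During this time I could not reset the payment password, unbind the bank cards, or freeze the account, and the customer service line said that staff were off duty. I was helpless and could only despair. So many sensitive operations took place in the process, yet my phone did not receive a single SMS asking me to confirm a change or notifying me that a change had succeeded; I only received one final notice that my account had been withdrawn to such-and-such card (the full details are in the post 《财神爷爷资金被盗是内鬼还是外患》 on my public account). This shows that the company had no mechanism for verifying a user's identity, let alone for verifying the authenticity of a transaction. It was running naked on the Internet with its users' money: whoever stands nearby and can say whose money it is gets the money. For a financial company, this is shocking.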

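The second case took place in the interbank market. By buying a 100,000 yuan wealth-management product from Bank A, a person obtained Bank A's product prospectus, agreement, tax registration certificate, business license, organization code certificate, customer rights notice, and other documents, and deposited 20 million in his own name to gain the use of Bank A's VIP room. Then, posing as a Bank A employee and using Bank A's VIP room, he pitched that wealth-management product to Bank B at a high interest rate. Several consecutive days of performance at Bank A, plus a few tricks, fooled Bank B's reviewers, and he sold a wealth-management deal worth 4 billion. However, Bank B's anti-fraud detection put the transaction on its risk monitoring list, and after manual review and confirmation the fraud was stopped (for details see document No. 55 of 2016 issued by the Anhui Office of the China Banking Regulatory Commission). Compared with the anti-fraud detection capability that Bank B showed in this case, what the Internet finance company did was courting disaster. Improving the security capabilities of Internet finance companies is urgent, and there is a long way to go.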

3.jpg

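If Internet finance companies want to improve their security capabilities, the best role models are banks. The first in the world to implement enterprise-level anti-fraud control systems were Bank of America and Wells Fargo, and they have some excellent design experience worth learning from. Let us now look at how they design anti-fraud under an enterprise-level architecture. In the general sense, fraud is divided into internal fraud and external fraud, and it is part of operational risk management within risk control. Besides fraud, operational risk management also covers employment practices and workplace safety events; client, product, and business practice events; damage to physical assets; information technology system events; and execution, delivery, and process management events. Today we mainly discuss fraud. External fraud has three main types: first-party fraud, third-party fraud, and the money-laundering fraud that the People's Bank of China requires to be checked. Internal fraud mainly consists of unauthorized activity and theft. Fraud prevention and control is divided into before-the-event, during-the-event, and after-the-event control, and is carried out at the following layers:

External channel layer: focuses on detecting suspicious customer access and session behavior before a transaction occurs, and on whether the counterparty is on a suspected-fraud list while the transaction is in progress.

Internal channel layer: focuses on detecting business violations and suspicious operations.

Product and service layer: focuses on detecting fraudulent transactions within a product or service, and fraudulent transactions across products.

Data integration layer: focuses on detecting combined or complex fraudulent transactions across products and channels.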

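These layers focus on different fraudulent behavior, and their detection logic differs accordingly. The channel layer might detect the following behavior:

A first large transfer made after the online banking USB key was replaced in another location. The customer's information may have been leaked; the transaction needs to be suspended and the customer called for verification.

A customer transfers through the mobile or online banking channel to a blacklisted payee account; after that transaction is blocked, the same account makes large transfers to other accounts on the same day. The account may have been stolen, or the customer may have been defrauded by telecom fraudsters using social engineering; the transaction needs to be suspended and the customer called for verification.

A first large transfer made after the online banking USB key was upgraded in another location. The customer's identity may have been stolen, with the ID card, login password, and so on leaked; the transaction needs to be suspended and the customer called for verification.

A newly activated online banking customer makes a large transfer. The customer may have been defrauded by telecom fraudsters using social engineering; the transaction needs to be suspended and the customer called for verification.

The device fingerprint used at login (MAC address, IP, motherboard serial number, hard disk serial number), the login time, or the device location does not match the customer's usual values. The account may have been stolen; this situation requires manual verification.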
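A few of the channel-layer checks above can be sketched as one decision function. Everything concrete here is an assumption made for illustration: the `LARGE_AMOUNT` threshold, the one-day definition of "newly activated", the field names, and the reduced fingerprint of MAC, IP, and city.

```python
LARGE_AMOUNT = 50_000   # assumed threshold; the source gives no figure
NEW_CUSTOMER_DAYS = 1   # assumed meaning of "newly activated"

def channel_decision(event, profile):
    """Return 'hold_and_call', 'manual_review' or 'pass' for one transfer."""
    large = event["amount"] >= LARGE_AMOUNT
    if (large and event["key_changed_remotely"]
            and event["first_transfer_after_key_change"]):
        return "hold_and_call"
    if large and event["days_since_ebank_opened"] <= NEW_CUSTOMER_DAYS:
        return "hold_and_call"
    fingerprint = (event["mac"], event["ip"], event["city"])
    if fingerprint not in profile["usual_fingerprints"]:
        return "manual_review"
    return "pass"

# Hypothetical customer profile and three transfer events.
profile = {"usual_fingerprints": {("aa:bb", "10.0.0.1", "Hefei")}}
base = {"amount": 500, "key_changed_remotely": False,
        "first_transfer_after_key_change": False,
        "days_since_ebank_opened": 400,
        "mac": "aa:bb", "ip": "10.0.0.1", "city": "Hefei"}

normal = dict(base)
after_key_swap = dict(base, amount=80_000, key_changed_remotely=True,
                      first_transfer_after_key_change=True)
new_device = dict(base, mac="cc:dd", ip="172.16.0.9", city="Shenzhen")

for name, ev in [("normal", normal), ("after_key_swap", after_key_swap),
                 ("new_device", new_device)]:
    print(name, channel_decision(ev, profile))
```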

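The product layer might detect the following behavior:

1, A transaction going to a blacklisted merchant. Transactions that have been paid but whose payment has not yet been confirmed need to be frozen to keep funds from flowing to that merchant.

2, Based on customer complaints, confirm whether a merchant has fake transactions; if so, a freeze is also needed.

3, On the same card on the same day, the current transaction amount is a multiple of the previous one. The customer's account may have been stolen; the transaction needs to be suspended and verified manually.

4, Same card, same merchant, same amount. The merchant may be helping the customer cash out; the transaction needs manual verification.

5, Same card, same merchant, transactions exceeding the limit within five minutes. This may be fake trading; the transaction needs manual verification.

6, A corporate customer's transaction volume falls outside its reasonable range (a range assessed from its registered capital, cumulative payroll and payment-agency amounts, and so on). The transaction may need to be rejected and investigated manually.

7, If a counterfeit card is used in a transaction, subsequent transactions at that merchant may all need to be blocked or alerted.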
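Rules 3 to 5 in the list above compare a transaction with earlier ones on the same card, so they can be sketched together. The record layout, the flag names, and the `WINDOW_LIMIT` value are hypothetical; amounts are assumed to be positive integers, and "multiple" is taken to mean a strictly larger multiple so that it does not overlap with the same-amount rule.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
WINDOW_LIMIT = 10_000   # assumed limit per card and merchant within the window

def product_flags(history, cur):
    """Flag `cur` against earlier transactions (oldest first) on the same card."""
    flags = set()
    same_card = [t for t in history if t["card"] == cur["card"]]
    if same_card:
        prev = same_card[-1]
        if (prev["time"].date() == cur["time"].date()
                and prev["amount"] > 0
                and cur["amount"] > prev["amount"]
                and cur["amount"] % prev["amount"] == 0):
            flags.add("multiple_of_previous")       # rule 3
    same_merchant = [t for t in same_card if t["merchant"] == cur["merchant"]]
    if any(t["amount"] == cur["amount"] for t in same_merchant):
        flags.add("same_amount")                    # rule 4
    recent = [t for t in same_merchant if cur["time"] - t["time"] <= WINDOW]
    if sum(t["amount"] for t in recent) + cur["amount"] > WINDOW_LIMIT:
        flags.add("window_limit_exceeded")          # rule 5
    return flags

# One earlier transaction, then three candidate follow-ups.
history = [{"card": "c1", "merchant": "m1", "amount": 500,
            "time": datetime(2024, 1, 1, 10, 0)}]

def txn(amount, minute):
    return {"card": "c1", "merchant": "m1", "amount": amount,
            "time": datetime(2024, 1, 1, 10, minute)}

print(product_flags(history, txn(1500, 3)))
print(product_flags(history, txn(500, 30)))
print(product_flags(history, txn(9800, 2)))
```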

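The customer layer might detect the following behavior:

1, Customers in a particular age group who have habitually made small transactions through non-counter channels suddenly make a first large transfer. The account may have been stolen; manual investigation is required.

2, A customer's account sees multiple consecutive password verification failures over several days, and a transfer is made as soon as an attempt succeeds. The account may have been stolen; the transaction may need to be blocked, the customer's other products may all need to be suspended, and manual verification carried out.

3, One or more products of the same customer are used in different regions or countries within a short time. The customer's card may have been copied and a counterfeit card may exist; the transaction needs manual verification.

4, Within a certain period, the same customer makes multiple transactions or a large transaction in specific high-risk countries. This may be a counterfeit card; the transaction needs manual verification.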
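Two of the customer-layer checks above, repeated password failures followed by a transfer and use in different regions within a short time, can be sketched as follows. The one-hour window, the threshold of three failures, and the event formats are assumptions; the password check is also simplified to one action log rather than several days of history.

```python
from datetime import datetime, timedelta

def used_in_two_places(events, window=timedelta(hours=1)):
    """True if consecutive uses occur in different regions within `window`."""
    ordered = sorted(events, key=lambda e: e["time"])
    return any(
        a["region"] != b["region"] and b["time"] - a["time"] <= window
        for a, b in zip(ordered, ordered[1:])
    )

def transfer_after_failures(log, min_failures=3):
    """True if a transfer follows a login that succeeded after repeated failures."""
    fails, suspicious = 0, False
    for action in log:
        if action == "fail":
            fails += 1
        elif action == "ok":
            suspicious = fails >= min_failures
            fails = 0
        elif action == "transfer" and suspicious:
            return True
    return False

# Hypothetical card usage: two cities 20 minutes apart.
card_events = [
    {"region": "Hefei", "time": datetime(2024, 1, 1, 9, 0)},
    {"region": "Bangkok", "time": datetime(2024, 1, 1, 9, 20)},
]
print(used_in_two_places(card_events))
print(transfer_after_failures(["fail", "fail", "fail", "ok", "transfer"]))
```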

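Joint control of customer fraud risk may require a classified assessment, across different dimensions, of the external fraud risk, internal fraud risk, and blacklist information of customers and employees. The risk relationships among them are laid out as follows: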

4.jpg

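If we want unified fraud prevention and handling across the multiple dimensions of every product in all three stages of control (before, during, and after), we need to build one control system over all of them. By organizing and abstracting the detection behavior listed above, we lay out the objectives it needs to achieve as follows:

1, It should have a unified data mart.

2, It should have a unified data collection and processing pipeline.

3, It should have a unified process for defining detection strategies.

4, It should have unified, workflow-engine-based routing and management of detected issues.

5, It should have unified, workflow-engine-based case management that records, tracks, evaluates, and reviews the handling process.

6, It should have unified, rule-engine-based real-time, near-real-time, and batch risk detection.

7, It should have unified handling of outbound information delivery.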

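From these objectives, we lay out the functions it needs as follows:

1, Anti-fraud business processing: alert management, case investigation, transaction control, detection handling.

2, Anti-fraud operations management: operations control, process management, strategy management.

3, Anti-fraud data reporting: data integration, data reports.

4, Anti-fraud model research: planning and research, variable processing, source-layer data.

5, Anti-fraud behavior analysis: behavior analysis, association analysis, rating calculation, batch processing.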

Based on the requirements above, let us lay out the context around anti-fraud, as shown in the figure below:

5.jpg

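In the figure, the blue lines are transaction access relationships and the orange lines are batch data access relationships. Using these relationships, let us work out in more detail where they sit in the application architecture: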

7.jpg

Then let us work out where they sit in the data architecture as well:

8.jpg

Now we can lay out the specific anti-fraud processing flows. The channel-layer processing flow is as follows:

9.jpg

The product-layer processing flow is as follows:

10.jpg

The customer-layer processing flow is as follows:

11.jpg

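In these processing flows, behavior that requires step-up authentication causes the transaction to be placed on the risk monitoring list. If manual review afterwards confirms that fraud did occur, that kind of behavior is added to the risk behavior model, so that fraud detection keeps evolving as fraud itself mutates.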
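The feedback loop just described (step-up authentication, risk monitoring list, manual review, risk behavior model) can be sketched as a small class. The class name `RiskMonitor`, its methods, and the representation of a behavior as a tuple of action names are all hypothetical; a real risk behavior model would be far richer than a set of exact signatures.

```python
class RiskMonitor:
    """Watch list plus a set of behaviour signatures confirmed as fraud."""

    def __init__(self):
        self.watchlist = {}           # transaction id -> behaviour signature
        self.fraud_patterns = set()   # signatures confirmed by manual review

    def step_up(self, txn_id, signature):
        """Step-up authentication was required: put the transaction on watch."""
        self.watchlist[txn_id] = signature

    def review(self, txn_id, is_fraud):
        """Record the outcome of the manual review made after the event."""
        signature = self.watchlist.pop(txn_id)
        if is_fraud:
            self.fraud_patterns.add(signature)

    def is_known_pattern(self, signature):
        """True if this behaviour was previously confirmed as fraud."""
        return signature in self.fraud_patterns


monitor = RiskMonitor()
takeover = ("new_device_login", "password_change", "full_withdrawal")
harmless = ("new_device_login", "balance_check")

monitor.step_up("t1", takeover)
monitor.step_up("t2", harmless)
monitor.review("t1", is_fraud=True)
monitor.review("t2", is_fraud=False)

print(monitor.is_known_pattern(takeover), monitor.is_known_pattern(harmless))
```

Only the confirmed case feeds back into detection, so the pattern set grows with each reviewed fraud while cleared transactions leave no trace.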

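That completes the main part of our anti-fraud design. It was designed on the premise that the logical layers of the enterprise architecture are already decoupled: work is divided by stage and by layer, with each part doing its own job. By building behavior models, the design responds flexibly to all kinds of user behavior and adapts to the present and the future; for newly emerging fraud techniques, it actively learns and generates fraud behavior models, which can effectively put an end to fraud that may occur now and in the future.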

From this design process we can distill a few tricks for telling whether an Internet finance company has anti-fraud capability:

1, Log in to your account on another phone to test channel-layer anti-fraud capability;

2, Log in to your account from another location to test channel-layer anti-fraud capability;

3, Change your login password to test product-layer anti-fraud capability;

4, Change your payment password to test product-layer anti-fraud capability;

5, Modify your identity information to test customer-layer anti-fraud capability;

6, Bind a new bank card to test product-layer anti-fraud capability;

7, Withdraw to the new card to test transaction anti-fraud capability;

8, Withdraw using someone else's phone to test transaction anti-fraud capability;

9, Withdraw the full balance from another location to test transaction anti-fraud capability;

After carrying out any of the operations above: if you receive an SMS alert, the account has an abnormal-behavior recognition mechanism; if you receive an SMS verification code, the account has a behavior control mechanism; if you receive a confirmation call, the account has user identity verification. If there are only SMS alerts, use the service with caution; if there is not even that, withdraw your funds immediately and uninstall the app.


Origin blog.csdn.net/u010720408/article/details/92795343