Application and Practice of the Relationship Graph in the Risk Control System of Shell House Finding (Beike Zhaofang)


Speaker: Wang Xuezhi, Senior Engineer at Shell

Edited by: Li Guangming

Source: Shell House Finding Knowledge Graph Technology Conference

Produced by: DataFun



Introduction: Risk control is a technical field tightly coupled to the specific business it protects. Shell's ACN (Agent Cooperation Network) business model aims to break down the barriers between listings, users, and brokers across brands and to build a cooperative, win-win ecosystem that allocates resources more efficiently. Under this model, an alliance of interest and trust has gradually formed among brokers, listings, and customer leads, along with a dignity alliance, a quality alliance, and a network alliance for brokers. With the ACN model as background, this talk introduces Shell's risk control system from three angles:

  • Business overview of Shell risk control
  • Application and practice of the relationship graph in Shell risk control
  • Relationship graph planning

▌Business overview of Shell risk control

First, we introduce Shell's business risks, the characteristics of its risk control, and the corresponding risk control system.

1.  Shell’s business risks



  • The "false prosperity" created by fake listings, fake customer leads, and fake showings has limited impact in the short term, but in the long run it damages the platform's credibility and seriously hurts the user experience, which leads to the loss of customer leads and listings and makes the platform less attractive;
  • Resource leakage caused by crawlers, and brokers' off-platform dealing (bypassing the platform to close orders), both put the platform's revenue at risk;
  • Malicious competition among brokers, such as hiding listings, hiding customers, and undercutting on commission rates, violates Shell's platform rules and puts the healthy development of the platform ecosystem at risk.

2.  Shell's risk control characteristics

Risk control in the industry mostly targets online scenarios, whereas Shell's diverse business scenarios span multiple online and offline stages, which makes risk control harder. Violations by small B (business) and big B actors of all kinds are mixed into this long end-to-end chain and amplify the platform's franchisee risk and broker risk. Shell's risk control system therefore has to identify risks and, at the same time, provide an explainable and complete chain of evidence. In particular, Shell's transactions are low-frequency, large-amount, and long-cycle, unlike the high-frequency, small-amount, short-cycle scenarios common in the industry, and this is both the focus and the main difficulty of the risk control system.

3.  Shell's risk control system

Combining its business scenarios with these risk control characteristics, Shell has built a layered risk control system. The bottom layer is the data capability layer, which provides the full set of business-related data, including broker data, listing/property data, store data, and broker behavior data. Above it is the core technology layer, which uses the relationship graph to mine risk relationships and provides broker risk labels, store risk labels, and city risk compasses. At the top, the capability output layer is divided into three stages: pre-event, in-event, and post-event. The pre-event stage works mainly at store and broker admission, ensuring that each broker or store is authentic and valid ("I am who I say I am"). The in-event stage mainly profiles risk from brokers' real-time online behavior and also builds a risk compass for brokers so that changes in the risk situation are easy to follow. The post-event stage finds problem stores and brokers through reports and proactive system identification, and uses the relationship graph to provide a complete chain of evidence.

▌Application and practice of the relationship graph in Shell risk control

1.  Why use associations


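Ordered from the most to the least human involvement, risk control methods fall into five levels: blacklists, expert rules, supervised models, association graphs, and behavior sequence analysis. Blacklists and expert rules both carry high labor costs. Judging from the current state of the industry, it is also hard to build a complete industry-wide blacklist, which makes the blacklist mechanism difficult to put into practice. Expert rules depend heavily on expert experience, so they carry a high risk of misjudgment and subjective error. A workable risk control system is therefore hard to build from blacklists or expert rules alone. Supervised models, for their part, are held back by the difficulty of defining and constructing samples. As a result, association graphs and behavior sequence analysis, which identify risk automatically, have drawn growing attention.

Shell's starting point for using an association graph is the trade-off between the small-B and big-B risks common in its business. Big-B risk is characterized by poor concealment, large group size, high order volume, and low investigation cost, and it does more harm than small-B risk, so it is the priority for control. In addition, an analysis of big-B violations found that violations strongly tied to association relationships account for 29.3% of cases. The role of association relationships in Shell's risk control system can therefore be summarized as follows: first uncover high-risk violations through association relationships or reports, then use association relationships to identify medium- and low-risk violations, and in this way monitor and crack down on violations at three levels: "black", "grey", and "white".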
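The idea of starting from known high-risk cases and spreading outward along associations can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that confirmed violators are "black", any node within a fixed number of hops of a black node is "grey", and everything else is "white". The function name `grade_nodes`, the `grey_hops` threshold, and the node names are hypothetical; the talk does not say how Shell actually assigns these tiers.

```python
from collections import deque


def grade_nodes(adj, black_seeds, grey_hops=2):
    """Label nodes 'black', 'grey' or 'white' by hop distance to known violators.

    adj: dict mapping every node to an iterable of its neighbors.
    black_seeds: set of nodes already confirmed as violators.
    grey_hops: assumed cut-off; nodes within this many hops of a seed are 'grey'.
    """
    dist = {seed: 0 for seed in black_seeds}
    queue = deque(black_seeds)
    while queue:                          # multi-source breadth-first search
        node = queue.popleft()
        if dist[node] == grey_hops:       # do not spread past the cut-off
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    grades = {}
    for node in adj:
        if node in black_seeds:
            grades[node] = "black"
        elif node in dist:
            grades[node] = "grey"
        else:
            grades[node] = "white"
    return grades


# Hypothetical chain of associated brokers: agent_1 is a confirmed violator.
example_graph = {
    "agent_1": ["agent_2"],
    "agent_2": ["agent_1", "agent_3"],
    "agent_3": ["agent_2", "agent_4"],
    "agent_4": ["agent_3", "agent_5"],
    "agent_5": ["agent_4"],
}
example_grades = grade_nodes(example_graph, {"agent_1"}, grey_hops=2)
```

In this toy chain, `agent_2` and `agent_3` come out grey and `agent_4` and `agent_5` stay white. A production system would more plausibly weight edges by relationship type than count raw hops.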

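2.  Shell's relationship graph

Shell's risk control relationship graph has evolved through three stages: fact graph -> inference graph -> graph fusion. The fact graph covers all online actions and behaviors on Shell House Finding, forming a huge graph with nodes on the order of 1 billion and edges on the order of 10 billion. The inference graph is built on top of the fact graph: using the relationships between different nodes, it constructs a behavior graph, a social graph, an operations graph, and a business-registration graph. This is the stage Shell's risk control is at today. In the future graph fusion stage, ID linking will be used to merge the relationships of the multiple inference graphs and then express the closeness between people qualitatively or quantitatively, enabling deeper risk control.

The overall architecture of Shell's relationship graph has four layers. The base data layer collects behavior and attribute data from various sources. The knowledge construction layer extracts entities and relationships by multiple means to build the graph. The knowledge mining layer combines traditional shortest-path and critical-path methods with machine learning methods such as community detection, label propagation, and Graph Embedding to further mine the relationships between nodes. The business application layer builds on the association graph to provide capabilities such as traceability analysis, risk quantification, and proactive discovery of violations.

For its association graph technology, Shell chose Spark GraphX as the graph analysis tool and JanusGraph as the graph query tool. GraphX was chosen mainly because Spark has relatively mature community support and because GraphX implements the Pregel computation model on Spark, which makes it efficient. JanusGraph was chosen mainly because the business scenarios call for a lot of graph visualization and traceability analysis.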

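3.  Applications of Shell's relationship graph

As the overall architecture described earlier shows, at the application layer the relationship graph provides capabilities such as admission control, risk quantification, quality management, risk discovery, and case tracing.

Admission control mainly uses path searching and risk path ranking to identify risk along the long paths over which risk relationships propagate. For stores, it supports searching for off-platform companies (including several variants), assessing negative information (whether the path contains dishonest persons, persons with complaints against them, and so on), and checking personnel risk history. For brokers, it supports cross-validation of information, scanning of association relationships, and black/grey/white grading.

Proactive risk discovery relies on three approaches: ① take the risk seed population obtained from reports, use the relationship graph to monitor the behavior of related people, and detect shifts from "white" to "grey" or "black"; ② extract and search for patterns based on user behavior and industry experience; ③ mine risk automatically with machine learning, using the Louvain community detection algorithm and Graph Embedding (described further below).

Case tracing starts from the multi-dimensional attributes of a seed risk node, explores multiple relationship paths to find related people, and then judges their closeness to uncover potential risk groups.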
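The path searching behind admission control and case tracing can be illustrated with a small Python sketch. It enumerates simple paths from a store to any flagged node (for example, a person with complaints on record) within a hop limit and returns them shortest first, using hop count as a stand-in for risk path ranking. The function `risky_paths`, the hop limit, and all node names are hypothetical; the talk does not describe Shell's actual search or ranking method.

```python
from collections import deque


def risky_paths(adj, start, flagged, max_hops=3):
    """Return simple paths from `start` to flagged nodes, shortest first.

    adj: dict mapping a node to an iterable of its neighbors.
    flagged: set of nodes carrying negative information.
    max_hops: assumed limit on path length, counted in edges.
    """
    found = []
    queue = deque([[start]])
    while queue:                            # breadth-first, one path at a time
        path = queue.popleft()
        node = path[-1]
        if node in flagged and node != start:
            found.append(path)              # record the path, do not extend it
            continue
        if len(path) - 1 >= max_hops:
            continue
        for nxt in sorted(adj.get(node, ())):
            if nxt not in path:             # keep paths simple (no revisits)
                queue.append(path + [nxt])
    return found


# Hypothetical graph linking a store to people through owners, firms and phones.
example_graph = {
    "store_A": ["owner_1", "agent_7"],
    "owner_1": ["store_A", "company_X"],
    "agent_7": ["store_A", "phone_55"],
    "company_X": ["owner_1", "person_9"],
    "phone_55": ["agent_7", "person_3"],
    "person_9": ["company_X"],
    "person_3": ["phone_55"],
}
example_flagged = {"person_9", "person_3"}
example_paths = risky_paths(example_graph, "store_A", example_flagged, max_hops=3)
```

Each returned path doubles as a human-readable chain of evidence, which is why the sketch keeps whole paths instead of only distances. With `max_hops=2` the same call returns no paths, because both flagged people are three hops from the store.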

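4.  Community detection and automated machine learning

Louvain is a classic graph-based community detection algorithm. Its optimization goal is to raise the modularity of the graph (a measure of how tightly knit the communities are) as much as possible. The modularity gain is made up of two terms: the first covers all edges inside the community after the node joins a neighboring community, and the second covers all edges of that community after the node joins (both internal edges and edges that connect to the outside). The aim is for the resulting community to have many internal edges and few external ones, that is, to be more tightly clustered.

Louvain is an iterative algorithm, and each round consists of two steps: ① the algorithm scans all nodes in the graph; for each node it goes through all neighboring nodes, measures the modularity gain from moving the node into each neighbor's community, and moves the node into the community of the neighbor with the largest gain; this process repeats until no node's community assignment changes; ② the communities formed in ① are collapsed, each community becoming a single node, and the weight of the edge between two new nodes is the sum of the weights of the edges between the original nodes inside them. The two steps are repeated over multiple rounds until the algorithm converges, at which point the communities in the graph have been found.

Graph Embedding is a way of representing graphs. The rough idea is to convert graph data into sequence data and then apply sequence methods such as word2vec to obtain vector representations of the nodes. The conversion is usually done with random walks on the graph, and Node2Vec is a walk strategy that combines depth-first and breadth-first exploration.

When the walk has just moved from node t to node v, it has three kinds of choices for the next step: ① return to node t; ② move to node x1, which is one hop from t; ③ move to node x2 or x3, which are two hops from t. The parameters p and q control the balance between returning and continuing to explore. To favor in-depth exploration, set q smaller and p larger.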
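Two pieces of this section lend themselves to a short Python sketch: the modularity gain that Louvain evaluates in step ①, and the three-way choice Node2Vec makes after stepping from t to v. The gain below uses the standard closed form for moving a node that currently sits alone in its own community, gain = k_in / m - sigma_tot * k / (2 * m^2), where m is the total edge weight, k the node's weighted degree, k_in the weight of its edges into the target community, and sigma_tot the summed degrees of the target's members. This is the textbook expression and is not necessarily the exact formula shown on the speaker's slide. The Node2Vec weights follow the usual 1/p, 1, 1/q scheme. All function names and the toy graphs are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of an undirected weighted graph given as (u, v, weight) edges."""
    m = sum(w for _, _, w in edges)      # total edge weight
    inside = defaultdict(float)          # edge weight inside each community
    degree = defaultdict(float)          # summed node degrees per community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def move_gain(edges, community, node, target):
    """Modularity gain from moving `node` (alone in its community) into `target`."""
    m = sum(w for _, _, w in edges)
    k = k_in = sigma_tot = 0.0
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            if a == node:
                k += w                   # weighted degree of the node
                if community[b] == target:
                    k_in += w            # edges from the node into the target
            if community[a] == target:
                sigma_tot += w           # summed degrees of the target's members
    return k_in / m - sigma_tot * k / (2 * m * m)


def node2vec_weights(adj, t, v, p, q):
    """Unnormalized weights for the step after a walk moved from t to v."""
    weights = {}
    for x in adj[v]:
        if x == t:
            weights[x] = 1.0 / p         # (1) return to t
        elif x in adj[t]:
            weights[x] = 1.0             # (2) x is one hop from t
        else:
            weights[x] = 1.0 / q         # (3) x is two hops from t
    return weights


# Toy graph: two triangles joined by the edge c-d; node c is still on its own.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1), ("c", "d", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1)]
community = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 2, "f": 2}
gain_left = move_gain(edges, community, "c", 0)    # join a and b
gain_right = move_gain(edges, community, "c", 2)   # join d, e and f

# Walk t -> v; x1 is also adjacent to t, x2 and x3 are not.
adj = {"t": {"v", "x1"}, "v": {"t", "x1", "x2", "x3"},
       "x1": {"t", "v"}, "x2": {"v"}, "x3": {"v"}}
weights = node2vec_weights(adj, "t", "v", p=4.0, q=0.5)
```

In the toy graph, moving c to join a and b gives a positive gain while joining d, e, and f gives a negative one, so step ① would place c with a and b. For the walk, p = 4 and q = 0.5 give weights of 0.25 for returning to t, 1.0 for x1, and 2.0 each for x2 and x3, which matches the advice to use a larger p and smaller q when deeper exploration is wanted.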

▌Relationship graph planning

Mining high-density subgraphs is an upgrade over community detection and is better at discovering groups. Combined with relationship fusion and Graph Embedding algorithms, it will strengthen the basic capabilities of the relationship graph. Business applications built on these capabilities can extend to scenarios such as risk management and user growth.

That concludes this talk. Thank you all.



Origin blog.51cto.com/15060460/2675346