Frequent App Removals in the UGC Era: How Platforms Can Get Out of the Content-Review Dilemma

A nationally popular "grass-planting" (product-recommendation) app was recently removed from Android app stores over content violations, which makes it arguably the hottest topic of the moment.

As a C2C "grass-planting" platform, the app not only lets users share recommendations and wish lists online but also lets them post purchase links, so consumers who are won over by a recommendation can buy the product directly. Together, these features made it a perfect traffic-diversion tool.

In fact, the platform itself is just a simple content-sharing community where anyone can publish almost anything. As the blogger Ban Fo Xianren put it in an article on the same topic, for any C2C or UGC platform, wherever content can be published, there is a risk of that content getting out of control. Comment sections, nicknames, private messages, forums... any of these can be exploited by criminals to advertise prohibited goods, divert traffic for ×××, run ××× fraud, and spread other illegal content.
Pictured: illegal information

Runaway content is really part of the anti-fraud risk-control problem, so this article looks at the current state of content governance from two angles: first, how the black and gray market uses UGC platforms to publish illegal information, divert traffic, and commit fraud; second, what solutions companies adopt when facing these problems.

First, why is it so hard for companies to identify the large volume of illicit information published by the black and gray market?

Across the internet black and gray market, whether it is "wool parties" (bonus hunters) rushing to exploit promotions or paid "water army" posters flooding forums, owning an account on each platform is the first step into the "workplace", and mass-registered accounts are the source of all the malicious behavior that follows. This black industry chain now has a clear division of labor between upstream, midstream, and downstream. Midstream account dealers first buy large numbers of phone numbers from upstream SIM card dealers, then use SMS code-receiving platforms, cat pools (modem pools), CAPTCHA-solving platforms, and other automation tools to mass-register fake accounts. They then sell the accounts downstream, where they are used for bonus hunting, faking traffic and ratings, spreading contraband content, and other illegal ways of making money.



Image from the internet: a cat pool (modem pool)


Pictured: SMS code-receiving / CAPTCHA-solving platforms

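Accounts like these are exactly where the prohibited content on content platforms comes from. Many platforms try to filter them out at the source, during registration or login, by setting security policies or requiring real-name authentication. But platform risk control and black-market circumvention are locked in an ongoing game. Once the black market found that new accounts posting ads and other violating content were quickly detected and banned, it started letting accounts live a "normal person's" life for a while first. Only after the platform's monitoring genuinely treats them as "normal people" do they tear off the mask and start doing harm on the platform. This practice is called "account farming".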

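"Account farmers" use group-control technology plus automated scripts to operate many accounts in bulk, posting, commenting, and liking on the platform, all to mimic the behavior of normal individual users and slip past the platform's risk-control rules.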
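One signal that is sometimes used against scripted, group-controlled accounts is timing: automation tends to act at unnaturally even intervals. The sketch below is a hypothetical heuristic, not a method the article or any specific platform is known to use. It flags accounts whose gaps between actions have a very low coefficient of variation (standard deviation divided by mean). The function names, `max_cv`, `min_actions`, and the default thresholds are all assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def interval_cv(timestamps: List[float]) -> Optional[float]:
    """Coefficient of variation (std / mean) of the gaps between consecutive actions.

    Values near 0 mean the actions are almost perfectly evenly spaced.
    Returns None when there are too few gaps, or all actions share one timestamp.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


def flag_scripted_accounts(
    actions: Dict[str, List[float]],  # account id -> action timestamps in seconds
    max_cv: float = 0.1,  # hypothetical threshold: at or below this, timing looks machine-like
    min_actions: int = 5,  # skip accounts with too little history to judge
) -> List[str]:
    flagged = []
    for account, ts in actions.items():
        if len(ts) < min_actions:
            continue
        cv = interval_cv(ts)
        if cv is not None and cv <= max_cv:
            flagged.append(account)
    return sorted(flagged)


# Usage: bot_1 posts exactly every 60 s; human_1 is irregular.
sample = {
    "bot_1": [0, 60, 120, 180, 240, 300],
    "human_1": [0, 45, 300, 320, 900, 1500],
}
print(flag_scripted_accounts(sample))  # ['bot_1']
```

Scripts that add random jitter would defeat this check on its own, so at best it would be one feature among many in a risk model.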


Image from the internet: a group-control setup

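According to Threat Hunter's continuous monitoring, a newly registered account sells for anywhere from 0.1 to 10 yuan, depending on how much can be made from it on a given platform. After being "farmed" for six months or longer, its price climbs to 1-100 yuan or more. If the account also "acquires a name of its own", the price rises further still: this only takes buying ××× and other real-name information from a "data dealer" and binding it to the account, after which the selling price can double outright to 10-200 yuan or more (WeChat accounts, for example).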



Pictured: account sales observed by Threat Hunter's anti-fraud intelligence monitoring platform

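The "farmed" accounts are then bought by downstream black-market operators and used for advertising, spreading violating information, and similar purposes, which is how black accounts end up on content platforms. Because they pass themselves off as normal accounts, they escape the platform's protection rules.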

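Next, by constantly changing how sensitive words are written (for example, 微信 "WeChat" becomes 威信, VX, and so on), the black market gets around the platform's sensitive-word filters, and then uses automated scripts to publish violating content in bulk.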
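A common countermeasure is to normalize text before keyword matching, so that variant spellings collapse back to one canonical word. The sketch below assumes a small hand-maintained variant table and a made-up set of "noise" characters; neither comes from the article, and keywords must be supplied in canonical form.

```python
import re
from typing import Iterable

# Hypothetical variant table mapping evasive spellings to a canonical form.
# A real platform would keep extending this as new variants appear.
VARIANTS = {
    "威信": "微信",  # near-homophone of "WeChat"
    "v信": "微信",
    "vx": "微信",
}

# Assumed set of "noise" characters inserted to split keywords apart.
NOISE = re.compile(r"[\s*\-_.·|/\\]+")


def normalize(text: str) -> str:
    """Lower-case, strip noise characters, then map known variants to canonical words."""
    t = NOISE.sub("", text.lower())
    # Longer variants first, so a short variant never consumes part of a longer one.
    for variant in sorted(VARIANTS, key=len, reverse=True):
        t = t.replace(variant, VARIANTS[variant])
    return t


def contains_sensitive(text: str, keywords: Iterable[str]) -> bool:
    """True if any canonical keyword appears in the normalized text."""
    norm = normalize(text)
    return any(k in norm for k in keywords)


print(contains_sensitive("加V X：abc123", {"加微信"}))  # True
```

Every variant added to the table widens coverage but also raises the chance of flagging innocent text (a short token like "vx" can appear inside unrelated strings), which is the same trade-off between strict rules and false positives discussed later in the article.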


Pictured: violating information

That said, life is not so easy for the black market these days, thanks to the countermeasures platforms have taken.

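After being hit by the black market, many content platforms immediately step up content review: they intercept malicious accounts with risk-control strategies, or build review teams and data models to crack down on cheating on the platform.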

Second, basic measures to prevent violating content

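With regulators tightening oversight, content platforms strengthening their own review is now the norm. Under strict regulation, companies either build up their in-house review capabilities or work with third parties to improve content security, but the underlying logic is fairly fixed: a review system reinforced by risk control.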

1. Use an anti-fraud risk-control system to block black accounts at the source

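Review mechanisms such as the machine review and manual review mentioned above all manage content after the violating content has already appeared.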

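But in our *** confrontation with the black market, we have found that if black accounts can be intercepted as they enter the platform, before they publish any violating content, a large share of the violation problem goes away.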

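The accounts that publish violating content are usually registered in bulk by black-market operators who first buy a batch of phone numbers from card-selling platforms and then run automation tools. If the platform's account identification system can recognize and block them at the moment of registration or login, further abuse can be stopped.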

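While building our anti-fraud system, we set up monitoring at the core nodes of the black industry chain, which gives us visibility into a large pool of the fake phone numbers and malicious IP addresses the black market relies on. When black-market operators use these resources to register accounts, the risky accounts can be blocked or down-weighted at registration or login.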
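The registration-time check described above can be sketched as a lookup against intelligence lists of risky phone numbers and IP ranges. This is a minimal illustration under stated assumptions: the feed contents are placeholders, and the policy (block when both signals hit, down-weight when one does) is a hypothetical choice, not the author's actual rule set.

```python
import ipaddress
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DOWNGRADE = "downgrade"  # e.g., limit posting until extra verification passes
    BLOCK = "block"


# Placeholder intelligence feeds. In the article's setting these would come from
# monitoring the black-market supply chain; the values here are made up.
RISKY_PHONES = {"10000000001", "10000000002"}
RISKY_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation-only range


def ip_is_risky(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # Membership is False across IPv4/IPv6, so mixed lists are safe.
    return any(addr in net for net in RISKY_NETWORKS)


def assess_signup(phone: str, ip: str) -> Decision:
    """Score a registration/login attempt against the intelligence feeds."""
    hits = int(phone in RISKY_PHONES) + int(ip_is_risky(ip))
    if hits == 2:
        return Decision.BLOCK
    if hits == 1:
        return Decision.DOWNGRADE
    return Decision.ALLOW


print(assess_signup("10000000001", "203.0.113.7"))  # Decision.BLOCK
```

Requiring two independent signals before a hard block is one way to limit the damage from stale or recycled intelligence entries (a reassigned phone number, a shared IP), while still slowing down accounts with a single hit.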

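This layer of protection tackles violating content at its source. For whatever slips through the net, adding a content review mechanism on top basically solves the content security problems of most platforms.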


2. Add a content review mechanism to filter violating content

  • Use machine review to filter violating content


At present, violating information appears on platforms in four main forms: text, images, video, and audio.

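Text is relatively cheaper to process than the other three. A platform can maintain its own dynamic text library, continuously collecting and updating sensitive and violating terms, or connect to an external library from a security vendor. The logic is the same either way: filter text against a word library to pick out violating content, then send it for manual handling or delete it directly.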
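A dynamic word library of this kind can be sketched as a mapping from term to action that can be updated while the service runs. The `WordLibrary` class and its two action labels ("delete" for clear violations, "review" for terms that need a human look) are hypothetical names for illustration; the matching is a plain substring scan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ACTIONS = ("delete", "review")  # assumed labels: remove outright vs. send to a human


@dataclass
class WordLibrary:
    """A minimal dynamic keyword library that can be updated at runtime."""
    entries: Dict[str, str] = field(default_factory=dict)  # word -> action

    def add(self, word: str, action: str) -> None:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.entries[word.lower()] = action

    def remove(self, word: str) -> None:
        self.entries.pop(word.lower(), None)

    def check(self, text: str) -> Tuple[str, List[str]]:
        """Return the strictest action triggered by the text plus the matched words."""
        lowered = text.lower()
        hits = sorted(w for w in self.entries if w in lowered)
        actions = {self.entries[w] for w in hits}
        if "delete" in actions:
            return "delete", hits
        if "review" in actions:
            return "review", hits
        return "pass", hits


lib = WordLibrary()
lib.add("fake invoice", "delete")
lib.add("add my wechat", "review")
print(lib.check("Fake invoice service, add my WeChat"))
# ('delete', ['add my wechat', 'fake invoice'])
```

The naive scan checks every entry against every message; with a large library, a multi-pattern matching algorithm such as Aho-Corasick is the usual way to keep this fast.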

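Images, video, and audio, on the other hand, usually call for artificial intelligence and machine-learning techniques, and building a complete intelligent review system is expensive. For this kind of content, platforms therefore tend to use a security vendor's service.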


Pictured: intelligent technologies involved in machine review


Take image review as an example. Violations in images take two forms:

  • the image content itself violates the rules, for example it shows ×××, weapons, or violent scenes;

  • violating information has been added to the image afterwards, for example a promotional QR code, a WeChat ID, or a phone number.


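Both types of image can be recognized with third-party intelligent recognition technology. As long as the vendor keeps training its models on newly collected image data so that they iterate quickly, recognition accuracy can keep improving.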
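For the second type (contact details stamped onto an image), one possible post-processing step is to run pattern matching over text that an OCR service has already extracted from the image. The sketch assumes that OCR output is available as a string (the OCR step itself is not shown) and uses simplified, assumed patterns: an 11-digit mainland China mobile number starting 13-19, and a WeChat-ID-like token (letter first, 6-20 letters, digits, `_` or `-`) following a cue word. The first type of violation needs image classification models and is not covered here.

```python
import re
from typing import Dict, List

# Simplified pattern for mainland China mobile numbers: 11 digits starting 13-19.
PHONE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")

# WeChat-ID-like token (letter first, 6-20 chars of letters/digits/_/-) that follows
# a cue word. The cue list is an assumption; attackers will vary it.
WECHAT = re.compile(
    r"(?:微信|wechat|vx|wx)\s*[:：]?\s*([a-z][-_a-z0-9]{5,19})",
    re.IGNORECASE,
)


def find_contact_info(ocr_text: str) -> Dict[str, List[str]]:
    """Extract contact details from text that an OCR step pulled out of an image."""
    return {
        "phones": PHONE.findall(ocr_text),
        "wechat_ids": WECHAT.findall(ocr_text),
    }


print(find_contact_info("咨询请加微信: abc_12345 或致电13812345678"))
# {'phones': ['13812345678'], 'wechat_ids': ['abc_12345']}
```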

  • Use manual review as a complement to reduce misjudgments


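Making full use of AI to identify violating content is, to a large extent, about saving manpower. But however much the industry hypes AI, at this stage the technology still has moments when it is not intelligent enough, and it cannot do without human assistance.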



Pictured: machine review and manual review as complementary mechanisms

Take the word "scan code" (扫码): it can be rewritten in all sorts of forms, such as homophone swaps (for example writing 码 "code" as 马 "horse") or padded forms like 扫**码, and AI cannot precisely identify every variant. On top of that, filtering rules for sensitive content cannot be too strict, or normal content will be wrongly blocked.

In such cases, manual review is an essential step.

Different platforms can configure different review policies for their machine review, tag content with labels according to how serious the violation is judged to be, and then handle it accordingly.

For example, content confirmed as violating can be marked [high risk] and deleted, or the account banned; content suspected of violating can be marked [medium risk] and given a written warning; content that hits a small amount of violating material but whose nature cannot be determined accurately can be marked [low risk], and low-risk content can go to manual review to avoid wrongly blocking legitimate posts.
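The tiering above can be sketched as a mapping from a machine-review violation score to a risk tier and an action. This assumes the machine review produces a score between 0 and 1; the thresholds, tier names, and action labels below are illustrative assumptions, and each platform would tune its own.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    tier: str
    action: str


# Illustrative thresholds (assumptions, not values from the article).
HIGH, MEDIUM, LOW = 0.9, 0.6, 0.3


def triage(score: float) -> Verdict:
    """Map a machine-review violation score in [0, 1] to a risk tier and action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= HIGH:
        return Verdict("high", "delete_or_ban")  # confirmed violation
    if score >= MEDIUM:
        return Verdict("medium", "warn")  # suspected violation: written warning
    if score >= LOW:
        return Verdict("low", "manual_review")  # uncertain: let a human decide
    return Verdict("none", "publish")


print(triage(0.95))  # Verdict(tier='high', action='delete_or_ban')
```

Sending only the uncertain low tier to humans keeps the manual workload bounded while still catching the cases where automatic action is most likely to hit legitimate content.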

Manual review is vital for covering every part of a platform's content. Today, fairly large content platforms even run their own in-house manual review teams, from a dozen people to several thousand, dedicated to reviewing platform content.

Final thoughts


In practice, different platforms present content in different forms, so the final solution and the rules they set will vary from platform to platform. But whatever the platform, keep in mind that the methods of the black *** industry are constantly evolving, and new forms of content distribution will keep emerging to evade interception strategies. Only if platforms keep focusing on content management and business-security teams keep iterating on protection technology can they gain the upper hand in this game of cat and mouse.


Origin blog.51cto.com/13642687/2426614