Applications of Knowledge Graphs in Big Data

With the development of the mobile Internet, the Internet of Everything has become possible. The data generated by this interconnection is growing explosively, and it is exactly the raw material needed for effective relationship analysis. Earlier intelligent analysis focused on each individual. In the mobile Internet era, the relationships between individuals, in addition to the individuals themselves, are bound to become an important part of what we need to analyze in depth. In any task that calls for relationship analysis, a knowledge graph is likely to come in handy.

Speaking of the importance of relationships, consider an interesting theory that most readers will have heard of: Six Degrees of Separation. It holds that any two strangers in the world can be connected through only a small number of intermediaries.

In 1967, Harvard University psychology professor Stanley Milgram conducted an experiment based on the chain-letter concept, trying to show that on average only six steps are needed to link any two strangers in the United States. This does not mean that every connection between two people takes exactly six steps. Rather, it expresses an important idea: between any two strangers, some chain of links can always produce a connection or relationship. Knowledge graphs open up a new way of thinking about and understanding things.

What is a knowledge graph

In one sentence, a knowledge graph stores and represents knowledge in the form of a graph. It is essentially a semantic network built on the graph data structure, which consists of nodes (points) and edges. In a knowledge graph, each node represents an "entity" that exists in the real world, and each edge represents a "relationship" between two entities.

A knowledge graph is the most effective way to represent relationships: it depicts the real world around us in the form of a graph. The figure below shows a knowledge graph of the famous Japanese animation director Hayao Miyazaki, his works, and the actors in those works. As the figure shows, Miyazaki has directed many films, including "My Neighbor Totoro", and the cast of "My Neighbor Totoro" includes Hitoshi Takagi, among others. Linking hundreds of thousands of directors, actors, and films in this way forms a knowledge graph of the film domain.

The figure is rendered with Neo4j.
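The node-and-edge structure described above can be sketched in a few lines of plain Python. This is only an illustration, not how Neo4j stores data: the `KnowledgeGraph` class, its method names, and the relationship labels `DIRECTED` and `ACTED_IN` are assumptions made up for this example.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory graph: nodes are entities, edges are labeled relationships."""

    def __init__(self):
        self.nodes = {}                     # entity name -> entity type
        self.out_edges = defaultdict(list)  # source -> [(relation, target), ...]

    def add_entity(self, name, kind):
        self.nodes[name] = kind

    def add_relation(self, source, relation, target):
        # Both endpoints must already be known entities.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("add both entities before relating them")
        self.out_edges[source].append((relation, target))

    def neighbors(self, source, relation):
        """Return the targets reachable from `source` over edges labeled `relation`."""
        return [t for r, t in self.out_edges.get(source, []) if r == relation]


# Entities and relationships taken from the Miyazaki example in the text.
kg = KnowledgeGraph()
kg.add_entity("Hayao Miyazaki", "Director")
kg.add_entity("My Neighbor Totoro", "Film")
kg.add_entity("Takagi", "Actor")
kg.add_relation("Hayao Miyazaki", "DIRECTED", "My Neighbor Totoro")
kg.add_relation("Takagi", "ACTED_IN", "My Neighbor Totoro")

print(kg.neighbors("Hayao Miyazaki", "DIRECTED"))
```

A real graph database adds indexing, persistence, and a query language on top of this idea, but the basic model of typed entities joined by labeled edges is the same.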

The origin of the knowledge graph

The knowledge graph is a concept introduced by Google in 2012. It is a special kind of semantic network that uses entities, relationships, and attributes as its basic units to describe, in symbolic form, the concepts of the physical world and the relationships between them.

A knowledge graph aims to establish links between pieces of data, organizing fragmented data into an organic whole so that it is easier for both humans and machines to understand and process. This facilitates search, mining, and analysis, and provides a knowledge base for implementing artificial intelligence.

Google introduced the knowledge graph to improve the quality of the answers its search engine returns. With the help of a knowledge graph, a search engine can see the semantic information behind a user's query and return more accurate, more structured results. Google's slogan for its knowledge graph, "things, not strings", captures the essence of the idea: what matters is not a meaningless string but the object or thing behind the text.

Take Ronaldo as an example. When a user searches with "Ronaldo" as the keyword and there is no knowledge graph, we only get pages that contain the keyword, and we then have to click through to those pages to find the information we need. With a knowledge graph, the search engine returns the pages and, alongside them, a "knowledge card" containing basic information about the queried object. If the information we need is on the card, no further action is required. In other words, a knowledge graph improves query efficiency and gives us more accurate, more structured information.
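The difference between matching strings and looking up things can be illustrated with a small sketch. Everything here is hypothetical: the page texts, the `entities` store, and the function names are invented for the example and say nothing about how a real search engine is built.

```python
# Hypothetical web pages, searched as plain strings.
pages = {
    "page-1": "Match report: Ronaldo scores twice",
    "page-2": "Transfer rumours of the week",
    "page-3": "Ronaldo interview after the final",
}

# Hypothetical entity store: the "things" behind the strings.
entities = {
    "Ronaldo": {"type": "Person", "occupation": "Footballer"},
}


def keyword_search(query):
    """String matching only: ids of pages whose text contains the query."""
    needle = query.lower()
    return sorted(pid for pid, text in pages.items() if needle in text.lower())


def search_with_card(query):
    """Return matching pages plus a knowledge card when the query names an entity."""
    return {"pages": keyword_search(query), "card": entities.get(query)}


result = search_with_card("Ronaldo")
print(result["pages"])  # pages the user would otherwise have to open one by one
print(result["card"])   # structured facts available immediately
```

When the query does not name a known entity, `card` is simply `None` and the result degrades to ordinary keyword search.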

Of course, this is only one application scenario of knowledge graphs, namely search engines. The example shows that the emergence of the knowledge graph, as a concept and as a technology, is in line with the development trends of computer science and the Internet.

Knowledge graph storage

There are two main ways to store a knowledge graph: RDF-based storage and graph-database storage. The differences between them are shown in the figure below. First, an important design principle of RDF is easy publishing and sharing of data, whereas graph databases focus on efficient graph queries and search. Second, RDF stores data as triples and does not include attribute information, while graph databases generally use the property graph as their basic representation, in which both entities and relationships can carry attributes. This makes it easier to express real-world business scenarios.
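To make the contrast concrete, here is a minimal sketch of the same facts in both shapes, using plain Python data structures rather than a real RDF store or graph database. The identifiers (`directed`, `releaseYear`, `DIRECTED`, the node ids) are invented for illustration and do not come from any standard vocabulary.

```python
# Triple style: every statement is a (subject, predicate, object) triple.
triples = [
    ("Hayao_Miyazaki", "directed", "My_Neighbor_Totoro"),
    ("My_Neighbor_Totoro", "type", "Film"),
    # There is no attribute slot on a node: a value such as the release
    # year has to be written as one more triple.
    ("My_Neighbor_Totoro", "releaseYear", 1988),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Property-graph style: nodes and edges each carry their own attributes.
nodes = {
    "miyazaki": {"label": "Director", "name": "Hayao Miyazaki"},
    "totoro": {"label": "Film", "title": "My Neighbor Totoro", "year": 1988},
}
edges = [
    {"from": "miyazaki", "to": "totoro", "type": "DIRECTED", "props": {"year": 1988}},
]


def films_directed_by(node_id):
    """Titles of the films reached from node_id over DIRECTED edges."""
    return [
        nodes[e["to"]]["title"]
        for e in edges
        if e["from"] == node_id and e["type"] == "DIRECTED"
    ]


print(objects("My_Neighbor_Totoro", "releaseYear"))
print(films_directed_by("miyazaki"))
```

In the triple form, attaching a value to the relationship itself (for example, the year on the `directed` statement) would need additional statements, while the property graph stores it directly in `props`.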

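According to the latest statistics (first half of 2018), graph databases remain the fastest-growing storage systems, whereas the growth of relational databases has stayed at a basically stable level. We also list the commonly used graph database systems and their latest usage rankings. Neo4j is currently still the most widely used graph database. It has an active community and the system itself offers high query efficiency. Its only shortcoming is that it does not support quasi-distributed deployment. In contrast, OrientDB and JanusGraph (formerly Titan) support distributed deployment, but these systems are relatively new and their communities are less active than Neo4j's, which means some thorny problems will inevitably come up in use. If you choose an RDF storage system, Jena may be a good choice.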

Applications of knowledge graphs

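From Google Search at the beginning to today's chatbots, big-data risk control, securities investment, intelligent healthcare, adaptive education, and recommender systems, all of these are related to knowledge graphs, and interest in the technology is rising year by year. Below we briefly introduce several typical applications.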

Anti-fraud

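Knowledge graphs play a very large role in anti-fraud. The ultimate goal of anti-fraud is to identify bad actors: find the relationships between known bad actors and other, unknown people, and from those relationships determine whether the unknown people are also bad actors. This is very different from a credit model. If previously only one layer of relationships could be examined, now two, three, or four layers can be examined, and the results are completely different. Many fraud rings and intermediaries can only be seen by looking at a very large network with many layers of relationships, and those relationships are further divided into strong and weak ones.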

The figure below is an example of how we apply a knowledge graph to anti-fraud:

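At present we have built a heterogeneous network from user information, device information, and social relationships, and applied it to user association analysis and anti-fraud detection. Based on this data graph, we can carry out the following investigations to determine whether a particular user is fraudulent or is associated with fraudulent users: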

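  • Filter suspicious users with specific rules
  • View the users that have specific associations with a suspicious user
  • View the network features and user features of the subnetwork formed by all users that have specific associations with a suspicious user
  • Analyze through which association relationships specific users can be linked together
  • Analyze data with multi-layer association relationships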

In this way we have greatly reduced the workload of the investigation process and improved overall efficiency.
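The multi-layer association analysis listed above can be sketched with a breadth-first search over a small heterogeneous network. This is a simplified illustration under stated assumptions, not the system described in this article: the users, devices, and phone numbers are made up, and the function names `within_hops` and `linking_path` are hypothetical.

```python
from collections import deque

# Hypothetical heterogeneous network: users (u*) linked to the devices
# and phone numbers they have used.
edges = [
    ("u1", "device:A"),
    ("u2", "device:A"),
    ("u2", "phone:555"),
    ("u3", "phone:555"),
    ("u4", "device:B"),
]

# Build an undirected adjacency map.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def within_hops(start, max_hops):
    """Breadth-first search: map each node within max_hops of start to its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def linking_path(start, goal):
    """Return one shortest chain of nodes linking start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


known_fraud = "u1"  # assumed to have been flagged by some rule
related_users = {
    node: hops
    for node, hops in within_hops(known_fraud, 4).items()
    if node.startswith("u") and node != known_fraud
}
print(related_users)
print(linking_path("u1", "u3"))
```

Because users are connected through shared devices or phone numbers, one user-to-user association costs two hops in this graph. Raising `max_hops` is what "looking at more layers" amounts to, and `linking_path` answers how two specific users are tied together.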

Intelligent search

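Intelligent search works much like the application of knowledge graphs in Google and Baidu. That is, for each search keyword, we can use the knowledge graph to return richer and more comprehensive information.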

Recommendation engine

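With a knowledge graph, querying the consumption history of a node makes it possible to recommend highly related products that the node is likely to purchase.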
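One simple way to read "highly related" is through shared purchases: products bought by users whose history overlaps with the target user's. The sketch below assumes that interpretation; the purchase data, the scoring rule, and the `recommend` function are all made up for illustration and are not taken from any particular recommender.

```python
from collections import Counter

# Hypothetical user -> purchased products edges of a bipartite graph.
purchases = {
    "alice": {"camera", "tripod"},
    "bob": {"camera", "tripod", "lens"},
    "carol": {"camera", "lens", "bag"},
    "dave": {"novel"},
}


def recommend(user, top_n=3):
    """Rank products the user does not own by how strongly their buyers overlap with the user."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # shared purchases link the two users
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken by name for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:top_n]


print(recommend("alice"))
```

Here "lens" ranks above "bag" for alice because it is reached through two overlapping buyers, one of whom shares two of her purchases.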

Precision marketing

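A smart company can discover its potential customers more effectively than its competitors. In the Internet era there are many marketing methods, but however many there are, they all depend on one core: analyzing and understanding users. A knowledge graph can combine multiple data sources to analyze the relationships between entities and so gain a better understanding of user behavior. For example, a company's marketing manager can use a knowledge graph to analyze the relationships between users, discover the shared preferences of a group, and then design a marketing strategy targeted at that group of people.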

Summary

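This article introduced the concepts behind knowledge graphs and some of their applications in big data analysis. Knowledge graphs provide a more effective way to express, organize, manage, and use big data on the Internet, raising the intelligence level of the network and bringing it closer to human cognitive thinking. They have shaped application scenarios such as anti-fraud, intelligent marketing, and product recommendation, and give us more ways to think about and analyze problems.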

Origin www.cnblogs.com/xiaodf/p/11262621.html