Knowledge graphs are hot, but the underlying technology chain is still weak | AI Technology Ecology

"For new tasks in new domains, do not apply automated machine learning directly. A purely 'trade computing power for intelligence' approach is ineffective and wastes computing resources." (Zhang Jie, Director of the Knowledge Engineering Laboratory, Minglue Academy of Sciences)

Reporter | Xi Yan

Interview Guest | Zhang Jie, Director of the Knowledge Engineering Laboratory, Minglue Academy of Sciences

Produced by | CSDN (ID: CSDNnews)

The "AI Technology Ecology" interview column is an important part of the "One Million People Learn AI" initiative launched by CSDN. Through interviews with leading figures in the AI ecosystem, entrepreneurs, and industry KOLs, it presents their thinking about the industry, their judgment of future trends, their technical practice, and their growth experience. In 2020, CSDN will interview more than 1,000 people for the series, mapping out the most influential people in the AI ecosystem and giving a panoramic view of the AI industry.

This article is the thirteenth installment of the "AI Technology Ecology" interview series. Through a unicorn in the knowledge graph field, it offers insight into knowledge graph technology and its industrial ecosystem.


In recent years, the concept of the knowledge graph has become extremely popular. In essence, a knowledge graph is a large-scale semantic network that describes the concepts, entities, and events of the objective world and the relationships among them. With entities and concepts as nodes and relationships as edges, it offers a way to look at the world through its relationships.

Existing large-scale knowledge graphs, such as Wikidata, YAGO, and DBpedia, store massive amounts of world knowledge in structured form.

Take the figure below as a more intuitive example: it presents people's intricate social relationships visually. Isn't that easier to grasp than a pile of sentences or paragraphs?
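To make the nodes-and-edges idea concrete, here is a minimal Python sketch (not Minglue's code) that stores a small, hypothetical social network as (head, relation, tail) triples and indexes it for neighbor lookups. The names and relations are invented for illustration.

```python
from collections import defaultdict

# Hypothetical social-relationship data as (head, relation, tail) triples.
TRIPLES = [
    ("Alice", "friend_of", "Bob"),
    ("Bob", "colleague_of", "Carol"),
    ("Alice", "spouse_of", "Dave"),
    ("Carol", "sibling_of", "Dave"),
]


def build_graph(triples):
    """Index triples as an adjacency map: entity -> list of (relation, neighbor)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def neighbors(graph, entity):
    """Return the outgoing (relation, neighbor) edges of an entity, or [] if none."""
    return graph.get(entity, [])


graph = build_graph(TRIPLES)
print(neighbors(graph, "Alice"))  # [('friend_of', 'Bob'), ('spouse_of', 'Dave')]
```

Edges here are directed; a symmetric relation such as `friend_of` would need to be stored in both directions if lookups from either end are required.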

Since Google introduced its Knowledge Graph into its search engine in 2012, knowledge graphs have attracted a great deal of attention from academia and industry, and structured knowledge has gradually been applied widely in search engines, question answering systems, finance, and other fields. Companies doing knowledge graph research and application development have sprung up as well: Palantir, Kensho, and GRAKN.AI abroad; BAT (Baidu, Alibaba, Tencent) in China; and start-ups such as Daguan Data, Haizhi Xingtu, PlantData, Zhiyan Technology, and the subject of this article, Minglue Technology.

Of course, while some knowledge graph companies are doing well, others are struggling to survive for various reasons, such as a lack of core technology. Minglue Data has been notably successful among its peers: in March 2019 it closed a 2-billion-yuan Series D round, and Minglue Data was later upgraded to Minglue Technology Group.

What is this company's secret in the knowledge graph industry? In this article, Zhang Jie, Director of the Knowledge Engineering Laboratory at Minglue Technology Group's Academy of Sciences, walks us through the core technology behind Minglue's knowledge graph construction, as well as the current state and future trends of knowledge graph technology and the industry.

"Three teams standing like a tripod": building a core knowledge graph

After earning his Ph.D., Zhang Jie spent six years doing research at Huawei's Central Research Institute. As Huawei pushed deeper into the communications industry and gradually entered "no man's land", he worked on recommendation and personal-assistant scenarios in the ICT industry.

In 2014, he realized that the financial industry would be the next hotspot for applying big data and artificial intelligence, and that its two most important links, risk control and customer acquisition, would bring significant incremental value, so he chose to join a fintech start-up. He recalls that during that period he had to visit customers to understand market needs while also doing technical research, which trained his ability to lead a team to the best balance between R&D investment and commercial value.

In 2019, convinced that AI would profoundly transform many traditional industries, and seeing that Minglue had accumulated both technology and talent on the To B (enterprise) track as well as practical experience across multiple industries, Zhang Jie chose to join Minglue. Since then, he has focused on knowledge engineering, doing forward-looking industry research on two links: automatic construction of knowledge graphs and knowledge-graph-assisted decision-making.

According to Zhang Jie, Minglue's knowledge graph technology rests on three teams: the Academy of Sciences, the Technology Center, and the Product Center. Technical capabilities are shared across the group. Key technical results from the Academy of Sciences are handed to the Technology Center, which turns them into company-level reusable components and delivers them to the Product Center. The Product Center condenses them into a baseline version of the knowledge graph, adapts and optimizes it for each industry, and is responsible for delivery quality and customer satisfaction. The core technical staff of the R&D team come from well-known universities in China and abroad, such as Tsinghua University, Peking University, and Carnegie Mellon University, and many members have practical experience at Fortune 500 companies such as IBM, NEC, Oracle, and Schlumberger.

In this way, the three teams behind Minglue's knowledge graph technology stand together like the legs of a tripod, jointly supporting the development of Minglue's knowledge graph technology and products.

Knowledge Graph Technology and Application Status

Minglue was founded in 2014. In 2017 it completed a 1-billion-yuan financing round and became a unicorn in the big data field. It has worked in the knowledge graph field for a long time and has a deep understanding of how the technology and the industry are developing.

From Zhang Jie's professional perspective, knowledge graphs have been very hot in recent years, but to put it bluntly, academia focuses mainly on two directions: knowledge representation based on deep learning, and "knowledge graph +" (for example, knowledge graph + retrieval, knowledge graph + recommendation, and knowledge graph + pre-trained language models).
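As an illustration of the knowledge representation direction, the sketch below implements the scoring function and margin ranking loss of TransE, a classic knowledge graph embedding model that treats a relation as a translation vector, so that head + relation ≈ tail for true triples. TransE is a shallow embedding rather than a deep network, and it stands in here for the broader family the text mentions; the 2-D vectors are hand-picked for illustration, not learned, and nothing here reflects Minglue's own models.

```python
import math


def transe_distance(h, r, t):
    """TransE distance ||h + r - t||_2; smaller means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss used to train TransE: a true triple should be at
    least `margin` closer than a corrupted one, otherwise the loss is positive."""
    return max(0.0, margin + transe_distance(*pos) - transe_distance(*neg))


# Toy 2-D embeddings chosen by hand for illustration (not learned).
emb = {
    "Beijing": [1.0, 0.0],
    "China": [1.0, 1.0],
    "Paris": [3.0, 0.0],
    "capital_of": [0.0, 1.0],
}
pos = (emb["Beijing"], emb["capital_of"], emb["China"])  # true triple
neg = (emb["Paris"], emb["capital_of"], emb["China"])    # corrupted head

print(transe_distance(*pos))  # 0.0
print(transe_distance(*neg))  # 2.0
print(margin_loss(pos, neg))  # 0.0
```

In training, the embeddings would be updated by gradient descent on this loss over many true triples and randomly corrupted ones.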

Industry focuses mainly on automatic graph construction, for example: how to automatically map a structured database to a knowledge graph and perform knowledge fusion; how to perform document-level event extraction and multi-event correlation on unstructured text; how to reduce the cost of manual annotation through few-shot learning and domain knowledge transfer; and how to apply deep-learning-based knowledge representation at each link.
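Mapping a structured database to a knowledge graph is usually driven by declarative rules (the W3C R2RML standard is one formal version of this idea). The simplified sketch below assumes a hypothetical rule format in which each table names its key column, its attribute columns, and which foreign-key columns become relations; it does not reproduce any particular product's rule language.

```python
# Hypothetical mapping rules: key column, attribute columns, and
# foreign-key columns that become (relation, target table) edges.
MAPPING = {
    "company": {"key": "company_id", "attributes": ["name", "city"], "relations": {}},
    "person": {
        "key": "person_id",
        "attributes": ["name"],
        "relations": {"employer_id": ("works_for", "company")},
    },
}


def entity_id(table, key_value):
    """Build a globally unique entity identifier from table name and key."""
    return f"{table}/{key_value}"


def rows_to_triples(table, rows, mapping):
    """Convert relational rows into (subject, predicate, object) triples."""
    rule = mapping[table]
    triples = []
    for row in rows:
        subject = entity_id(table, row[rule["key"]])
        triples.append((subject, "type", table))
        for column in rule["attributes"]:
            if row.get(column) is not None:  # skip NULLs
                triples.append((subject, column, row[column]))
        for column, (relation, target_table) in rule["relations"].items():
            if row.get(column) is not None:
                triples.append((subject, relation, entity_id(target_table, row[column])))
    return triples


company_rows = [{"company_id": 1, "name": "Acme", "city": "Beijing"}]
person_rows = [{"person_id": 7, "name": "Li Lei", "employer_id": 1}]
triples = rows_to_triples("company", company_rows, MAPPING) + rows_to_triples(
    "person", person_rows, MAPPING
)
for triple in triples:
    print(triple)
```

The foreign key `employer_id` becomes a `works_for` edge pointing at the same identifier the company row produced, which is what links the two tables in the resulting graph.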

This is the basic situation of knowledge graph technology and application.

What did Minglue do?

In this environment there are countless knowledge graph companies, but many of them actually offer homogeneous products and features. What has Minglue done differently, and what core technologies have kept it alive among big data companies?

Core Products

Minglue reportedly launched its "HAO Intelligence" human-machine-organization technology architecture in 2018, in which H stands for Human Intelligence, A for Artificial Intelligence, and O for Organizational Intelligence. The goal of HAO intelligence is to use this theoretical framework to unite people and machines into a single organization, in which human intelligence and machine intelligence complement each other to achieve organizational intelligence.

At present, Minglue Technology Group has developed a number of knowledge graph products covering every link from raw data acquisition to application display, including:

  • CONA (Connect All the data) is a general-purpose governance platform for structured data. It can automatically collect, clean, classify, and link structured data at scale to form a unified data view. In addition, by configuring data conversion rules and combining multi-value traceability with fusion strategies, it can automatically align data to standards, automate data governance, and greatly improve the efficiency of building industry knowledge graphs. Take real data governance in public safety as an example: a business system there has nearly a thousand tables, and building a graph with traditional methods and tools may take more than half a year, whereas CONA can shorten this to two weeks.

  • NEST is a self-developed knowledge graph database that uses hybrid data storage to deliver second-level query responses on graphs with hundreds of millions of entities and billions of edges.

  • SCOPA is a visual data analysis platform built on NEST. Tailored to the characteristics of business scenarios and data graphs, it provides relational network analysis, spatio-temporal trajectory collision, real-time multi-dimensional retrieval, information comparison and collision, an intelligent collaboration system, and real-time data access, making rapid development of industry knowledge graph solutions possible. It has been used in hundreds of projects across public safety, finance, taxation, industry, and other sectors.
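The text says CONA combines data conversion rules with multi-value traceability and fusion strategies. How CONA does this is not described, so the sketch below is only an assumption about the general pattern: a conversion rule normalizes values, every candidate value is kept together with its source (traceability), and a source-priority strategy picks the fused value. The rule, source names, and priorities are hypothetical.

```python
import re


# Hypothetical conversion rule: keep only the digits of a phone number.
def normalize_phone(value):
    return re.sub(r"\D", "", value)


CONVERSION_RULES = {"phone": normalize_phone}

# Hypothetical fusion strategy: a lower number means a more trusted source.
SOURCE_PRIORITY = {"household_registry": 0, "telecom": 1, "web_form": 2}


def fuse(records):
    """Fuse (source, attribute, value) records describing one entity.

    Returns {attribute: {"value": chosen value,
                         "candidates": [(value, source), ...]}}.
    """
    fused = {}
    for source, attr, value in records:
        rule = CONVERSION_RULES.get(attr)
        if rule is not None:
            value = rule(value)
        entry = fused.setdefault(attr, {"candidates": []})
        # Traceability: keep every candidate value together with its source.
        entry["candidates"].append((value, source))
    unknown_rank = len(SOURCE_PRIORITY)  # unknown sources rank last
    for entry in fused.values():
        value, _source = min(
            entry["candidates"],
            key=lambda cand: SOURCE_PRIORITY.get(cand[1], unknown_rank),
        )
        entry["value"] = value
    return fused


records = [
    ("web_form", "phone", "138-0000-1111"),
    ("telecom", "phone", "13800001111"),
    ("web_form", "name", "Zhang Sam"),
    ("household_registry", "name", "Zhang San"),
]
result = fuse(records)
print(result["name"]["value"])   # Zhang San
print(result["phone"]["value"])  # 13800001111
```

Keeping the full candidate list, rather than only the winner, is what lets an analyst later trace a fused value back to where it came from.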

Compared with other companies, what is unique about Minglue's underlying technology for knowledge graph construction? What are the implementation details?

Zhang Jie explained that, in automating knowledge graph construction, Minglue Technology has accumulated core technologies in the following links:

  • For structured and semi-structured data, Minglue proposed HAO profiling technology: when structured and semi-structured data from different sources are aggregated and organized, it tries to understand the data, resolves problems such as redundancy and conflicts, standardizes and links the data into a knowledge graph, displays it visually, and serves queries, computation, and other application needs through a unified view.

  • For unstructured data, Minglue designed and developed an algorithm toolkit, HAO Atlas. It includes algorithms for relation extraction, event extraction, entity alignment, network-structure embedding, spatio-temporal sequence data representation, graph summarization, graph-based short text generation, and more, with a focus on serving the development of enterprise-level knowledge graph systems. It can run on its own or be handed to an enterprise's technical team for secondary development.
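The article does not explain how HAO profiling works internally. As a generic illustration of the profiling step, the sketch below uses two common heuristics: a column whose values are non-null and unique is a candidate key, and a column whose distinct values are mostly contained in another table's key column is a candidate link (an approximate inclusion dependency). Table and column names are hypothetical, and the 0.9 containment threshold is an assumption.

```python
def profile_column(values):
    """Basic column profile: null ratio, distinct ratio, and key-candidate flag."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    n = len(values)
    return {
        "null_ratio": (n - len(non_null)) / n if n else 0.0,
        "distinct_ratio": len(distinct) / len(non_null) if non_null else 0.0,
        # A key candidate has no nulls and no duplicate values.
        "key_candidate": bool(non_null) and len(distinct) == len(non_null) == n,
    }


def join_candidates(tables, min_containment=0.9):
    """Suggest links (column -> key column) based on value containment.

    tables: {table_name: {column_name: [values]}}
    Returns a list of ("table.column", "key_table.key_column", containment).
    """
    keys = [
        (table, column, set(values))
        for table, columns in tables.items()
        for column, values in columns.items()
        if profile_column(values)["key_candidate"]
    ]
    suggestions = []
    for table, columns in tables.items():
        for column, values in columns.items():
            observed = {v for v in values if v is not None}
            if not observed:
                continue
            for key_table, key_column, key_values in keys:
                if (key_table, key_column) == (table, column):
                    continue
                containment = len(observed & key_values) / len(observed)
                if containment >= min_containment:
                    suggestions.append(
                        (f"{table}.{column}", f"{key_table}.{key_column}", containment)
                    )
    return suggestions


tables = {
    "persons": {"id_no": ["A1", "A2", "A3"], "name": ["Li", "Wang", "Zhao"]},
    "hotel_stays": {"guest_id": ["A1", "A3", "A3", None], "hotel": ["H1", "H2", "H1", "H2"]},
}
print(join_candidates(tables))  # [('hotel_stays.guest_id', 'persons.id_no', 1.0)]
```

Heuristics like these only propose links; in practice the suggestions would still be reviewed, since a coincidentally unique column (such as `name` above) also qualifies as a key candidate.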

According to Zhang Jie, Minglue's core knowledge graph products were polished and optimized repeatedly over six years before finally being used in advertising, marketing, public safety, industry, finance, digital cities, supply chains, catering, and other industries. He shared the most valuable lessons Minglue learned over these R&D iterations.

In a word, the lesson is how to handle the trade-off among "dedicated, reusable, and general-purpose". To improve per-person efficiency in project delivery, Minglue on the one hand ensures through its organizational processes that experience from projects flows back to the Technology Center, and on the other hand strengthens the role of machine learning in "internalizing" that experience into products, as with the HAO profiling technology and CONA platform mentioned above. Although Minglue serves customers in many industries, the business systems of different companies in the same industry are similar in function, data structure, and business logic, which is what makes this reuse pay off.

One of the secrets of Minglue's success is the great importance it attaches to reusing industry experience and technical tools. On the industry-experience side, Minglue set up an industry consulting team, distilled industry-specific best practices and success stories, and took part in drafting national, industry, and alliance standards. For example, in 2018 Minglue Technology and the First Research Institute of the Ministry of Public Security jointly released the industry's first "White Paper on Standardization of Public Security Knowledge Graphs". On the technical-tool side, technical capabilities are shared across the group, and algorithms, common technical components, product iterations, and project delivery each have their own division of labor and collaboration.

Technology R&D matters, but getting the most out of existing results is a shortcut that achieves more with less effort.

Automated machine learning modeling for knowledge graphs

Another of Minglue's innovations is applying AutoML, a technology that has become very popular in recent years, to knowledge graph construction.

Zhang Jie explained that for scenarios where the task is well defined, the problem has converged, and data is sufficient, Minglue explicitly adopts automated machine learning to cut down the repetitive labor of training models by hand. Its model training platform MatrixAI, built for machine learning modelers, automatically produces a multi-dimensional data exploration report. Based on that report, it finds similar tasks among historical datasets and models, recommends algorithms and hyperparameters according to the best practices of those similar tasks, and then automatically evaluates model performance and tunes parameters.

The idea is to search for the optimum in the neighborhood of the optimal solutions of similar tasks. However, for new tasks in new domains, Zhang Jie does not recommend applying automated machine learning directly: a purely "trade computing power for intelligence" approach is ineffective and wastes computing resources.
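The "search near the optimum of similar tasks" idea can be sketched as a meta-learning warm start followed by a local search. This is an assumption about the general technique, not MatrixAI's API: datasets are described by a few hypothetical meta-features, the nearest historical task supplies the starting hyperparameters, and a greedy coordinate search explores the neighborhood. The toy objective stands in for real model evaluation.

```python
import math

# Hypothetical history: dataset meta-features -> best known hyperparameters.
HISTORY = [
    {"meta": {"log_rows": 4.0, "log_features": 1.3, "pos_ratio": 0.5},
     "best": {"learning_rate": 0.1, "max_depth": 6}},
    {"meta": {"log_rows": 6.0, "log_features": 2.3, "pos_ratio": 0.05},
     "best": {"learning_rate": 0.05, "max_depth": 8}},
]


def meta_distance(a, b):
    """Euclidean distance between two meta-feature dicts with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def warm_start(meta):
    """Return a copy of the best config of the most similar historical task."""
    nearest = min(HISTORY, key=lambda task: meta_distance(meta, task["meta"]))
    return dict(nearest["best"])


def neighborhood(config):
    """Hypothetical search moves: halve/double the learning rate, +/-1 depth."""
    lr, depth = config["learning_rate"], config["max_depth"]
    return [
        {"learning_rate": lr * 0.5, "max_depth": depth},
        {"learning_rate": lr * 2, "max_depth": depth},
        {"learning_rate": lr, "max_depth": max(1, depth - 1)},
        {"learning_rate": lr, "max_depth": depth + 1},
    ]


def local_search(config, evaluate, max_steps=10):
    """Greedy hill climbing around the warm-start config (higher score is better)."""
    best, best_score = dict(config), evaluate(config)
    for _ in range(max_steps):
        improved = False
        for candidate in neighborhood(best):
            score = evaluate(candidate)
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            break
    return best, best_score


def toy_objective(config):
    """Stand-in for validation accuracy; peaks at learning_rate=0.05, max_depth=7."""
    return -math.log2(config["learning_rate"] / 0.05) ** 2 - (config["max_depth"] - 7) ** 2


new_task = {"log_rows": 5.8, "log_features": 2.2, "pos_ratio": 0.08}
start = warm_start(new_task)
best, best_score = local_search(start, toy_objective)
print(start)  # {'learning_rate': 0.05, 'max_depth': 8}
print(best)   # {'learning_rate': 0.05, 'max_depth': 7}
```

Because the search starts close to a good region, only a handful of evaluations are needed; on a genuinely new task with no similar history, the warm start offers little, which matches the caution in the interview.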

Hot research directions in knowledge graphs and the state of the underlying technology

Knowledge graph query and reasoning

Query reasoning over knowledge graphs is an important research topic and a hard problem still waiting to be solved. Where exactly is the difficulty, and what has Minglue tried?

Zhang Jie replied that Minglue Technology's work on knowledge-graph-based reasoning currently centers on two problems: multi-hop relation prediction and counterfactual prediction. The challenge is that expert rule methods are not accurate enough, while data-driven methods lack sufficient data. ToB application scenarios also demand high accuracy in the final result, as well as interpretability. Minglue has therefore tried human-machine collaboration and interaction. Starting from a preliminary causal graph given by experts, data-driven methods supplement the causal relationships between events to form an industry causal graph; then expert experience and models for specific scenarios and tasks are both encapsulated as operators. After multiple rounds of human-machine interaction, the experts give the final answer.
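As one way to picture interpretable multi-hop reasoning, the sketch below enumerates all simple relation paths of up to `max_hops` edges between two events in a small causal graph, and tags each edge with whether it came from an expert or from a data-driven method, so an expert can inspect each chain and accept or reject it. The graph contents are hypothetical, and Minglue's operator-based workflow is not modeled.

```python
from collections import deque

# Hypothetical causal graph: (cause, relation, effect, who asserted the edge).
EDGES = [
    ("rate_hike", "raises", "financing_cost", "expert"),
    ("financing_cost", "squeezes", "cash_flow", "data"),
    ("cash_flow", "increases_risk_of", "loan_default", "expert"),
    ("rate_hike", "cools", "housing_demand", "data"),
]


def build_index(edges):
    """Map each node to its outgoing (relation, tail, source) edges."""
    index = {}
    for head, relation, tail, source in edges:
        index.setdefault(head, []).append((relation, tail, source))
    return index


def find_paths(index, start, goal, max_hops=3):
    """Breadth-first enumeration of simple paths from start to goal.

    Each path is a list of (head, relation, tail, source) steps, so the
    whole reasoning chain and its provenance can be shown to an expert.
    """
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        visited = {start} | {step[2] for step in path}  # avoid cycles
        for relation, tail, source in index.get(node, []):
            if tail not in visited:
                queue.append((tail, path + [(node, relation, tail, source)]))
    return paths


index = build_index(EDGES)
paths = find_paths(index, "rate_hike", "loan_default")
for path in paths:
    for head, relation, tail, source in path:
        print(f"{head} --{relation} ({source})--> {tail}")
```

Returning explicit paths rather than a single score is what makes the answer explainable: each step shows whether it rests on expert knowledge or on evidence mined from data.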

He expects this approach to be tried in more complex industries, reducing how much application scenarios depend on industry experts.

Common sense knowledge base

Building a common sense knowledge base is another important way to make knowledge graphs "smart", and Minglue has made some attempts here.

Zhang Jie said that building a common sense knowledge base also requires help from research institutions and the open-source community. The focus of Minglue Technology's future planning is to build knowledge bases for multiple vertical domains, consolidating domain facts and domain rules so that they keep accumulating and being refined as projects are delivered.

Is the underlying technology ecosystem of knowledge graphs mature today? Is the supporting tooling complete?

In Zhang Jie's view, every link of the underlying knowledge graph technology is still imperfect, and many links require some degree of manual participation, such as defining the graph schema, writing data mapping rules, building common sense or domain knowledge bases, labeling training datasets, and manually verifying results at the knowledge fusion stage. To be commercially viable, the degree of automation must reach at least 95%, and some scenarios demand even more.
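The 95% figure suggests a design in which the machine decides the confident cases and only uncertain ones go to people. The sketch below shows that triage for entity alignment at the knowledge fusion stage: pairs at or above an `accept` threshold are merged, pairs below `reject` are discarded, and the rest are queued for manual review, with the automation rate reported. Python's `difflib.SequenceMatcher` is a stand-in for a real matching model, and both thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, turn commas and periods into spaces, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a learned matching model."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(pairs, accept=0.9, reject=0.5):
    """Auto-merge confident matches, auto-reject clear non-matches,
    and queue everything in between for manual review."""
    decisions = {"merge": [], "reject": [], "review": []}
    for a, b in pairs:
        score = similarity(a, b)
        if score >= accept:
            decisions["merge"].append((a, b))
        elif score < reject:
            decisions["reject"].append((a, b))
        else:
            decisions["review"].append((a, b))
    automated = len(decisions["merge"]) + len(decisions["reject"])
    automation_rate = automated / len(pairs) if pairs else 1.0
    return decisions, automation_rate


pairs = [
    ("Minglue Technology Co., Ltd.", "minglue technology co ltd"),
    ("Acme Corp", "Zenith Bank"),
    ("Li Lei", "Li Leilei"),
]
decisions, automation_rate = triage(pairs)
print(decisions["review"])        # [('Li Lei', 'Li Leilei')]
print(round(automation_rate, 2))  # 0.67
```

Tightening the thresholds sends more pairs to people and lowers the automation rate; a better matching model is what lets the review band shrink toward the 95% target.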

Moreover, the industry currently lacks a comprehensive toolset for enterprise-level knowledge graph applications. In response, Minglue developed the graph construction algorithm toolkit mentioned above, HAO Atlas. Comprehensive toolsets like HAO Atlas are rare in the knowledge graph industry, which shows that this is an untapped gap in the underlying technology ecosystem, and also an opportunity.

Zhang Jie believes that the underlying technology of knowledge graphs still needs improvement in many areas. Beyond technical methods, the industry could also promote data standardization through industry alliances, retrofit existing IT systems, and shift annotation from labeling for its own sake to crowdsourcing.
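Crowdsourced labeling needs a way to combine noisy answers from several annotators. A minimal sketch, assuming simple majority voting (one common aggregation rule, not necessarily what Zhang Jie has in mind): an item is accepted only if it has enough votes and the top label reaches an agreement threshold; otherwise it is sent back for more labels. The item IDs, labels, and thresholds are hypothetical.

```python
from collections import Counter


def aggregate(labels_by_item, min_votes=3, min_agreement=0.6):
    """Majority-vote aggregation of crowdsourced labels.

    labels_by_item: {item_id: [label from each annotator]}
    Returns (accepted, needs_more): accepted maps item_id -> label; needs_more
    lists items with too few votes or too little agreement.
    """
    accepted, needs_more = {}, []
    for item, labels in labels_by_item.items():
        if len(labels) < min_votes:
            needs_more.append(item)
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            accepted[item] = label
        else:
            needs_more.append(item)
    return accepted, needs_more


votes = {
    "s1": ["works_for", "works_for", "founded"],
    "s2": ["founded", "works_for", "none"],
    "s3": ["none", "none"],
}
accepted, needs_more = aggregate(votes)
print(accepted)    # {'s1': 'works_for'}
print(needs_more)  # ['s2', 's3']
```

More elaborate schemes weight annotators by their estimated reliability, but plain majority voting with an agreement threshold is a common starting point.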

Prospects for future technology development trends

In summary, the technology and applications of knowledge graphs remain a "blue ocean" waiting to be developed, and immaturity breeds opportunity and potential. Where will knowledge graph technology go next? Zhang Jie pointed out several areas that need improvement:

He said that most technologies related to knowledge graphs are still open, such as:

  • At the information extraction level, document-level or even cross-document event extraction;

  • At the knowledge representation level, a more general way is needed to represent the semantic, network-structure, and temporal information carried by nodes and edges;

  • At the application level, what is urgently needed to be broken through is a data-driven approach to causality discovery and causal inference.

Knowledge graphs link all kinds of knowledge in the world into a logical, structured knowledge base that works somewhat like a human brain, so that unified standards can be applied in human practice and personalized services can improve work efficiency. However, knowledge graphs are still some way from being an all-conquering blade. To use them to reach the ultimate goal of making people's lives more convenient, developers still have work to do!

Interview guest

Dr. Zhang Jie is Director of the Knowledge Engineering Laboratory at Minglue Technology Group's Academy of Sciences. His research interests are machine learning, natural language processing, and knowledge graphs. He previously worked at Huawei Noah's Ark Lab, then co-founded a fintech company and served as its CTO. He has led the construction of systems for encyclopedic knowledge question answering, dialogue robots, recommendation engines, decision engines, and big data risk control, has published more than ten academic papers, and has more than 80 invention patents.

Source: blog.csdn.net/csdnnews/article/details/105501897