LLM, knowledge graphs, and graph databases: what is everyone concerned about?

Following the LLM article series "LlamaIndex: Knowledge Graph-Driven LLM Applications", "Text2Cypher: LLM-Driven Graph Query Generation", and "Graph RAG: Combining Knowledge Graphs with LLM Retrieval Augmentation", and a live stream on the theme of "LLM Night Talk" discussing LLM, knowledge graphs, and graph databases, NebulaGraph R&D staff were guests at last week's OSChina Expert Q&A event, where they shared NebulaGraph's current thinking and practice around LLM.

More than half a year has passed since the launch of ChatGPT. Now that the initial hype has cooled, let's take a look at what inspiration and practical benefits LLM technology can bring.

The Q&A guests

Two NebulaGraph developers participated in this expert Q&A:

  • Wey Gu: [GitHub ID @wey-gu](https://github.com/wey-gu), NebulaGraph evangelist, and the first to propose the Graph RAG concept in the LlamaIndex community;
  • Cheng Xuntao: [GitHub ID @xtcyclist](https://github.com/xtcyclist), core developer of NebulaGraph, working on graph database development and currently focused on better integrating graph databases with LLMs.

Collected questions

What exactly is an LLM?

iman123 asked: Hello teacher, LLM is very popular now. My understanding is that LLM draws on existing knowledge and data, and from that it can give you some non-creative answers and suggestions; for example, it cannot discover or create unknown science. I don't know if my understanding is correct. In practice, LLM can replace some repetitive manual customer-service work and improve efficiency. Programmers may not be completely replaced, but if it could write, debug, and run code entirely by itself, it would be like The Matrix.

wey-gu: Indeed. That said, analyzing and debugging written code with the help of assistants like Copilot and Cursor can already be smarter and smoother than you might imagine. Here is an example: @xtcyclist proposed a NebulaGraph kernel change, and with these tools I found where and how to make the changes in NebulaGraph in just a few minutes, including generating test code. See: https://vimeo.com/858182792

gmgn3 asked: Hello teacher, what are the advantages of large language models (LLMs)?

wey-gu: The advantage is that an LLM is a perception layer with relatively sufficient general knowledge, plus the ability to solve domain problems given sufficient context (in-context learning, retrieval augmentation). However, supplying sufficient, relevant, and accurate context is sometimes difficult, and that is where the knowledge graph can help.

clearsky1991 asked: LLM is very popular now. Can you deploy some for local use? What are the computer configuration requirements? Are there any open source free projects similar to ChatGPT 4 that you can recommend for personal local use?

wey-gu: Yes. For example, ChatGLM2-6B can run on a CPU after quantization. Here is an example of me using ChatGLM2-6B plus a local embedding model to do LLM + Graph, which you can try first: https://www.siwei.io/demo-dumps/local-llm/Graph_RAG_Local.html

LLM and knowledge graphs

A user asked: Can a large language model (LLM) help extract key information from analyzed data to generate graph data? How does this land in practice?

wey-gu: Of course, you can use an LLM to extract knowledge and build a KG. Here is a demo notebook: https://www.siwei.io/demo-dumps/kg-llm/KG_Building.ipynb, and see also https://www.siwei.io/demos/text2cypher/. We can even go further and combine LLM with dedicated NLP models on top of knowledge extraction; for example, the paper "REBEL: Relation Extraction By End-to-end Language generation" discusses such an idea.
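To make the extract-and-build idea concrete, here is a minimal sketch of the parsing side. The prompt wording, function names, and stubbed LLM response are my own illustration (not the linked demo's actual code): the LLM is asked to emit `(subject, relation, object)` lines, which are then parsed and loaded into an adjacency map.

```python
import re

# Hypothetical prompt template for triplet extraction (illustrative only).
EXTRACT_PROMPT = """Extract knowledge triplets from the text below.
Output one triplet per line in the form (subject, relation, object).

Text: {text}
Triplets:"""

def parse_triplets(llm_output: str):
    """Parse '(s, r, o)' lines from an LLM response into tuples."""
    pattern = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")
    return [m.groups() for m in pattern.finditer(llm_output)]

def build_graph(triplets):
    """Turn triplets into an adjacency map: subject -> [(relation, object)]."""
    graph = {}
    for s, r, o in triplets:
        graph.setdefault(s, []).append((r, o))
    return graph

# In a real pipeline this string would be the LLM's completion of
# EXTRACT_PROMPT.format(text=...); stubbed here to keep the sketch runnable.
fake_response = """(NebulaGraph, is_a, graph database)
(Graph RAG, combines, knowledge graph)
(Graph RAG, combines, LLM)"""

triplets = parse_triplets(fake_response)
graph = build_graph(triplets)
print(graph["Graph RAG"])
```

In practice the parsed triplets would then be written into NebulaGraph as vertices and edges; the REBEL-style approach replaces the prompt with a model fine-tuned specifically for relation extraction.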

Nan Xiaoshan Programmer asked: Hello teachers, what is the correlation or similarity between large language models and knowledge graphs? I feel there are many similarities. For example, the knowledge graph aims to capture the semantic relationships of the world and provide an effective way to query and reason about relationships between entities, and the large language model also largely captures semantic relationships and semantic understanding. What are the commonalities and the biggest differences between the two?

xtcyclist: The knowledge graph carries semantics, but it does not capture semantic relationships; it captures concepts and the relationships among them, that is, relationships between pieces of knowledge. Knowledge, language, and semantics are different things: language is a carrier of knowledge. A large language model is a language model; by itself it is not capable of managing knowledge and the relationships between knowledge. That is why the LLM stack needs vector databases and graph databases to manage domain knowledge.

Elven_Xu asked: I have a scenario question about knowledge graphs and LLM. Your answer above talks about the relationship between the two: the knowledge graph focuses more on managing knowledge and the relationships between knowledge, while LLM focuses more on knowledge itself, though relationships between knowledge can also be managed through vector databases. Do I understand this right? If so, does it mean LLM can replace the knowledge graph? If we switch to LLM now, can we drop the knowledge graph? Or is it only partially replaceable? I don't quite understand and would like to ask the teacher for advice, thanks~

wey-gu: LLM and KG/graph databases are mutually beneficial, and neither can replace the other:

  • In LLM + data/knowledge applications (in-context learning), for scenarios such as fine-grained data segmentation and understanding of domain knowledge, introducing a KG can greatly alleviate hallucinations, enhance retrieval, and improve the effect of intelligent applications;
  • One obstacle to applying KGs is writing queries; with LLM, Text2GraphQuery has become very cheap and efficient;
  • LLM can help a lot during the construction of a KG.

Some of my previous talks, articles, and sample code cover these three scenarios where the two help each other; feel free to check them out.
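The second scenario above, Text2GraphQuery, can be sketched in a few lines. The schema text, prompt wording, and helper names here are my own assumptions (not the actual Text2Cypher implementation): the graph schema is embedded in the prompt so the LLM has enough context to generate a valid query, and the completion is cleaned before execution.

```python
# Toy schema description, loosely modeled on NebulaGraph's basketball demo data.
SCHEMA = """Node types: player(name, age), team(name)
Edge types: (player)-[:serve]->(team), (player)-[:follow]->(player)"""

TEXT2CYPHER_PROMPT = """You translate questions into Cypher for this graph schema:
{schema}

Question: {question}
Only output the Cypher query, nothing else.
Cypher:"""

def build_prompt(question: str, schema: str = SCHEMA) -> str:
    return TEXT2CYPHER_PROMPT.format(schema=schema, question=question)

def clean_cypher(llm_output: str) -> str:
    """LLMs often wrap queries in ```-fences; strip them before execution."""
    lines = [l for l in llm_output.strip().splitlines()
             if not l.strip().startswith("```")]
    return "\n".join(lines).strip()

prompt = build_prompt("Which teams does Tim Duncan serve?")
# A real call would send `prompt` to the LLM; the completion is stubbed here.
fake_completion = """```cypher
MATCH (p:player {name: "Tim Duncan"})-[:serve]->(t:team) RETURN t.name
```"""
cypher = clean_cypher(fake_completion)
print(cypher)
```

The cleaned query would then be run against the graph database, with the result (or any syntax error) optionally fed back to the LLM for a retry.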

LLM in business practice

Bayi Chopper asked: A graph database holds data such as relationships, nodes, and attributes, and the application layer mainly fetches data through GQL statements. So how do you combine it with a large model like LLM? For example, in search scenarios, if user input is converted into GQL statements via NLP, the scope seems too broad (user input can be all kinds of weird) and cannot be focused. Do you have any good processing experience?

wey-gu: Broadly speaking, there are two ideas: Text2Cypher and Graph RAG. The former directly converts the question into the graph query language Cypher; the latter extracts key information from the question, searches for a subgraph in the knowledge graph, and then constructs context for the LLM to generate the answer. The question can also first be broken down into smaller sub-questions through methods such as Chain-of-Thought. For concrete implementations, see: https://www.siwei.io/graph-rag/ or https://colab.research.google.com/drive/1tLjOg2ZQuIClfuWrAC2LdiZHCov8oUbs?usp=drive_open#scrollTo=iDA3lAm0LatM, and I also made a short course: https://youtube.com/watch?v=hb8uT-VBEwQ&t=2797s
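The Graph RAG retrieval step described above can be sketched with in-memory stand-ins. In the real pipeline (see the linked graph-rag article) an LLM does the entity extraction and NebulaGraph serves the subgraph; here both are replaced with toy substitutes of my own, purely for illustration.

```python
# Toy knowledge graph: entity -> [(relation, neighbor)].
KG = {
    "Tim Duncan": [("serve", "Spurs"), ("follow", "Tony Parker")],
    "Tony Parker": [("serve", "Spurs")],
    "Spurs": [],
}

def extract_entities(question, known_entities):
    """Stand-in for LLM keyword extraction: match known entity names."""
    return [e for e in known_entities if e.lower() in question.lower()]

def get_subgraph(entities, kg, hops=2):
    """Collect facts within `hops` hops of the seed entities."""
    facts, frontier = [], list(entities)
    for _ in range(hops):
        next_frontier = []
        for src in frontier:
            for rel, dst in kg.get(src, []):
                fact = (src, rel, dst)
                if fact not in facts:
                    facts.append(fact)
                    next_frontier.append(dst)
        frontier = next_frontier
    return facts

def build_context(facts):
    """Render the retrieved subgraph as text to prepend to the LLM prompt."""
    return "\n".join(f"{s} -[{r}]-> {o}" for s, r, o in facts)

question = "Which team does Tim Duncan play for?"
entities = extract_entities(question, KG)
context = build_context(get_subgraph(entities, KG))
print(context)
```

The resulting `context` is what gets packed into the final prompt, so the LLM answers from retrieved graph facts rather than from its parametric memory alone.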

lvxb asked: Can LLM be used in short text classification and recognition judgment? Are there any actual cases?

xtcyclist: Of course; text processing is exactly what large language models are best at. A team from my PhD days recently created a public account, "Mei Tou 365". They use LLM to analyze U.S. stock data and financial news, both long and short, and then generate commentary articles, which involves text classification.

Technical advantages and disadvantages of graph databases

iman123 asked: I have come into contact with the graph database Neo4j before. What are the advantages and disadvantages of NebulaGraph in comparison?

wey-gu: Compared with Neo4j, NebulaGraph has some latecomer advantages. NebulaGraph was designed by our founding team based on years of experience with graph storage systems, using newer storage engineering methods and practices for distributed, ultra-large-scale data. So for scenarios with large graphs, high availability, and high concurrency, or where the business graph keeps growing, NebulaGraph scales naturally. In addition, NebulaGraph is open source under Apache 2.0 and supports distributed deployment.

xiaour asked: I used graph databases a few years ago when working on an AI music app, but found that in pursuit of ultimate performance and efficiency, the graph databases on the market had bottlenecks, often requiring large resource investments or users tolerating response delays. How do we deal with the conflicting costs and benefits of investing in graph databases?

wey-gu: You are welcome to come to the NebulaGraph community to discuss your bottlenecks. The project is good at online, high-concurrency scenarios and is used by many major domestic social and lifestyle companies; its distributed design means a growing data volume doesn't require much worry about scale. As a new kind of system, a graph database does carry a certain talent-investment cost, but this ROI calculation has changed qualitatively since the arrival of LLM:

  1. Building a knowledge graph has become easy;
  2. Querying a knowledge graph (whether by human or machine) could become very easy.

In general, if the ROI makes sense for your scenario, it is highly recommended to try adding a graph database, which opens up many potential possibilities. Imagine obtaining multi-hop correlations on the graph in real time, using visualization tools to gain intuitive insight into data relationships, and then running graph algorithms to obtain new features and conclusions.

Q: How can the combination of a graph database and a big-data computing engine be better utilized, in terms of efficiency or the complementary advantages of graph algorithms?

wey-gu: The strengths of a graph database are real-time graph queries and flexible expression of small-scale computations; its weakness is operations that touch the whole graph or a large portion of it. Conversely, a graph computing platform is suited to full-graph access, iteration, and computation tasks, but by default the freshness of its data is a shortcoming (data is often pulled from a data warehouse). One way to combine them is to use the computing platform as the compute layer and choose the storage layer on demand. With a storage-compute separated architecture like NebulaGraph's, the graph computing platform can act as a heterogeneous compute-and-query layer within the cluster, making for a very smooth combination.

For example, with the NebulaGraph enterprise edition's NebulaGraph Explorer + NebulaGraph Analytics, we can use the API or the WYSIWYG interface in the browser to freely orchestrate complex computation pipelines on the graph. At the bottom of this system, we can choose graph-database queries as needed, or bypass the query layer and scan the entire graph directly from the storage layer to run graph computation tasks.

Another example: a GNN trains an inductive model on the entire graph; then, in the online business, the subgraph around relevant newly inserted vertices (say, 3,000 vertices) is extracted from NebulaGraph in real time and fed to the model for inference to obtain predictions. This is a typical case of combining GNN + graph database. The example project is here: https://github.com/wey-gu/NebulaGraph-Fraud-Detection-GNN/
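The online-inference step just described can be sketched as follows. This is illustrative only, not the linked fraud-detection project's code; the vertex names, feature values, and the weight-free aggregation are all my own simplifications of a GraphSAGE-style inductive layer.

```python
# Neighborhood fetched "in real time" for a newly inserted vertex.
SUBGRAPH = {
    "new_tx": ["acct_a", "acct_b"],
    "acct_a": ["new_tx"],
    "acct_b": ["new_tx"],
}
# Per-vertex feature vectors (e.g., normalized amount, degree).
FEATURES = {
    "new_tx": [0.9, 0.1],
    "acct_a": [0.2, 0.8],
    "acct_b": [0.4, 0.6],
}

def aggregate(vertex, subgraph, features):
    """Mean-pool neighbor features and concatenate with the vertex's own,
    as in a single GraphSAGE-style layer (learned weights omitted)."""
    neigh = [features[n] for n in subgraph[vertex]]
    dim = len(features[vertex])
    mean = [sum(v[i] for v in neigh) / len(neigh) for i in range(dim)]
    return features[vertex] + mean  # self features ++ aggregated features

embedding = aggregate("new_tx", SUBGRAPH, FEATURES)
print(embedding)  # 4-dim vector, fed into the trained inductive model
```

Because the model is inductive, vertices unseen at training time can be scored this way as soon as their neighborhood is pulled from the graph database, which is what makes the real-time fraud-detection pattern work.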

References

If you are interested in LLM-related practices, you can read the articles linked above to learn more.


Thank you for reading this article (///▽///)

The NebulaGraph "Graph for LLM" project is recruiting interns. JD: Database Kernel Development Engineer (large model direction)


Source: my.oschina.net/u/4169309/blog/10122979