Servers overwhelmed: ChatLaw, Peking University's large legal model, has gone viral. It tells you directly how "Zhang San" would be sentenced!

Large language models continue to expand into vertical industries, and this time it is Peking University's legal model.

A large model has "blown up" again.


Last night, the large legal model ChatLaw topped Zhihu's trending list, with its trending heat peaking at around 20 million.

ChatLaw was released by a Peking University team committed to providing inclusive legal services. On the one hand, there is a nationwide shortage of practicing lawyers, with supply falling far short of legal demand; on the other hand, ordinary people have a natural gap in legal knowledge and cannot wield the law to protect themselves.

The recent rise of large language models provides an excellent opportunity for ordinary people to consult on legal issues in a conversational manner.

ChatLaw currently comes in three versions:

  • ChatLaw-13B, an academic demo trained on Ziya-LLaMA-13B-v1 (the "Jiang Ziya" model), performs very well in Chinese. However, it struggles with legal Q&A that requires complex logic, a problem that calls for a model with more parameters;

  • ChatLaw-33B, also an academic demo, is trained on Anima-33B and shows greatly improved logical reasoning. However, because Anima lacks Chinese corpora, English text often appears in its answers;

  • ChatLaw-Text2Vec, a BERT-based similarity-matching model trained on a dataset of 930,000 court judgments, matches a user's question to the relevant legal provisions (a minimal sketch of this kind of matching follows the list).
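
To illustrate, here is a minimal sketch of BERT-style question-to-statute similarity matching using the sentence-transformers library. The model name and statute texts are illustrative stand-ins, not ChatLaw-Text2Vec's actual checkpoint or data:

```python
# Sketch of BERT-style question/statute similarity matching, in the spirit
# of ChatLaw-Text2Vec. The model name and statute texts are stand-ins.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # public stand-in model

question = "Can I return goods bought online within seven days without giving a reason?"
statutes = [  # paraphrased placeholders, not verbatim statute text
    "Consumers may return goods sold online within seven days of receipt, without reason.",
    "A party that fails to perform its contractual obligations bears liability for breach.",
]

# Encode the question and candidate statutes, then rank by cosine similarity.
q_emb = model.encode(question, convert_to_tensor=True)
s_emb = model.encode(statutes, convert_to_tensor=True)
scores = util.cos_sim(q_emb, s_emb)[0]

best = int(scores.argmax())
print(f"Best match (score {scores[best]:.3f}): {statutes[best]}")
```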

According to the official demo, ChatLaw lets users upload legal materials such as documents and recordings, helps summarize and analyze them, and generates visualized mind maps, charts, and the like. In addition, ChatLaw can generate legal advice and legal documents from the facts provided. The project has reached 1.1k stars on GitHub.

Official website:
https://www.chatlaw.cloud/

Paper:
https://arxiv.org/pdf/2306.16092.pdf

GitHub:
https://github.com/PKU-YuanGroup/ChatLaw

At present, the project's popularity has temporarily crashed the server, and its compute has hit its ceiling. The team is working on a fix; in the meantime, interested readers can deploy the beta model from GitHub.

The editor is still in the queue for the closed beta, so here is an official conversation example provided by the ChatLaw team, about the "seven-day no-reason return" issue often encountered in online shopping. It must be said that ChatLaw's answer is quite comprehensive.

However, the editor found that the academic demo version of ChatLaw is available to try. Unfortunately, it does not include the legal consultation function and only offers simple dialogue. Here are a few questions I tried asking.

In fact, Peking University's is not the only large legal model released recently. At the end of last month, Powerlaw Intelligence and Zhipu AI jointly released PowerLawGLM, a hundred-billion-parameter legal vertical model that has reportedly shown unique advantages in Chinese legal scenarios.

Source: Powerlaw Intelligence

ChatLaw's data sources and training framework

The first aspect is data composition. ChatLaw's data mainly comprises forums, news, legal provisions, judicial interpretations, legal consultations, bar exam questions, and court judgments; the dialogue data is then constructed after cleaning and data augmentation. Meanwhile, through cooperation with Peking University School of Transnational Law and well-known law firms, the ChatLaw team can keep the knowledge base updated in a timely manner while ensuring the professionalism and reliability of the data. Let's look at some concrete examples below.

Example constructed from laws, regulations, and judicial interpretations:

Example of scraped real legal consultation data:

Example of a constructed bar exam multiple-choice question:

Then there is the model level. To train ChatLaw, the research team fine-tuned Ziya-LLaMA-13B using Low-Rank Adaptation (LoRA). In addition, the study introduced a self-suggestion role to alleviate model hallucination. Training was performed on multiple A100 GPUs, with DeepSpeed further reducing training costs.
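
For intuition, here is a minimal sketch of what such a LoRA setup might look like with Hugging Face's peft library; the model path, rank, target modules, and other hyperparameters are assumptions for illustration, not ChatLaw's actual training configuration:

```python
# Sketch of a LoRA fine-tuning setup with Hugging Face peft.
# Model path and hyperparameters are illustrative assumptions,
# not ChatLaw's actual training configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "IDEA-CCNL/Ziya-LLaMA-13B-v1"  # base model named by the team
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all 13B weights.
lora_cfg = LoraConfig(
    r=8,                                  # adapter rank (assumed)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, typical for LLaMA
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of weights train
# Training would then run via transformers.Trainer with a DeepSpeed config.
```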

The figure below shows ChatLaw's architecture. The team injects legal data into the model and gives this knowledge special processing and reinforcement; at the same time, multiple modules are introduced at inference time, integrating the general-purpose model, the domain model, and the knowledge base into one.

The research also constrains the model during inference to ensure that it generates correct laws and regulations and to minimize hallucination.

Initially, the research team tried traditional software approaches, such as MySQL and Elasticsearch retrieval, but the results were unsatisfactory. The team therefore turned to pre-training a BERT model for embeddings, then used methods such as Faiss to compute cosine similarity and extract the top-k laws and regulations relevant to a user query.
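
A minimal sketch of that retrieval step, using L2-normalized embeddings with a Faiss inner-product index so that scores equal cosine similarities (the embedding dimension and vectors here are random stand-ins):

```python
# Sketch: top-k statute retrieval with Faiss over normalized embeddings.
# With L2-normalized vectors, inner product equals cosine similarity.
import numpy as np
import faiss

dim = 768  # typical BERT embedding size (assumed)
law_embs = np.random.rand(1000, dim).astype("float32")  # stand-in statute embeddings
faiss.normalize_L2(law_embs)

index = faiss.IndexFlatIP(dim)  # exact inner-product search
index.add(law_embs)

query = np.random.rand(1, dim).astype("float32")  # stand-in query embedding
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar statutes
print(ids[0], scores[0])
```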

This approach often produces suboptimal results when a user's question is vague. The researchers therefore extract key information from user queries and design algorithms over the vector embeddings of that information to improve matching accuracy.

Since large models have significant advantages in understanding user queries, the study fine-tunes an LLM to extract keywords from user queries. After obtaining multiple keywords, the study uses its Algorithm 1 to retrieve the relevant legal provisions.
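
The paper's Algorithm 1 is not reproduced verbatim here, but the general idea, scoring each statute by aggregating its similarity to every extracted keyword, can be sketched as follows (the embeddings are random stand-ins):

```python
# Sketch of keyword-driven statute retrieval: an LLM first extracts keywords,
# then each statute is scored by aggregating its similarity to every keyword.
# This illustrates the general idea only, not the paper's exact Algorithm 1.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(keyword_embs: list, law_embs: list, k: int = 5) -> list:
    # Score each statute by its summed similarity over all extracted keywords.
    scores = [sum(cosine(kw, law) for kw in keyword_embs) for law in law_embs]
    return sorted(range(len(law_embs)), key=lambda i: scores[i], reverse=True)[:k]

# Usage with random stand-in embeddings:
rng = np.random.default_rng(0)
keywords = [rng.random(768) for _ in range(3)]  # e.g. 3 extracted keywords
laws = [rng.random(768) for _ in range(100)]    # 100 candidate statutes
print(retrieve(keywords, laws))
```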

Experimental results

The study collected more than a decade of national judicial examination questions and compiled a test set of 2,000 questions with standard answers to measure the models' ability to handle legal multiple-choice questions.

However, the study found that every model's accuracy was generally low, and in that regime comparing accuracy alone is not very meaningful. The study therefore borrows the Elo matchmaking mechanism from League of Legends and creates a model-versus-model Elo mechanism to evaluate each model's legal multiple-choice ability more effectively. The Elo scores and win-rate charts are shown below:
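
For reference, here is a minimal sketch of the standard Elo update that this kind of model-versus-model evaluation applies after each head-to-head round (the K-factor is a conventional default, not necessarily the study's choice):

```python
# Sketch of a standard Elo rating update for pairwise model battles.
# K=32 is a conventional default, not necessarily the study's choice.
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if model A wins the round, 0.5 for a draw, 0.0 if it loses."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b

# Example: both models start at 1500; model A answers a question correctly, B does not.
print(elo_update(1500.0, 1500.0, score_a=1.0))  # -> (1516.0, 1484.0)
```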

Analyzing the above experimental results, we can draw the following observations:

(1) Introducing law-related Q&A and statutory data improves the model's performance on multiple-choice questions to a certain extent;

(2) Adding training data for a specific task type significantly improves the model's performance on that task. For example, ChatLaw outperforms GPT-4 because a large number of multiple-choice questions were used as its training data;

(3) Legal multiple-choice questions require complex logical reasoning, so models with more parameters usually perform better.


