DISC-LawLLM: The Fudan University team releases a Chinese intelligent legal system, builds a judicial evaluation benchmark, and open-sources 300,000 fine-tuning examples

Background introduction

With the rise of intelligent justice, legal systems driven by AI methods can benefit many groups: easing paperwork for legal professionals, providing legal advisory services to the general public, and offering study and exam coaching for law students.

Because legal knowledge is specialized and judicial tasks are diverse, previous research on intelligent justice mainly focused on designing automated algorithms for specific tasks. This approach could not meet the broad demand for supporting services in the judicial field and was far from practical deployment. Recently, large language models (LLMs) have demonstrated powerful capabilities across many traditional tasks, bringing new hope for the further development of intelligent legal systems.

The Fudan University Data Intelligence and Social Computing Laboratory (FudanDISC) has released DISC-LawLLM, a Chinese intelligent legal system driven by a large language model. The system can provide a variety of legal services to different user groups. In addition, the team constructed an evaluation benchmark, DISC-Law-Eval, to assess legal LLMs from both objective and subjective perspectives; in this evaluation, DISC-LawLLM shows clear advantages over existing legal LLMs.

The research team also released DISC-Law-SFT, a high-quality supervised fine-tuning (SFT) dataset containing 300,000 examples, and open-sourced the model parameters and a technical report.

Home page address:

https://law.fudan-disc.com

Github address:

https://github.com/FudanDISC/DISC-LawLLM

Technical Reports:

https://arxiv.org/abs/2309.11325



01

Demonstration examples

Users with legal questions can consult the model and describe their situation; the model then provides the relevant legal provisions and explanations, recommended solutions, and more.


Figure 1 Example of legal consultation

Professional lawyers and judicial authorities can use the model for legal text summarization, judicial event detection, and entity and relationship extraction, reducing paperwork and improving work efficiency.


Figure 2 Analysis of judicial documents

While preparing for the judicial examination, law students can ask the model questions to consolidate legal knowledge and practice answering exam questions.


Figure 3 Example of exam assistant

When support from external legal provisions is needed, the model retrieves relevant content from the knowledge base based on the question and incorporates it into its reply.

Figure 4 Example of retrieval-augmented dialogue

02

Introduction to DISC-LawLLM

DISC-LawLLM is a legal LLM built on the high-quality DISC-Law-SFT dataset we constructed, produced by full-parameter instruction fine-tuning of the general-domain Chinese model Baichuan-13B. Notably, our training data and training methods can be adapted to any base LLM.

DISC-LawLLM has three core capabilities:

1. Basic legal text processing. To cover the fundamental capabilities of legal text understanding and generation, including information extraction and text summarization, we constructed fine-tuning data from existing public NLP judicial task datasets and real-world legal texts.

2. Legal reasoning. To meet the needs of tasks in the intelligent judicial field, we used the legal syllogism, the basic reasoning process of judges, to reconstruct the instruction data, effectively improving the model's legal reasoning ability.

3. Retrieval and adherence to judicial-domain knowledge. Solving problems in the intelligent judicial field often requires grounding in the background laws or cases relevant to the question. We equipped the system with a retrieval augmentation module, which strengthens its ability to retrieve and follow background knowledge.

   The overall framework of the model is shown in Figure 5.


Figure 5 Model serves different users in different legal scenarios

03

Method: construction of the DISC-Law-SFT dataset


Figure 6 Structure of DISC-Law-SFT

DISC-Law-SFT comprises two sub-datasets, DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet. The former introduces legal reasoning capabilities to the LLM, while the latter improves the model's ability to use external knowledge.


Table 1: Introduction to the contents of the DISC-Law-SFT data set

Data Sources

The DISC-Law-SFT dataset draws on three sources:

1. Public datasets for NLP judicial tasks related to Chinese law, including legal information extraction, entity and relationship extraction, judicial text summarization, judicial examination question answering, judicial reading comprehension, and crime/sentence prediction.

2. Original legal texts collected from the real world, such as laws and regulations, judicial cases, judgment documents, and judicial examinations.

3. General open-source datasets; we used alpaca_gpt4_data_zh and Firefly, which enrich the diversity of the training set and reduce the risk of degrading the model's basic capabilities during the SFT stage.

Instruction pair construction

After converting the data from the first two sources into "input-output" instruction pairs, we apply the following three methods to reconstruct the instruction data and improve its quality.

Behavior shaping

In a legal syllogism, the major premise is the applicable legal rule, the minor premise is the facts of the case, and the conclusion is the legal judgment. This is the basic reasoning process of judges: each case can be led to a clear conclusion through a syllogism, as follows:

Major premise: legal rules

Minor premise: facts of the case

Conclusion: Legal Judgment

We use GPT-3.5-turbo to perform this behavior-shaping reconstruction, refining the outputs and ensuring that each conclusion is derived from a legal provision and a case fact.

Knowledge expansion

For multiple-choice questions where behavior shaping is not applicable, we instead expand the output directly with legal knowledge to provide more reasoning detail. Many law-related exams and knowledge competitions provide only the answer option, so we use an LLM to expand on the legal knowledge involved, give the correct answer, and reconstruct the instruction pair.

Thought cultivation

Chain-of-Thought (CoT) prompting has been shown to improve a model's reasoning capabilities. To further endow the model with legal reasoning ability, we designed a law-specific chain of thought, called LCoT, which requires the model to derive its answer via a legal syllogism. LCoT converts an input X into a prompt like this:

In a legal syllogism, the major premise is the applicable legal rules, the minor premise is the facts of the case, and the conclusion is the legal judgment of the case.

Case:X

Let's think about it and output the judgment in terms of the legal syllogism:
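The template above can be sketched as a small prompt-building helper. The function name and exact English wording below are illustrative, paraphrasing the LCoT template described in the text rather than reproducing the authors' prompt verbatim:

```python
def build_lcot_prompt(case_text: str) -> str:
    """Wrap a case description in an LCoT-style (legal chain-of-thought) prompt.

    The wording paraphrases the template described above; the authors'
    exact prompt may differ.
    """
    return (
        "In a legal syllogism, the major premise is the applicable legal rules, "
        "the minor premise is the facts of the case, and the conclusion is the "
        "legal judgment of the case.\n"
        f"Case: {case_text}\n"
        "Let's think about it and output the judgment in terms of the legal syllogism:"
    )
```

The prompt is then sent to the model as the user turn in place of the raw question.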

Instruction triplet construction

To train the retrieval-augmented model, we constructed the DISC-Law-SFT-Triplet sub-dataset, whose entries are triplets of the form <input, output, reference>. We process the raw data with the three strategies described above to obtain the inputs and outputs, and design heuristic rules to extract the reference information from the raw data.
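As a toy illustration of the triplet construction (the authors' actual heuristic rules are not specified here; the regex and field names below are assumptions), one such rule might pull sentences that cite an article of law out of the raw text and attach them as references:

```python
import re

def extract_references(raw_text: str) -> list[str]:
    """Toy heuristic: treat sentences that cite an article of law as references.

    The authors' real extraction rules are not given in the text;
    this regex is purely illustrative.
    """
    return re.findall(r"[^.]*Article\s+\d+[^.]*\.", raw_text)

def make_triplet(instruction_input: str, instruction_output: str, raw_text: str) -> dict:
    """Assemble one <input, output, reference> training triplet."""
    return {
        "input": instruction_input,
        "output": instruction_output,
        "reference": extract_references(raw_text),
    }
```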

04

Experiments

Training

The training of DISC-LawLLM has two stages: SFT and retrieval augmentation.

Retrieval augmentation

Although we fine-tune the LLM with high-quality instruction data, it may still produce inaccurate responses due to hallucination or outdated knowledge. To address this, we designed a retrieval module to augment DISC-LawLLM.

Given a user input, the retriever computes its similarity to documents in the knowledge base and returns the Top-K most relevant ones. These candidate documents, together with the user input, are assembled with a template we designed and then fed into DISC-LawLLM. By consulting the knowledge base, the model can better ground the major premise, yielding more accurate and reliable answers.
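The retrieve-then-prompt flow can be sketched as follows. This is a minimal stand-in, not the authors' implementation: it uses bag-of-words cosine similarity as the similarity function and a made-up prompt template, since the actual retriever and template are not given here:

```python
import math
from collections import Counter

def _bow(text: str) -> Counter:
    """Bag-of-words term counts (whitespace tokenization, lowercased)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query: str, knowledge_base: list[str], k: int = 3) -> list[str]:
    """Return the Top-K documents most similar to the query."""
    q = _bow(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, _bow(d)), reverse=True)
    return ranked[:k]

def build_augmented_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved documents and the user input into one prompt.

    The template text is illustrative, not the authors' template.
    """
    refs = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"Reference provisions:\n{refs}\n\nQuestion: {query}\nAnswer:"
```

In a production system the bag-of-words scorer would be replaced by a dense embedding model over the legal knowledge base, but the Top-K ranking and prompt-assembly steps stay the same.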

Figure 7: Retrieval-augmented DISC-LawLLM

Evaluation

Evaluation benchmark DISC-Law-Eval

We constructed DISC-Law-Eval, a fair evaluation benchmark for intelligent legal systems that evaluates from both objective and subjective perspectives, filling the gap left by the absence of a comprehensive benchmark for such systems.


Figure 8: DISC-Law-Eval evaluation benchmark

Objective evaluation

To quantitatively evaluate the legal knowledge and reasoning of intelligent legal systems, we designed an objective evaluation dataset consisting of single-answer and multiple-answer choice questions from Chinese legal standardized examinations and knowledge competitions. Based on content complexity and deductive difficulty, the questions are divided into three levels: hard, normal, and easy. This provides a challenging and reliable way to measure whether a model can use its knowledge to reason toward the correct answer. Performance is reported as accuracy.
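The accuracy computation per difficulty level is straightforward; a minimal sketch (function and label names are our own, and exact-match scoring of the chosen option is assumed):

```python
def accuracy_by_level(predictions: list[str], answers: list[str], levels: list[str]) -> dict:
    """Compute overall accuracy and accuracy per difficulty level.

    Each question i has a predicted option, a gold option, and a
    difficulty label (e.g. "easy" / "normal" / "hard").
    """
    assert len(predictions) == len(answers) == len(levels)
    per = {}  # level -> (hits, total)
    for pred, gold, lvl in zip(predictions, answers, levels):
        hits, total = per.get(lvl, (0, 0))
        per[lvl] = (hits + (pred == gold), total + 1)
    report = {lvl: hits / total for lvl, (hits, total) in per.items()}
    report["overall"] = sum(p == a for p, a in zip(predictions, answers)) / len(answers)
    return report
```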

Subjective evaluation

For the subjective part, we evaluate with a question-answering paradigm that simulates subjective exam questions. We hand-constructed a high-quality test set from legal consultations, online forums, justice-related publications, and legal documents. We use GPT-3.5-turbo as a referee model to score each output from 1 to 5 on three criteria: accuracy, completeness, and clarity.
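Aggregating the referee model's scores might look like the following sketch. The dict field names and the equal-weight overall average are assumptions for illustration, not the authors' exact protocol:

```python
def aggregate_judge_scores(scores: list[dict]) -> dict:
    """Average per-criterion 1-5 scores assigned by a referee model.

    `scores` holds one dict per test question, e.g.
    {"accuracy": 4, "completeness": 3, "clarity": 5}.
    Field names and the unweighted overall mean are illustrative.
    """
    criteria = ("accuracy", "completeness", "clarity")
    means = {c: sum(s[c] for s in scores) / len(scores) for c in criteria}
    means["overall"] = sum(means[c] for c in criteria) / len(criteria)
    return means
```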

Evaluation results

Compared models

We compare DISC-LawLLM (without the external knowledge base) against four general-purpose LLMs (GPT-3.5-turbo, ChatGLM-6B, Baichuan-13B-Chat, Chinese-Alpaca2-13B) and four Chinese legal LLMs (LexiLaw, LawGPT, Lawyer LLaMA, ChatLaw).

Objective evaluation results

DISC-LawLLM outperforms all compared models of the same parameter scale across all difficulty levels. Even against GPT-3.5-turbo, with 175B parameters, DISC-LawLLM performs better on some tests. Table 2 shows the objective evaluation results; bold indicates the best result and underline the second best.


Table 2: Objective evaluation results

Subjective evaluation results

In the subjective evaluation, DISC-LawLLM received the highest overall score and the highest scores on the accuracy and clarity criteria. Table 3 shows the subjective evaluation results; bold indicates the best results.


Table 3: Subjective evaluation results

05

Summary

We released DISC-LawLLM, an intelligent legal system that provides legal services across multiple application scenarios. Building on public legal NLP task datasets, original legal texts, and open-source general instruction data, we reconstructed legal instructions according to the legal syllogism for supervised fine-tuning; to improve output reliability, we added an external retrieval module. By strengthening legal reasoning and knowledge retrieval, DISC-LawLLM outperforms existing legal LLMs on the legal benchmark evaluation set we constructed. Research in this field promises to improve the balance of legal resources, among other benefits. We have released the constructed dataset and model weights to support further research.

END




Origin blog.csdn.net/qq_27590277/article/details/133398026