ChatLaw, open source!

Although the overall buzz around AI has cooled recently, the pace at which different industries are exploring AI technology has not slowed.

When ChatGPT was first released, many in the industry believed that, with its professionalism and rigor, AI might be a good fit as an intelligent consultant for certain specialized industries.

Among them, the legal industry was one of the first to be named as most likely to be disrupted by AI.

However, after trying it out, many people found that applying AI in the legal industry still leaves plenty of problems to be solved.

The most serious of these is hallucination: when actually answering questions, the AI often fabricates content and presents it as an answer.

For legal scenarios, which demand rigorous supporting data and sound, sufficient arguments, both GPT-3.5 and GPT-4 still fall short in many ways.

A few days ago, a friend on a team at Peking University reached out and told me that they had open sourced a large language model focused on the legal industry on GitHub: ChatLaw.

Across a range of evaluation tests, it performs better in the legal domain than existing general-purpose large models.

Today, I would like to give this project a proper introduction.

ChatLaw is a large legal model trained on a wide range of Chinese legal provisions, real cases, and judgment documents. With the help of AI, it supports scenarios such as legal contract drafting, case briefing, clause explanation, and judicial consultation.

  • GitHub:https://github.com/PKU-YuanGroup/ChatLaw

  • Online use: https://chatlaw.cloud/lawchat/

Developers can use this model to quickly build a personal legal advisor or a dedicated smart lawyer that helps resolve the various legal disputes encountered in daily work and life.

The project mainly offers three models (ChatLaw-13B, ChatLaw-33B, and ChatLaw-Text2Vec), suited to different scenarios.

By parameter count, ChatLaw comes in 13B and 33B versions, both academic demos, with 13 billion and 33 billion parameters respectively.

ChatLaw-13B is trained on the Ziya-LLaMA-13B-v1 model (from the "Jiang Ziya" series), whose Chinese training data is relatively rich, so it performs well in Chinese dialogue. Its drawback is the limited parameter count: for more complicated legal questions, its answers can be of low quality.

ChatLaw-33B is trained on another Chinese-capable model, Anima-33B. With more parameters, its logical reasoning improves significantly, but because its Chinese corpus is still comparatively small, English text occasionally appears in its answers.
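
For developers who want to experiment, here is a minimal sketch of how a LLaMA-family checkpoint like these is typically loaded and queried with Hugging Face transformers. The model path and generation settings are placeholders rather than ChatLaw's official usage; see the project's GitHub repository for the actual weights and loading steps.

```python
# Minimal sketch: querying a LLaMA-family legal model via transformers.
# The model path is a placeholder, not an official ChatLaw model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/chatlaw-13b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

prompt = "咨询:借款人到期不还钱,我该怎么办?"  # "The borrower won't repay the loan on time; what should I do?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```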

For most users, the main interactive scenario for a legal model is legal consultation.

To enable the AI to better understand and respond to users' legal questions, the Peking University team used a dataset of 930,000 real judgment cases to train a BERT-based similarity matching model, ChatLaw-Text2Vec, which automatically matches a user's question to the relevant legal provisions.

The user asked: "What should I do if the loan is not repaid?"

AI replied: "Contract Law (1999-03-15): Article 206 The borrower shall repay the loan within the agreed time limit. If there is no agreement on the loan period or the agreement is not clear, the provisions of Article 61 of this Law shall be followed. If it is still uncertain, the borrower can return it at any time; the lender can urge the borrower to return it within a reasonable period."

The computed similarity between the text of the AI's answer and the matched training data in this example is 0.9960. Matching answers against real provisions in this way greatly reduces the hallucination problem in the large language model and improves answer quality.
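
Conceptually, the matching step is retrieval by embedding similarity: encode the user's question and each candidate statute, then rank the statutes by cosine similarity. The sketch below illustrates the idea with the sentence-transformers library; the model name and statute texts are placeholders, and ChatLaw-Text2Vec's actual architecture and scoring may differ.

```python
# Sketch of question-to-statute matching via sentence embeddings.
# The model name is a placeholder standing in for ChatLaw-Text2Vec.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("path/to/chatlaw-text2vec")  # placeholder

statutes = [
    "合同法第二百零六条:借款人应当按照约定的期限返还借款。",
    "合同法第六十一条:合同生效后,当事人就质量、价款或者报酬等内容约定不明确的,可以协议补充。",
]
question = "借款人不还钱怎么办?"  # "What should I do if the loan is not repaid?"

q_emb = encoder.encode(question, convert_to_tensor=True)
s_emb = encoder.encode(statutes, convert_to_tensor=True)
scores = util.cos_sim(q_emb, s_emb)[0]      # one similarity score per statute
best = int(scores.argmax())
print(statutes[best], float(scores[best]))  # highest-scoring provision and its score
```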

The ChatLaw team also did something special during model evaluation and testing.

They borrowed the Elo rating mechanism familiar from League of Legends ranked play, compiled 2,000 questions from the judicial examinations of the past ten years, and had the AI models compete against each other in qualifying matches that were then scored. In the end, ChatLaw's final rating and win rate both turned out to be quite good.
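
For readers unfamiliar with it, Elo assigns each model a rating and adjusts it after every head-to-head matchup according to the gap between the expected and the actual outcome. Below is a standard Elo update in Python; the K-factor and starting ratings are conventional defaults, not values reported by the ChatLaw team.

```python
# Standard Elo rating update for pairwise model comparisons.
# K-factor and starting ratings are conventional defaults, not ChatLaw's settings.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: two models start at 1500; model A wins one head-to-head question.
a, b = update_elo(1500.0, 1500.0, score_a=1.0)
print(round(a, 1), round(b, 1))  # A gains 16 points, B loses 16
```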

Looking ahead, if large language models are to become truly usable, improving logical reasoning and reducing hallucination are the two core problems that urgently need to be solved, and they are also the ChatLaw team's main research directions for the next stage.

Over the next few months, the developers plan to improve the model's parameters and optimize the vector database, aiming for breakthroughs on these two issues. Stay tuned.

With AIGC being so popular this year, we have also set up an AI community to explore more prospects and applications in the field of artificial intelligence.

If you want to learn more about practical AI technologies and applications, as well as new developments in ChatGPT, you can click the link below to join our community for further discussions.

Community entrance: ChatGPT community, officially launched!
