The Gods Are Silent (personal CSDN blog): post directory
Last updated: 2023.6.15
First published: 2023.6.7
Article Directory
1. General large-scale pre-trained language model
English:
- LegalBERT
- Original paper: (2020 EMNLP) LEGAL-BERT: The Muppets straight out of Law School - ACL Anthology
- Download link: huggingface
- CaseLaw-BERT
- BERT Law
- Original paper: (2021) Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain
- Download link: https://huggingface.co/nguyenthanhasia/BERTLaw
- PolBERT
- legal-longformer
- Download link: https://huggingface.co/saibo/legal-longformer-base-4096
- LegalLAMA
- (India) InLegalBERT
- Original paper: (2023 ICAIL) Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
- Download link: https://huggingface.co/law-ai/InLegalBERT
Chinese:
- Lawformer
- Original paper: (2021) Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
- Download method: thunlp/LegalPLMs: Source code and checkpoints for legal pre-trained language models.
Italian:
- ITALIAN-LEGAL-BERT
Romanian:
- jurBERT
- Original paper: (2021 NLLP) jurBERT: A Romanian BERT Model for Legal Judgment Prediction
Spanish:
- RoBERTalex
- Original paper: (2021) Spanish Legalese Language Model and Corpora
- Download address: PlanTL-GOB-ES/RoBERTalex · Hugging Face
multi-language:
- ParaLaw Nets (judging from the paper, it covers Japanese and English)
- Original paper: (2021 COLIEE) ParaLaw Nets – Cross-lingual Sentence-level Pretraining for Legal Text Processing
- Download address (my guess): nguyenthanhasia/XLM-Paralaw · Hugging Face
- LegalXLMs
- Original paper: (2023) MultiLegalPile: A 689GB Multilingual Legal Corpus
- Download address: many checkpoints; to be added
Vietnamese:
- nguyenthanhasia/VNBertLaw · Hugging Face
- PhoBERT
- Original paper: (2020 EMNLP) PhoBERT: Pre-trained language models for Vietnamese
- Official GitHub project (introduces the address and download method of each pre-trained model checkpoint): VinAIResearch/PhoBERT: PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
French:
- JuriBERT
- Original paper: (2022) JuriBERT: A Masked-Language Model Adaptation for French Legal Text
- Download address: http://master2-bigdata.polytechnique.fr/resources#juribert (using transformers package)
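Most of the checkpoints above are hosted on the Hugging Face Hub, so they load the same way. A minimal sketch, assuming the `transformers` and `torch` packages and network access on first use; the checkpoint id below is LEGAL-BERT's Hub id, and `mean_pool`/`embed` are helper names of my own:

```python
import torch

MODEL_ID = "nlpaueb/legal-bert-base-uncased"  # LEGAL-BERT checkpoint on the Hub

def mean_pool(hidden, attention_mask):
    """Average token embeddings, ignoring padding positions (pure helper)."""
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def embed(texts, model_id=MODEL_ID):
    """Encode a batch of sentences into fixed-size vectors with a legal PLM."""
    from transformers import AutoTokenizer, AutoModel  # downloads on first use
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state  # (batch, seq, hidden)
    return mean_pool(hidden, enc["attention_mask"])
```

Swapping `MODEL_ID` for any other Hub id listed above (e.g. law-ai/InLegalBERT or saibo/legal-longformer-base-4096) should work the same way, since they all expose the standard `AutoModel` interface.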
2. Dialogue Model
Chinese:
- Lawyer LLaMA
- Original paper: (2023) Lawyer LLaMA Technical Report
- Official GitHub project: AndrewZhe/lawyer-llama
- Online demo: you can apply directly for access to the web version of this Chinese legal LLaMA (only 100 access slots for now, said to be adjusted dynamically later, which probably means more can be granted if there is money for it)
- Local deployment: lawyer-llama-13b-beta1.0 is public (lawyer-llama/run_inference.md at main · AndrewZhe/lawyer-llama · GitHub), but the original LLaMA weights are required, and I am still waiting on the LLaMA team, so this will have to wait
English:
- LawGPT 1.0: although the name sounds grand and imposing, the project has not actually released anything concrete; it is all talk and nothing to show
3. Sentence Splitting
multi-language:
- https://huggingface.co/models?search=rcds/distilbert-sbd (English, Spanish, German, Italian, Portuguese, French)
- Original paper: (2023 ICAIL) MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset
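These SBD checkpoints are token-classification models, so in principle they can be driven by the standard `transformers` pipeline. A sketch under stated assumptions: the checkpoint id below is a placeholder guessed from the search link above, and the label format of its predictions is an assumption, so check the model card before relying on this; `cut_at` is a helper of my own:

```python
def cut_at(text, boundary_ends):
    """Split `text` at the given character offsets (pure helper)."""
    sentences, prev = [], 0
    for end in sorted(boundary_ends):
        sentences.append(text[prev:end].strip())
        prev = end
    if prev < len(text):
        sentences.append(text[prev:].strip())
    return [s for s in sentences if s]

def split_legal_text(text, model_id="rcds/distilbert-sbd-fr-judgements-laws"):
    """Sentence-split legal text with an SBD model (model_id is a guess)."""
    from transformers import pipeline  # downloads the checkpoint on first use
    tagger = pipeline("token-classification", model=model_id,
                      aggregation_strategy="simple")
    # Assumption: each predicted span marks a sentence; use its end offset.
    ends = [span["end"] for span in tagger(text)]
    return cut_at(text, ends)
```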
4. Text Classification
multi-language:
- PyEuroVoc (covers the languages of EU member states and candidate states): BERT-based classification of legal documents by EuroVoc descriptors
- Original paper: (2021 RANLP) PyEuroVoc: A Tool for Multilingual Legal Document Classification with EuroVoc Descriptors
- Download address: https://pypi.org/project/pyeurovoc/
5. Information extraction
- FPDM: migrates a model originally pre-trained on open-domain data to a specific domain; its main legal-domain task is contract review (extracting key information)
- Original paper: (2023) FPDM: Domain-Specific Fast Pre-training Technique using Document-Level Metadata
- Code and dataset: https://drive.google.com/drive/folders/1RT7g_cTR_twz75xmFjDgQmCPWC8sZSFK