The Gods Are Silent (诸神缄默不语): personal CSDN blog post directory
Full name of the paper: Judicial knowledge‑enhanced magnitude‑aware reasoning for numerical legal judgment prediction
Model abbreviation: NumLJP
Task: numerical LJP (numerical legal judgment prediction), i.e., the LJP subtask concerned with the numerical values in cases; in this paper it means predicting the prison term (in months) and the fine (framed as an ordinal classification task).
(This paper calls fines "terms of penalty". I'm honestly speechless. I'm not sure whether this paper is the only one that does this.)
SpringerLink paper link: https://link.springer.com/article/10.1007/s10506-022-09337-4
This is a paper from Artificial Intelligence and Law 2022. The typesetting is hard to sum up in a word; all I can say is that it is not unreadable.
It mainly focuses on numerical LJP tasks.
This paper argues that previous LJP work paid no attention to the numerical information in cases, so models could not capture the comparability of values across judgments (e.g., 400 < 500 < 800) (numerical comparison).
Therefore, this paper proposes the NumLJP framework to learn the numerical information in the text: first select judgment knowledge (see Section 1 for details), then predict the prison term and fine based on the judgment knowledge and the case information.
① A judicial knowledge selection module: a contrastive-learning-based judgment knowledge selector first distinguishes confusing cases 1. Previous work used only legal articles as external knowledge, but this paper uses quantitative standards from real scenarios to fix reference amounts (numerical anchors: reference numbers in the judgment knowledge).
② A legal numerical commonsense acquisition module: Masked Numeral Prediction (MNP) is designed to make the model memorize the anchors, thereby acquiring legal numerical common sense from the selected judgment knowledge.
③ A reasoning module: a scale-based numerical graph (consisting of the anchors and the numerical values in fact descriptions) is built to achieve magnitude-aware numerical reasoning.
In effect, this means learning representations of these numbers.
④ A judgment prediction module: finally, judicial decisions are made from the fact description, the judgment knowledge, and the numbers.
1. Problem Definition
The numerical legal judgment prediction in this paper predicts the prison term and the fine, each divided into several intervals, so it is effectively an ordinal classification task. The final metric is also the macro-F1 of a classification task.
This paper assumes a functional relationship between the interval that a value in the case facts falls into and the final judgment result:
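Since both targets are bucketed into intervals, label construction is essentially a binning step. A minimal sketch (the bin boundaries below are illustrative only, not the paper's actual splits):

```python
# Illustrative ordinal binning for prison terms; the boundaries are
# hypothetical, not taken from the paper.
TERM_BINS = [6, 9, 12, 24, 36, 60, 84, 120]  # upper bounds in months

def term_to_class(months: int) -> int:
    """Map a prison term in months to an ordinal class id."""
    for i, upper in enumerate(TERM_BINS):
        if months <= upper:
            return i
    return len(TERM_BINS)  # longest-term bucket
```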
In previous work on the CAIL2018 dataset, accuracy on prison-term prediction is markedly lower than on charge and law-article prediction. The authors attribute this to earlier work ignoring the numerical information in the fact descriptions, treating numbers as plain words or [UNK]. (For example, stealing 1939 yuan vs. 7300 yuan leads to very different terms, and a 7000-yuan case's term and fine should fall between those two. A model that does not understand numbers cannot predict the term or the fine correctly: numerical comparison.) Numerical reasoning work such as NumNet 2 can model this comparison relationship well.
But such work: 1. ignores the crime type associated with a value (e.g., in the earlier figure A and B are theft while C is robbery, so the amounts cannot be compared directly); 2. ignores magnitude (e.g., 7000 is closer to 7300, so its term should be closer to 7300's term of 12 months: magnitude awareness); 3. lacks training data, since cases contain too few numbers (the solution is to introduce numerical anchors, which limit the total numerical search space).
This paper argues that judgment knowledge is more practical, detailed, and quantitative than legal articles:
(In the figure, the green text marks the numerical anchors.)
Judgment process:
(Figure taken from another paper.)
2. Model
RoBERTa encodes the input:

$$\vec{u}^X,\bar{\mathbf{X}}=\text{RoBERTa}([\mathrm{CLS}];\,X),$$

where $\vec{u}^X$ is the [CLS] representation and $\bar{\mathbf{X}}$ is the representation matrix of all tokens.
1. JKS (judicial knowledge selection module)
A contrastive-learning classifier selects judgment knowledge based on the criminal facts (one kind of knowledge corresponds to one kind of criminal behavior).
Samples with the same criminal fact (i.e., the same category) are used for contrastive learning 3:
$\mathcal{L}_1$: cross entropy
$\mathcal{L}_2$: supervised contrastive learning (SCL), which pulls the representations of same-class samples together. It feels like a standard contrastive loss; see this post: Contrastive learning (continuously updated...)
$$\begin{aligned} \mathcal{L}_{\mathrm{JKS}}&=(1-\lambda)\mathcal{L}_{1}+\lambda\mathcal{L}_{2},\\ \mathcal{L}_{1}&=-\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{n^{\mathcal{A}}} y_{i,m}^{\mathcal{A}}\cdot\log\hat{y}_{i,m}^{\mathcal{A}},\\ \mathcal{L}_{2}&=\sum_{i=1}^{N}-\frac{1}{N_{y_{i}^{\mathcal{A}}}-1}\sum_{j=1}^{N}\mathbf{1}_{i\ne j}\mathbf{1}_{y_{i}^{\mathcal{A}}=y_{j}^{\mathcal{A}}}\log\frac{\exp\left(\vec{u}_i^X\cdot\vec{u}_j^X/\tau\right)}{\sum_{k=1}^{N}\mathbf{1}_{i\ne k}\exp\left(\vec{u}_i^X\cdot\vec{u}_k^X/\tau\right)}, \end{aligned}$$
This lets the model identify fine-grained numeric types.
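A minimal NumPy sketch of the SCL term $\mathcal{L}_2$ (my own reading of the formula, not the authors' code; I assume the [CLS] vectors are L2-normalized before the dot product):

```python
import numpy as np

def supervised_contrastive_loss(u, labels, tau=0.1):
    """Sketch of the SCL term: u is (N, d) [CLS] representations,
    labels is (N,) class ids. Assumption: representations are normalized."""
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    sim = u @ u.T / tau
    np.fill_diagonal(sim, -np.inf)          # exclude k == i from the denominator
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    loss = 0.0
    for i in range(len(labels)):
        pos = (labels == labels[i])
        pos[i] = False                      # positives: same label, j != i
        if pos.sum() == 0:
            continue
        loss += -log_prob[i, pos].sum() / pos.sum()   # -1/(N_{y_i}-1) * sum_j ...
    return loss
```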
2. MNP (legal numerical commonsense acquisition)
Perform Masked Numeral Prediction (MNP) on the judgment knowledge to acquire the legal numerical common sense it contains.
Prediction follows a classification paradigm (the vocabulary consists of the numerical anchors):
$$\mathcal{L}_{\mathrm{MNP}}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n^A}\sum_{k=1}^{n^{V}} y_{i,j}^k\cdot\log\hat{y}_{i,j}^k,$$
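The MNP setup itself is simple: numerals in the judgment knowledge are masked and predicted as classes over the anchor vocabulary. A toy sketch of the data preparation (the regex and example text are mine, not the paper's):

```python
import re

def mask_numerals(text: str):
    """Replace each numeral in a judgment-knowledge snippet with [MASK],
    returning the masked text and the gold numerals (the MNP targets)."""
    gold = re.findall(r"\d+", text)
    masked = re.sub(r"\d+", "[MASK]", text)
    return masked, gold
```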
3. MagNet (reasoning module)
Scale-based numerical graph: a heterogeneous directed graph whose nodes are the numerical values in the fact description and in the judgment knowledge, and whose edges are comparison and magnitude relationships: greater-than/less-than (REL) + MAG.
Edges between 72 and 100:
The computation of MAG is fairly involved. I didn't fully understand it, so I'm just transcribing it; if anyone understands the principle, please tell me:
- Divide by a type-specific scale; design a multiplier and use it to characterize the magnitude
- MinDiff: the minimal difference (taken over the anchor values)
- $scale^t=\frac{\text{MinDiff}(v_i^A,v_j^A)}{N^t}$
- $N^t$ must satisfy $\left\lceil\frac{m^t}{scale^t}\right\rceil\le f_{max}$ (its size relates to the accuracy/recall trade-off)
- Compute the multiplicative factor: $f=\left\lceil\frac{|n(v_i)-n(v_j)|}{scale^t}\right\rceil$, with $f\in\{1,\dots,N^f\}$ ($N^f$ can be set to 100)
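My reading of the MAG edge computation as code (a hypothetical sketch: the anchor values and $N^t$ below are invented for illustration, not the paper's exact setup):

```python
import math

def mag_factor(vi: float, vj: float, anchors, n_t: int = 10, f_max: int = 100) -> int:
    """Magnitude (MAG) label for the edge between two values, per the formulas
    above: scale^t = MinDiff(anchors) / N^t, f = ceil(|vi - vj| / scale^t)."""
    srt = sorted(anchors)
    min_diff = min(b - a for a, b in zip(srt, srt[1:]))  # MinDiff over anchor values
    scale = min_diff / n_t
    return min(max(math.ceil(abs(vi - vj) / scale), 1), f_max)
```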
MagNet (Magnitude-aware numerical reasoning Network) then represents the values. (I didn't follow the detailed description, so I won't write it up; it roughly amounts to using a GAT?)

$$\begin{aligned} \mathbf{M}^X&=\mathbf{W}^M\bar{\mathbf{X}},\\ \mathbf{M}^A&=\mathbf{W}^M\bar{\mathbf{A}},\\ \mathbf{U}&=\text{MagNet}(\mathcal{G};\mathbf{M}^X,\mathbf{M}^A,\vec{u}^X,\vec{u}^A), \end{aligned}$$
The numerical representations from the fact description and the judgment knowledge are combined and linearly transformed to obtain a magnitude-aware semantic representation:

$$\begin{aligned} \mathbf{M}^{num}&=\mathbf{U}[\mathbf{I}^X,\mathbf{I}^A],\\ \mathbf{M}^{O}&=\mathbf{W}^{O}[\mathbf{M}^{num};[\mathbf{M}^X;\mathbf{M}^A]], \end{aligned}$$
4. judgment prediction module
Judicial decision-making uses the fact description, the judgment knowledge, and the numbers (the prison-term classes are finer-grained than LADAN's 4).
The cross-entropy loss used in previous work:

$$\mathcal{L}^P=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n^{P}} y_{i,j}^{P}\cdot\log\hat{y}_{i,j}^{P},$$
They then propose an additional loss function:
$$\begin{aligned} \mathcal{L}^I=-\frac{1}{N}\sum_{i=1}^{N}\Big\{&\overbrace{y_{i}^{\ell}\cdot\log\hat{y}_{i}^{\ell}}^{\text{life imprisonment}}+\overbrace{y_{i}^{d}\cdot\log\hat{y}_{i}^{d}}^{\text{death}}\\ &+\underbrace{\sum_{k=0}^{300}y_{i,k}^I\cdot\log\hat{y}_{i,k}^{I}\left[\log(v_{i,k}^I)-\log(\hat{v}_{i,k}^{I})\right]^2}_{\text{less than 25 years (300 months)}}\Big\}, \end{aligned}$$
(The $v$ here is the magnitude. Strange, but understandable.)
The overall loss function:

$$\mathcal{L}_{total}=\gamma\mathcal{L}_{\mathrm{JKS}}+(1-\gamma)\mathcal{L}_{\mathrm{MNP}}+\mathcal{L}^{I}+\mathcal{L}^{P}.$$
3. Experiment
3.1 Dataset
CAIL2018 5 : Sentences and Fines
- CAIL-small
- CAIL-large
AIJudge 6 : Penalties
Examples of numerical anchors:
Statistical graph of numerical graph nodes and edges:
The data preprocessing part is to be supplemented.
3.2 Indicators
Metrics for classification tasks: accuracy (Acc.), macro-precision (MP), macro-recall (MR) and macro-F1 (F1)
ImpScore (interpretability):

$$h=\left|\log(I_p+1)-\log(I_g+1)\right|,$$

$$\text{ImpScore}=\begin{cases}1,&h\le 0.2,\\0.8,&0.2<h\le 0.4,\\0.6,&0.4<h\le 0.6,\\0.4,&0.6<h\le 0.8,\\0.2,&0.8<h\le 1,\\0,&\text{otherwise.}\end{cases}$$
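ImpScore in code (a direct transcription of the piecewise definition; I'm assuming the $+1$ applies inside both logarithms):

```python
import math

def imp_score(pred_term: float, gold_term: float) -> float:
    """ImpScore: bucketed closeness of predicted vs. gold prison terms
    (in months), measured on a log scale."""
    h = abs(math.log(pred_term + 1) - math.log(gold_term + 1))
    for bound, score in [(0.2, 1.0), (0.4, 0.8), (0.6, 0.6), (0.8, 0.4), (1.0, 0.2)]:
        if h <= bound:
            return score
    return 0.0
```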
3.3 baseline
- TOPJUDGE
- MPBFN
- CPTP
- NeurJudge7
- NumNet 2 → replace encoder with RoBERTa and continue pre-training on legal text
3.4 Experimental setup
To be filled.
A gradient clipping trick is used here, which may be worth referencing for tasks that combine GNNs and NLP. I'm not using it at the moment, so I'm just noting it down.
3.5 Results of the main experiment
Fine prediction:
Sentence prediction:
3.6 Experimental analysis
To be filled.
When confusing cases are mentioned, the first things that come to mind are probably LADAN 4 and NeurJudge 7. ↩︎
(2019 EMNLP) NumNet: Machine Reading Comprehension with Numerical Reasoning ↩︎ ↩︎
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning ↩︎
Re27: Reading the paper LADAN: Distinguish Confusing Law Articles for Legal Judgment Prediction ↩︎ ↩︎
(2021 SIGIR) Re38: Reading the paper NeurJudge: A Circumstance-aware Neural Framework for Legal Judgment Prediction ↩︎ ↩︎