Re41: Reading the paper NumLJP (Judicial knowledge‑enhanced magnitude‑aware reasoning for numerical legal judgment prediction)


Full name of the paper: Judicial knowledge‑enhanced magnitude‑aware reasoning for numerical legal judgment prediction
Model abbreviation: NumLJP
Task: numerical LJP (numerical legal judgment prediction), i.e., the LJP subtasks that involve numerical values; in this paper, predicting the prison term (in months) and the fine (both cast as ordinal classification).

(This paper calls the fine "terms of penalty", which really threw me. I am not sure whether this paper is the only one that does this.)


SpringerLink paper link: https://link.springer.com/article/10.1007/s10506-022-09337-4

This is a paper from Artificial Intelligence and Law 2022. The typesetting is hard to sum up in a word; let's just say it is not unreadable.
Mainly focus on numerical LJP tasks.
This paper argues that previous LJP work paid no attention to the numerical information in cases and could not capture the comparability of values across judgments (e.g., 400 < 500 < 800) (numerical comparison).
Therefore, this paper proposes the NumLJP framework to learn the numerical information in the text: first select the judgment knowledge (see Section 1 for details), and then predict the sentence and fine based on the judgment knowledge and case information.

① a judicial knowledge selection module: first use a contrastive-learning-based judgment knowledge selector to distinguish confusing cases 1. Previous work used only legal articles as external knowledge; this paper instead uses quantitative standards from real scenarios as numerical anchors (reference numbers in the judgment knowledge).
② a legal numerical commonsense acquisition module: design Masked Numeral Prediction (MNP) to make the model memorize the anchors, thereby acquiring legal numerical common sense from the selected judgment knowledge.
③ a reasoning module: build a scale-based numerical graph (consisting of the anchors and the numerical values in the fact description) to achieve magnitude-aware numerical reasoning, i.e., learn representations of these numbers.
④ a judgment prediction module: finally, use the fact description, judgment knowledge, and numbers to make the judicial decision.

1. Problem Definition

The numerical legal judgment prediction in this paper predicts the prison term and the fine, each divided into several intervals, which effectively makes it an ordinal classification task. The final metric is likewise the classification macro-F1.

This paper assumes that there is a functional relationship between the interval of the value in the case facts and the final judgment result:
(figure omitted)

In previous work on the CAIL2018 dataset, accuracy on term-of-penalty prediction is markedly lower than on charge and law article prediction. The authors attribute this to previous work ignoring the numerical information in the fact description, treating numbers as plain words or [UNK]. (For example, stealing 1939 yuan versus 7300 yuan leads to very different terms; a case involving 7000 yuan should receive a term and fine between those two. A model that does not understand the numbers cannot predict correctly: numerical comparison.)
Numerical reasoning work such as NumNet 2 can model the comparison relation well, but such work:
1. ignores the crime type behind each value (e.g., in the figure above A and B are theft while C is robbery, so the amounts are not directly comparable);
2. ignores magnitude (e.g., 7000 is closer to 7300, so its term should be closer to 7300's term (12 months): magnitude awareness);
3. lacks training data, since cases contain too few numbers (the solution is to introduce numerical anchors, which limits the total numerical search space).

This article believes that judgment knowledge is more practical, detailed, and quantitative than legal articles:
(The green font is the numerical anchor point)
(figure omitted)

Judgment process:
(figure omitted; taken from another paper)

2. Model

(figure omitted)

RoBERTa encodes the input:
$$
\vec{u}^X,\ \bar{\mathbf{X}} = \text{RoBERTa}([\mathrm{CLS}];\,X),
$$
where $\vec{u}^X$ is the [CLS] representation and $\bar{\mathbf{X}}$ is the representation matrix of all tokens.

1. JKS (judicial knowledge selection module)

Contrastive learning classifier: select judgment knowledge based on criminal facts (one kind of knowledge corresponds to one kind of criminal behavior)

Samples with the same criminal fact (i.e., the same class) are used for contrastive learning (following 3):
L1: Cross entropy
L2: supervised contrastive learning (SCL), which pulls the representations of same-class samples together. It feels like a conventional contrastive loss; see my post: Contrastive Learning (continuously updated...).
$$
\begin{aligned}
\mathcal{L}_{\mathrm{JKS}} &= (1-\lambda)\,\mathcal{L}_{1}+\lambda\,\mathcal{L}_{2},\\
\mathcal{L}_{1} &= -\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{n^{\mathcal{A}}} y_{i,m}^{\mathcal{A}}\cdot\log\hat{y}_{i,m}^{\mathcal{A}},\\
\mathcal{L}_{2} &= \sum_{i=1}^{N}-\frac{1}{N_{y_{i}^{\mathcal{A}}}-1}\sum_{j=1}^{N}\mathbf{1}_{i\ne j}\,\mathbf{1}_{y_{i}^{\mathcal{A}}=y_{j}^{\mathcal{A}}}\log\frac{\exp\!\left(\vec{u}_i^X\cdot\vec{u}_j^X/\tau\right)}{\sum_{k=1}^{N}\mathbf{1}_{i\ne k}\exp\!\left(\vec{u}_i^X\cdot\vec{u}_k^X/\tau\right)}.
\end{aligned}
$$

Identify fine-grained numeric types
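As a reading aid (not the authors' code), here is a minimal NumPy sketch of the supervised contrastive term $\mathcal{L}_2$, assuming `u` holds already-encoded [CLS] vectors and `y` their class labels:

```python
import numpy as np

def scl_loss(u, y, tau=0.3):
    """Supervised contrastive term L2: for each anchor i, pull its
    representation toward same-class samples j, against all other samples."""
    N = len(y)
    sim = (u @ u.T) / tau  # pairwise dot-product similarities / temperature
    loss = 0.0
    for i in range(N):
        pos = [j for j in range(N) if j != i and y[j] == y[i]]
        if not pos:  # no other sample shares this label (N_{y_i} - 1 = 0)
            continue
        denom = sum(np.exp(sim[i, k]) for k in range(N) if k != i)
        log_probs = [np.log(np.exp(sim[i, j]) / denom) for j in pos]
        loss += -sum(log_probs) / len(pos)  # the 1 / (N_{y_i} - 1) factor
    return loss
```

The full JKS loss would then be the weighted sum $(1-\lambda)\mathcal{L}_1 + \lambda\mathcal{L}_2$ with an ordinary cross-entropy for $\mathcal{L}_1$.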

2. MNP (legal numerical commonsense acquisition)

Do Masked Numeral Prediction (MNP) on judgment knowledge: acquire legal numerical common sense in judgment knowledge

insert image description here

Prediction is cast as classification, with the vocabulary consisting of all numerical anchors:

$$
\mathcal{L}_{\mathrm{MNP}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n^{\mathcal{A}}}\sum_{k=1}^{n^{V}} y_{i,j}^{k}\cdot\log\hat{y}_{i,j}^{k}.
$$
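A hedged sketch of this objective (my reconstruction, not the paper's code): each masked anchor slot is a softmax classification over the anchor vocabulary of size $n^V$:

```python
import numpy as np

def mnp_loss(logits, targets):
    """L_MNP: cross-entropy over the anchor vocabulary.

    logits:  (N, n_A, n_V) scores for each masked anchor slot
    targets: (N, n_A) index of the gold anchor in the vocabulary
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    N, n_A, _ = logits.shape
    gold = probs[np.arange(N)[:, None], np.arange(n_A)[None, :], targets]
    return -np.log(gold).sum() / N  # sum over slots, average over the batch
```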

3. MagNet (reasoning module)

Scale-based numerical graph: a heterogeneous directed graph whose nodes are the numerical values in the fact description and in the judgment knowledge, and whose edges encode comparison and magnitude relations: greater-than/less-than (REL) plus a magnitude factor (MAG).

Edges between 72 and 100:
(figure omitted)

The computation of MAG is fairly involved. I didn't fully understand it, so I am just transcribing it; if anyone knows the underlying principle, please tell me:

  1. Divide by a specific scale: design a multiplier and characterize it. (figure omitted)
  2. MinDiff
  3. $scale^t=\frac{\text{MinDiff}(v_i^A, v_j^A)}{N^t}$, where $N^t$ must satisfy $\left\lceil \frac{m^t}{scale^t}\right\rceil \le f_{max}$ (its size relates to the accuracy/recall trade-off)
  4. Compute the multiplicative factor: $f=\left\lceil \frac{|n(v_i)-n(v_j)|}{scale^t}\right\rceil$, with $f\in\{1,\dots,N^f\}$ ($N^f$ can be set to 100)
  5. (figure omitted)
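My reading of steps 2-4, sketched in Python (hypothetical helper names; the paper's exact definitions of MinDiff and $N^t$ may differ):

```python
import math

def min_diff(anchors):
    """Smallest gap between distinct anchor values (my reading of MinDiff)."""
    s = sorted(set(anchors))
    return min(b - a for a, b in zip(s, s[1:]))

def mag_factor(vi, vj, anchors, n_t=1, f_max=100):
    """Multiplicative factor f on the edge between two numeral nodes:
    scale^t = MinDiff / N^t, then f = ceil(|vi - vj| / scale^t),
    clipped into {1, ..., N^f} (here N^f = f_max)."""
    scale = min_diff(anchors) / n_t
    f = math.ceil(abs(vi - vj) / scale)
    return max(1, min(f, f_max))
```

For instance, with anchors {1000, 3000, 10000}, the amounts 7000 and 7300 fall within one scale step of each other (f = 1), matching the magnitude-awareness intuition above.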

MagNet (Magnitude-aware numerical reasoning Network): learns representations of the values. (I didn't fully understand the detailed description, so I won't write it up; it appears to use a GAT?)

$$
\begin{aligned}
\mathbf{M}^X &= \mathbf{W}^M\bar{\mathbf{X}},\\
\mathbf{M}^A &= \mathbf{W}^M\bar{\mathbf{A}},\\
\mathbf{U} &= \text{MagNet}(\mathcal{G};\,\mathbf{M}^X,\mathbf{M}^A,\vec{u}^X,\vec{u}^A).
\end{aligned}
$$

Combining the numerical representations in the fact description and judgment knowledge, and performing linear transformation to obtain a magnitude-aware semantic representation:
$$
\begin{aligned}
\mathbf{M}^{num} &= \mathbf{U}[\mathbf{I}^X,\mathbf{I}^A],\\
\mathbf{M}^{O} &= \mathbf{W}^{O}[\mathbf{M}^{num};[\mathbf{M}^X;\mathbf{M}^A]].
\end{aligned}
$$

4. judgment prediction module

Judicial decision-making using the fact description, judgment knowledge, and the numbers (the term classification is finer-grained than LADAN 4's).

Cross-entropy used in previous work:
$$
\mathcal{L}^P = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n^{P}} y_{i,j}^{P}\cdot\log\hat{y}_{i,j}^{P}.
$$

Another loss function is then proposed:
$$
\mathcal{L}^I = -\frac{1}{N}\sum_{i=1}^{N}\Big\{\overbrace{y_{i}^{\ell}\cdot\log\hat{y}_{i}^{\ell}}^{\text{life imprisonment}}+\overbrace{y_{i}^{d}\cdot\log\hat{y}_{i}^{d}}^{\text{death}}+\underbrace{\sum_{k=0}^{300} y_{i,k}^{I}\cdot\log\hat{y}_{i,k}^{I}\,\big[\log(v_{i,k}^{I})-\log(\hat{v}_{i,k}^{I})\big]^2}_{\text{less than 25 years (300 months)}}\Big\}.
$$
(v is the magnitude; strange, but still understandable.)
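Taking the formula literally (and sharing the puzzlement about the $v$ term), a per-sample sketch might look like this; the class layout `[life, death, month_0, ..., month_n]` is an assumption of mine, not the paper's:

```python
import numpy as np

def term_loss_i(y, y_hat, v, v_hat, eps=1e-12):
    """Per-sample L^I, read literally from the formula above.

    y, y_hat: one-hot target / predicted distribution over
              [life, death, month_0, ..., month_n]
    v, v_hat: gold / predicted magnitudes for the month classes
              (the squared log-difference weights the month CE term)
    """
    life = y[0] * np.log(y_hat[0] + eps)
    death = y[1] * np.log(y_hat[1] + eps)
    months = np.sum(
        y[2:] * np.log(y_hat[2:] + eps) * (np.log(v) - np.log(v_hat)) ** 2
    )
    return -(life + death + months)
```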

Overall loss function:
$$
\mathcal{L}_{total} = \gamma\,\mathcal{L}_{\mathrm{JKS}}+(1-\gamma)\,\mathcal{L}_{\mathrm{MNP}}+\mathcal{L}^{I}+\mathcal{L}^{P}.
$$

3. Experiment

3.1 Dataset

CAIL2018 5 : Sentences and Fines

  • CAIL-small
  • CAIL-large

AIJudge 6 : Penalties

Examples of numerical anchors:
(figure omitted)

Statistical graph of numerical graph nodes and edges:
(figures omitted)

The data preprocessing part is to be supplemented.

3.2 Indicators

Metrics for classification tasks: accuracy (Acc.), macro-precision (MP), macro-recall (MR) and macro-F1 (F1)

ImpScore (interpretation):
$$
h = \big|\log(I_p+1)-\log(I_g+1)\big|,\qquad
\text{ImpScore}=\begin{cases}
1, & h\le 0.2,\\
0.8, & 0.2<h\le 0.4,\\
0.6, & 0.4<h\le 0.6,\\
0.4, & 0.6<h\le 0.8,\\
0.2, & 0.8<h\le 1,\\
0, & \text{otherwise}.
\end{cases}
$$
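A small sketch of ImpScore, assuming both the predicted term $I_p$ and the gold term $I_g$ (in months) take the $+1$ inside the log (my reading of the formula):

```python
import math

def imp_score(pred_months, gold_months):
    """Bucketed score of the log-scale gap between predicted and gold terms."""
    h = abs(math.log(pred_months + 1) - math.log(gold_months + 1))
    for bound, score in [(0.2, 1.0), (0.4, 0.8), (0.6, 0.6), (0.8, 0.4), (1.0, 0.2)]:
        if h <= bound:
            return score
    return 0.0
```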

3.3 Baselines

  1. TOPJUDGE
  2. MPBFN
  3. CPTP
  4. NeurJudge7
  5. NumNet 2 → replace encoder with RoBERTa and continue pre-training on legal text

3.4 Experimental setup

To be filled.

A gradient clipping trick is used here, which may be worth referencing for tasks combining GNN+NLP. I don't use it at the moment, so I'm just noting it.
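For reference, the usual global-norm gradient clipping trick looks like this (a generic sketch; the paper's exact threshold is not given here):

```python
import math
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    total = math.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total <= max_norm:
        return grads  # already small enough: leave untouched
    scale = max_norm / total
    return [g * scale for g in grads]
```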

3.5 Results of the main experiment

Fine prediction:
(table omitted)

Term prediction:
(table omitted)

3.6 Experimental analysis

To be filled.


  1. When it comes to confusing cases, the first things that should come to mind are LADAN 4 and NeurJudge 7. ↩︎

  2. (2019 EMNLP) NumNet: Machine Reading Comprehension with Numerical Reasoning ↩︎ ↩︎

  3. Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning ↩︎

  4. Re27: Read the paper LADAN Distinguish Confusing Law Articles for Legal Judgment Prediction ↩︎ ↩︎

  5. https://github.com/thunlp/CAIL ↩︎

  6. https://www.datafountain.cn/competitions/277 ↩︎

  7. (2021 SIGIR) Re38:读论文 NeurJudge: A Circumstance-aware Neural Framework for Legal Judgment Prediction ↩︎ ↩︎


Origin blog.csdn.net/PolarisRisingWar/article/details/131420142