Lucene's scoring formula in detail (TFIDFSimilarity)

Part I. Theoretical basis

1. Boolean Model

2. TF / IDF

1) TF (term frequency)

2) IDF (inverse document frequency)

3) Field-length norm
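
The three factors above can be sketched in a few lines of Python. This is only an illustration, assuming the defaults of Lucene's classic TF-IDF similarity (square-root term frequency, logarithmic IDF with a +1 smoothing term, and an inverse-square-root length norm). The function names are hypothetical and not part of any Lucene API, and the exact IDF expression has varied between Lucene releases.

```python
import math


def tf(freq: float) -> float:
    """Term frequency factor: grows with the square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: rarer terms get a larger weight.

    Assumed form: 1 + ln(num_docs / (doc_freq + 1)).
    """
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms: int) -> float:
    """Field-length norm: shorter fields weigh each matching term more."""
    if num_terms <= 0:
        return 0.0
    return 1.0 / math.sqrt(num_terms)


if __name__ == "__main__":
    # A term occurring 4 times in a 16-term field, present in 9 of 10 documents.
    weight = tf(4) * idf(9, 10) * length_norm(16)
    print(weight)
```

Multiplying the three values gives the per-term weight of one field. A term that appears in almost every document ends up with an IDF close to 1, so it adds little beyond its raw frequency.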

 

3. Vector Space Model
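
In the vector space model, the query and each document are treated as vectors of term weights, and relevance is measured by the cosine of the angle between them. A minimal sketch, assuming each vector is stored as a dictionary from term to weight (the function name and data layout are illustrative choices, not Lucene's internal representation):

```python
import math
from typing import Dict


def cosine_similarity(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * doc.get(term, 0.0) for term, weight in query.items())
    query_len = math.sqrt(sum(w * w for w in query.values()))
    doc_len = math.sqrt(sum(w * w for w in doc.values()))
    if query_len == 0.0 or doc_len == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (query_len * doc_len)


if __name__ == "__main__":
    q = {"lucene": 1.0, "score": 1.0}
    d = {"lucene": 2.0, "index": 1.0}
    print(cosine_similarity(q, d))
```

Vectors pointing in the same direction score 1 regardless of their length, and vectors with no terms in common score 0.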

Part II. TFIDFSimilarity

The concept formula:
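
As given in the Javadoc of Lucene's TFIDFSimilarity (classic releases), the conceptual scoring formula can be written as below. The hyphenated factor names follow that Javadoc. V(q) and V(d) denote the weighted query and document vectors, and |V(q)| is the Euclidean length of the query vector.

```latex
\mathrm{score}(q,d) =
  \text{coord-factor}(q,d) \cdot \text{query-boost}(q) \cdot
  \frac{V(q) \cdot V(d)}{|V(q)|} \cdot
  \text{doc-len-norm}(d) \cdot \text{doc-boost}(d)
```

Compared with plain cosine similarity, the document vector is not normalized to unit length here. A separate document-length normalization factor takes its place.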

      

The actual formula:
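
The practical scoring function, as documented in the classic TFIDFSimilarity Javadoc, is reproduced below. The factor names (coord, queryNorm, tf, idf, getBoost, norm) are those of the classic API. Later Lucene releases changed the similarity API, so treat this as the classic form rather than a description of every version.

```latex
\mathrm{score}(q,d) =
  \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big(
    \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot
    t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d)
  \Big)
```

idf(t) is squared because it enters the product twice, once through the query-side weight of the term and once through its document-side weight.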

 

Source: www.cnblogs.com/philo-x/p/11280313.html