Machine learning: TF-IDF

TF-IDF is a statistical method for evaluating how important a term is to a file or document within a corpus. A term's importance increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to how frequently it appears across the corpus.

TF stands for term frequency. IDF stands for inverse document frequency.

TF is simply the number of times a term appears in a document. IDF is calculated as: log(total number of documents / number of documents in which the term occurs)

Therefore, the TF-IDF value of a term is calculated as: TF * IDF. This value reflects the importance of the term in the document.
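The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name tf_idf, the whitespace tokenization, the lowercasing, and the choice of the natural logarithm are all assumptions made for the example (the text above does not specify a log base).

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    TF is the raw count of the term in the document;
    IDF is log(total documents / documents containing the term).
    Natural log and whitespace tokenization are assumptions of this sketch.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts for this document
        scores.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

docs = ["the cat sat on the mat", "the dog sat", "the bird flew"]
scores = tf_idf(docs)
```

In this toy corpus "the" occurs in every document, so its IDF is log(3 / 3) = 0 and its score is 0 no matter how often it appears, while "cat" occurs in only one document and scores highest in the first one.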

API: sklearn.feature_extraction.text.TfidfVectorizer
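A short usage sketch of TfidfVectorizer follows; the sample documents are made up for illustration. Note that with its default settings scikit-learn does not compute the plain TF * IDF described above: it uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1, and L2-normalizes each document vector, so the numbers will differ from a hand calculation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sample corpus for illustration
docs = ["the cat sat on the mat", "the dog sat", "the bird flew"]

vectorizer = TfidfVectorizer()            # default settings: smoothed IDF, L2 normalization
matrix = vectorizer.fit_transform(docs)   # sparse matrix, one row per document
vocab = vectorizer.vocabulary_            # dict mapping term -> column index
dense = matrix.toarray()                  # dense array, convenient for small examples

# Print the non-zero weights of the first document
for term, col in sorted(vocab.items()):
    if dense[0, col] > 0:
        print(term, round(dense[0, col], 3))
```

Each row of the matrix is the TF-IDF vector of one document, and each column corresponds to one term of the learned vocabulary.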

 

Origin www.cnblogs.com/GouQ/p/11867224.html