TF-IDF is a statistical method for evaluating how important a word is to one document within a collection of documents (a corpus). A word's importance increases with the number of times it appears in that document, but this is offset by how common the word is across the corpus: the more documents that contain the word, the lower its weight.
TF stands for term frequency, and IDF stands for inverse document frequency.
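TF is simply the number of times a word appears in an article. IDF is calculated as: log(total number of articles / number of articles that contain the term). The rarer a word is across the corpus, the larger its IDF; a word that appears in every article gets log(1) = 0.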
Therefore, the TF-IDF value of a word is TF * IDF. This value reflects the word's importance to the document: it is high when the word appears often in that document but rarely in the rest of the corpus.
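As a minimal sketch, the formulas above can be computed directly in plain Python. The three-document corpus, the whitespace tokenization and the function names tf, idf and tf_idf are illustrative assumptions rather than part of any library, and the natural logarithm is used for log.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of whitespace-separated tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

def tf(term, doc):
    # TF: raw number of times the term appears in the document.
    return Counter(doc)[term]

def idf(term, corpus):
    # IDF: log(total number of documents / number of documents containing the term).
    # Assumes the term occurs in at least one document (otherwise df would be 0).
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)

def tf_idf(term, doc, corpus):
    # TF-IDF: the product of the two quantities.
    return tf(term, doc) * idf(term, corpus)

# Score every distinct word of the first document.
doc = corpus[0]
for term in sorted(set(doc)):
    print(f"{term:>4}: {tf_idf(term, doc, corpus):.3f}")
```

For the first document this prints cat 0.405, mat 1.099, on 0.405, sat 0.405 and the 0.000. "the" appears twice but occurs in every document, so its IDF is log(3/3) = 0 and its TF-IDF is 0, while "mat", which occurs only in this document, gets the highest score (log 3 ≈ 1.099).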
API: sklearn.feature_extraction.text.TfidfVectorizer (scikit-learn)