Python basics I saved earlier

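1. When Python executes import sys, it searches the directories listed in the sys.path variable for the module's file. Once the module is found, its top-level statements are executed to initialize it (this happens only the first time the module is imported in a process), and then the module can be used. (Strictly speaking, sys itself is compiled into the interpreter and is not looked up on disk, but ordinary modules are located this way.)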
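A small sketch of this behaviour, assuming a writable temporary directory. It writes a hypothetical module demo_mod.py, adds that directory to sys.path, and imports the module twice to show that the initialization code runs only once.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module into a fresh temporary directory.
# The module name "demo_mod" is hypothetical, used only for this demo.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "demo_mod.py"), "w") as f:
    f.write("print('demo_mod: initializing')\nVALUE = 42\n")

# Put the directory at the front of the module search path, then refresh the
# import system's directory caches so it notices the newly created file.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

first = importlib.import_module("demo_mod")   # top-level code runs, prints the message
second = importlib.import_module("demo_mod")  # reused from sys.modules, prints nothing

print(first is second)                     # True
print(first.VALUE)                         # 42
print("sys" in sys.builtin_module_names)   # True: sys is built into the interpreter
```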

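2. Python handles text internally as Unicode, but Unicode has to be stored in some encoding format, and Python builds used one of two: UCS-2, which has 65,536 code points in total, and UCS-4, which has 2,147,483,648 code points. (Since Python 3.3, PEP 393 replaced this build-time choice with a flexible internal representation, so every build supports the full Unicode range.)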
https://cloud.tencent.com/developer/article/1406492
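To check what a given interpreter supports, sys.maxunicode reports the largest code point a str can hold. A minimal sketch for Python 3.3+; the emoji is just an arbitrary example of a character beyond U+FFFF.

```python
import sys

# Highest code point this interpreter can represent in a str.
# Python 3.3+: always 0x10FFFF. Old narrow (UCS-2) builds reported 0xFFFF.
max_cp = sys.maxunicode
print(hex(max_cp))  # 0x10ffff

# A character outside the 65,536-code-point Basic Multilingual Plane.
emoji = "\U0001F600"
print(len(emoji))  # 1 on Python 3.3+ (a narrow build counted a surrogate pair as 2)

# The same character needs two 16-bit units in UTF-16 but one 32-bit unit in UTF-32.
print(len(emoji.encode("utf-16-le")) // 2)  # 2
print(len(emoji.encode("utf-32-le")) // 4)  # 1
```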

3. Since the project began in 2007, scikit-learn has become an important machine learning library for Python. scikit-learn, abbreviated sklearn, supports four major families of machine learning algorithms: classification, regression, dimensionality reduction, and clustering. It also provides three major modules: feature extraction, data preprocessing, and model evaluation.
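A compact sketch touching each of the four algorithm families plus the preprocessing and evaluation utilities, using the Iris dataset that ships with scikit-learn. The estimators chosen (LogisticRegression, LinearRegression, PCA, KMeans) are just one illustrative pick per family, and using petal width as the regression target is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Data preprocessing: standardize features, fitting the scaler on training data only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Classification + model evaluation: predict the species, score on held-out data.
clf = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
acc = accuracy_score(y_test, clf.predict(X_test_s))

# Regression: predict petal width (column 3) from the other three measurements.
reg = LinearRegression().fit(X_train[:, :3], X_train[:, 3])
r2 = r2_score(X_test[:, 3], reg.predict(X_test[:, :3]))

# Dimensionality reduction: project the 4 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X_train_s)

# Clustering: group the training samples into 3 clusters without using the labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_train_s)

print(f"accuracy={acc:.2f} r2={r2:.2f} pca_shape={X_2d.shape} n_clusters={len(set(clusters))}")
```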

4. What if __name__ == '__main__': is for

A Python file is typically used in one of two ways: it is either executed directly as a script, or imported and called by another Python script (module reuse). The if __name__ == '__main__': check controls which code runs in each case: the code under it runs only in the first case (when the file is executed directly as a script) and is not executed when the file is imported by another script. This works because Python sets a module's __name__ to '__main__' when the file is run directly, and to the module's own name when it is imported.
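A minimal sketch that writes a hypothetical module greeter.py to a temporary directory and then uses it both ways in separate interpreter processes, so the value of __name__ can be observed in each case.

```python
import os
import subprocess
import sys
import tempfile

# A hypothetical module used only for this demonstration.
SOURCE = """\
def greet():
    return "hello"

print("__name__ is", __name__)

if __name__ == "__main__":
    print("running as a script:", greet())
"""

tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "greeter.py"), "w") as f:
    f.write(SOURCE)

# Case 1: run directly. Python sets __name__ to "__main__", so the guarded block runs.
as_script = subprocess.run(
    [sys.executable, "greeter.py"], cwd=tmp_dir, capture_output=True, text=True
).stdout

# Case 2: imported by other code. __name__ is "greeter", so the guarded block is skipped.
as_import = subprocess.run(
    [sys.executable, "-c", "import greeter"], cwd=tmp_dir, capture_output=True, text=True
).stdout

print(as_script, end="")  # __name__ is __main__ / running as a script: hello
print(as_import, end="")  # __name__ is greeter
```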

5. scikit-learn provides two APIs for converting text into feature vectors: CountVectorizer and TfidfVectorizer.

CountVectorizer:
considers only how often each word appears in a text.
TfidfVectorizer:
besides how often a word appears in a text, also takes into account how many texts in the whole corpus contain that word.

This reduces the impact of high-frequency words that carry little meaning and brings out more significant features.
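A small sketch contrasting the two vectorizers on a made-up three-sentence corpus, using scikit-learn's default settings (for TfidfVectorizer these include smoothed IDF and L2 row normalization, which rescales each row but keeps ratios within a row unchanged).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# A tiny made-up corpus: "the" appears in every document.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

count_vec = CountVectorizer()
counts = count_vec.fit_transform(corpus)  # sparse matrix of raw term counts

tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(corpus)  # counts reweighted by inverse document frequency

the_c, cat_c = count_vec.vocabulary_["the"], count_vec.vocabulary_["cat"]
the_t, cat_t = tfidf_vec.vocabulary_["the"], tfidf_vec.vocabulary_["cat"]

# CountVectorizer: in document 0, "the" simply counts twice as much as "cat".
count_ratio = counts[0, the_c] / counts[0, cat_c]
# TfidfVectorizer: "the" occurs in every document, so its IDF is lower than that of "cat".
tfidf_ratio = weights[0, the_t] / weights[0, cat_t]

print(count_ratio)            # 2.0
print(round(tfidf_ratio, 2))  # 1.55
print(tfidf_vec.idf_[the_t] < tfidf_vec.idf_[cat_t])  # True
```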

6. Corpora is a fundamental concept in gensim: a corpus is the representation of a collection of documents and the basis for all subsequent processing. In essence, a corpus is just a format or convention; conceptually it is a two-dimensional matrix of documents and terms.


Origin blog.csdn.net/qq_40647378/article/details/103789676