Machine learning: the scikit-learn library

As mentioned earlier, scikit-learn is a lightweight library that is well suited to learning, so that is what we will study here.

Installation is simple, so I will not go through it here. Note that scikit-learn itself depends on NumPy and SciPy (pip installs them automatically); pandas is not required but is commonly installed alongside it.

If you installed Anaconda, scikit-learn is already included.

----------------------------------------------------------------------------------------------------------

1. Dictionary feature extraction

Purpose: extract feature values from dictionary data.

API: sklearn.feature_extraction.DictVectorizer

 

 

Flow: 1. Instantiate the DictVectorizer() class.

  2. Pass in the data and call the conversion method fit_transform.

The code:

from sklearn.feature_extraction import DictVectorizer

def dictvec():
    """
    Extract feature values from dictionary data.
    :return: None
    """
    # Instantiate the vectorizer (named so it does not shadow the built-in dict)
    vectorizer = DictVectorizer()

    # Call fit_transform on a list of dictionaries
    data = vectorizer.fit_transform([{'name': 'X', 'score': 80},
                                     {'name': 'Y', 'score': 90},
                                     {'name': 'Z', 'score': 100}])

    print(data)

    return None

if __name__ == '__main__':
    dictvec()

 

 

The output is a sparse matrix. The pair in parentheses is a coordinate and the number after it is the value stored there; for example, (0, 0) with value 1.0 means that the entry at row 0, column 0 is 1.0.

Coordinates that are not listed, such as (0, 1) and (0, 2), have the default value 0.
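
To make the coordinate listing concrete, here is a minimal sketch that reads the stored entries back out of the sparse result. It assumes the sample records use the names 'X', 'Y', 'Z' and that the default sparse result supports SciPy's tocoo() conversion; the variable names are only illustrative.

```python
from sklearn.feature_extraction import DictVectorizer

records = [
    {'name': 'X', 'score': 80},
    {'name': 'Y', 'score': 90},
    {'name': 'Z', 'score': 100},
]

# With the default settings the result is a SciPy sparse matrix.
sparse_result = DictVectorizer().fit_transform(records)

# COO format exposes the stored entries as parallel row/col/value arrays.
coo = sparse_result.tocoo()
stored = {(int(r), int(c)): float(v)
          for r, c, v in zip(coo.row, coo.col, coo.data)}
print(stored)

# A coordinate that is not stored reads back as 0.
print(sparse_result[0, 1])
```

Only six entries are stored here (one name flag and one score per record); every other cell is implicitly zero.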

Setting the sparse parameter of DictVectorizer() to False returns an ordinary two-dimensional array instead, which is easier to read.
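
A short sketch of the sparse=False variant, using the same assumed sample records ('X', 'Y', 'Z'); the variable names are illustrative.

```python
from sklearn.feature_extraction import DictVectorizer

records = [
    {'name': 'X', 'score': 80},
    {'name': 'Y', 'score': 90},
    {'name': 'Z', 'score': 100},
]

# sparse=False makes fit_transform return a dense NumPy array.
vectorizer = DictVectorizer(sparse=False)
dense_result = vectorizer.fit_transform(records)

# Each row is one record; the string field 'name' is one-hot encoded
# and the numeric field 'score' is kept as a number.
print(dense_result)
```

The columns are ordered name=X, name=Y, name=Z, score, so the first row is 1, 0, 0, 80.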

 

 

2. Text feature extraction

 

Purpose: extract feature values from text data.

API: sklearn.feature_extraction.text.CountVectorizer


The code: suppose there are two articles:
'life is short, i like python' and 'life is too long, i dislike python'
 
from sklearn.feature_extraction.text import CountVectorizer

def countvec():
    """
    Extract feature values from text data.
    :return: None
    """
    # Instantiate the vectorizer
    cv = CountVectorizer()

    # Call fit_transform on the two articles
    data = cv.fit_transform(['life is short, i like python',
                             'life is too long, i dislike python'])

    print(data)

    return None

if __name__ == '__main__':
    countvec()
 

 

 

 

The result has the same sparse form as in dictionary extraction. Note that to turn this matrix into a two-dimensional array that is easier to read, you call toarray() on the result; CountVectorizer has no sparse parameter to set.
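
A minimal sketch of the toarray() call, assuming the two sample articles are written in lowercase without "the"; the variable names are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

articles = [
    'life is short, i like python',
    'life is too long, i dislike python',
]

cv = CountVectorizer()
counts = cv.fit_transform(articles)  # sparse matrix of word counts

# toarray() converts the sparse result into a dense 2-D array,
# one row per article and one column per extracted word.
dense_counts = counts.toarray()
print(dense_counts)
```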

 

get_feature_names() returns a list of all extracted features (in this example eight words are extracted; single letters are not counted). In scikit-learn 1.0 and later the method is called get_feature_names_out(), and get_feature_names() was removed in 1.2.
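
The feature list can be printed as in the sketch below. The hasattr check is only there so the same snippet runs on both older and newer scikit-learn releases; the sample articles and variable names are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer

articles = [
    'life is short, i like python',
    'life is too long, i dislike python',
]

cv = CountVectorizer()
cv.fit(articles)

# Newer releases provide get_feature_names_out(); older ones only
# have get_feature_names().
if hasattr(cv, 'get_feature_names_out'):
    feature_names = [str(name) for name in cv.get_feature_names_out()]
else:
    feature_names = [str(name) for name in cv.get_feature_names()]

print(feature_names)
```

The words come back in alphabetical order, which is also the column order of the count matrix.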

The result contains two lists, one per article. The first 0 in the first list means that 'dislike' does not appear in the first article; the first 1 in the first list means that 'is' appears once, and so on.
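
To show where those numbers come from, here is a simplified pure-Python sketch of the counting. It is not scikit-learn's actual implementation; it only assumes the default behaviour of lowercasing the text and keeping tokens of two or more word characters, which is why the single letter 'i' is dropped.

```python
import re
from collections import Counter

articles = [
    'life is short, i like python',
    'life is too long, i dislike python',
]

# Assumption: tokens are runs of at least two word characters.
TOKEN_PATTERN = re.compile(r'\b\w\w+\b')

def tokenize(text):
    """Lowercase the text and return its tokens."""
    return TOKEN_PATTERN.findall(text.lower())

# The vocabulary is every distinct token, sorted alphabetically.
vocabulary = sorted({token for text in articles for token in tokenize(text)})

# One row per article: how often each vocabulary word occurs in it.
rows = []
for text in articles:
    token_counts = Counter(tokenize(text))
    rows.append([token_counts[word] for word in vocabulary])

print(vocabulary)
for row in rows:
    print(row)
```

Reading a row against the vocabulary column by column gives the interpretation described above.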

 

 

 

 

 

 

 


Origin www.cnblogs.com/GouQ/p/11838829.html