1. Feature extraction: converting data such as text strings or dictionaries into numeric features.
2. Feature extraction API: sklearn.feature_extraction
3. Dictionary feature extraction: to turn dictionary data into feature values, use sklearn.feature_extraction.DictVectorizer
DictVectorizer syntax:
DictVectorizer.fit_transform(X): X is a dictionary or an iterator of dictionaries; returns a sparse matrix by default
DictVectorizer.inverse_transform(X): X is an array or a sparse matrix; returns the data in its format before the conversion (a list of dictionaries)
DictVectorizer.get_feature_names(): returns the feature names (in scikit-learn 1.0 and later use get_feature_names_out(); get_feature_names() was removed in 1.2)
DictVectorizer.transform(X): converts X using the mapping learned by the earlier fit
Dictionary data extraction: each categorical value in the dictionary data is converted into a feature of its own, while numeric values are kept as they are.
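As a minimal sketch of how the methods listed above fit together, the following example runs fit_transform, inverse_transform and transform on a small made-up data set. The sample records and variable names are assumptions for illustration, and sparse=False is used only so that the arrays print readably.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical sample data, chosen only for illustration.
samples = [
    {"city": "Beijing", "temperature": 100},
    {"city": "Shanghai", "temperature": 60},
]

# sparse=False makes the vectorizer return a dense NumPy array.
vectorizer = DictVectorizer(sparse=False)

# fit_transform learns the feature-to-column mapping and converts the data.
encoded = vectorizer.fit_transform(samples)

# inverse_transform maps each row back to a dictionary of its non-zero features.
restored = vectorizer.inverse_transform(encoded)

# transform reuses the learned mapping; a city that was not seen during
# fitting has no column, so only its temperature is encoded.
unseen = vectorizer.transform([{"city": "Shenzhen", "temperature": 30}])

print(encoded)
print(restored)
print(unseen)
```

Note that inverse_transform returns the one-hot feature names (such as city=Beijing) rather than the original city key, and that transform cannot give a previously unseen category a column of its own.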
4. One-hot encoding: for each category (each distinct value of a column in the data table) a Boolean column is generated; for every sample, exactly one of these columns has the value 1.
One-hot encoding, in short: if categories were encoded as plain integers, the size of the numbers would suggest an order of priority among them. Giving each category its own 0/1 column avoids this and makes data analysis and machine learning easier.
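To make the idea concrete, here is a hand-written sketch of one-hot encoding in plain Python. The helper one_hot_encode and the sample records are hypothetical and not part of scikit-learn; the key=value column naming merely imitates what DictVectorizer produces.

```python
def one_hot_encode(records, key):
    """One-hot encode the values stored under `key` (hypothetical helper)."""
    # One output column per distinct category, in sorted order.
    categories = sorted({record[key] for record in records})
    columns = [f"{key}={category}" for category in categories]
    # Each row has a 1 in the column of its own category and 0 elsewhere.
    rows = [
        [1 if record[key] == category else 0 for category in categories]
        for record in records
    ]
    return columns, rows


# Made-up sample records for illustration.
records = [
    {"city": "Beijing"},
    {"city": "Shanghai"},
    {"city": "Shenzhen"},
    {"city": "Beijing"},
]

columns, rows = one_hot_encode(records, "city")
print(columns)
for row in rows:
    print(row)
```

Every row contains a single 1, so no city is numerically "larger" than another.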
Dictionary feature extraction example data (the cities are Beijing, Shanghai and Shenzhen):
[{"city": "北京", "temperature": 100},
{"city": "上海", "temperature": 60},
{"city": "深圳", "temperature": 30}]
from sklearn.feature_extraction import DictVectorizer

def dictvec():
    # Pass sparse=False to get a dense array instead of a sparse matrix
    vectorizer = DictVectorizer()
    # Call fit_transform on the example data
    data = vectorizer.fit_transform([{"city": "北京", "temperature": 100},   # Beijing
                                     {"city": "上海", "temperature": 60},    # Shanghai
                                     {"city": "深圳", "temperature": 30}])   # Shenzhen
    # On scikit-learn older than 1.0, call get_feature_names() instead
    print(vectorizer.get_feature_names_out().tolist())
    print(data)
    return None

if __name__ == '__main__':
    dictvec()
Results: # sparse matrix; each line reads (row, column) value
(0, 1) 1.0
(0, 3) 100.0
(1, 0) 1.0
(1, 3) 60.0
(2, 2) 1.0
(2, 3) 30.0
# With sparse=False, a dense array is returned (columns: 上海 Shanghai, 北京 Beijing, 深圳 Shenzhen, temperature)
['city=上海', 'city=北京', 'city=深圳', 'temperature']
[[  0.   1.   0. 100.]
 [  1.   0.   0.  60.]
 [  0.   0.   1.  30.]]
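The two result formats hold the same numbers. As a sketch (the variable names are assumptions), the following compares the sparse result with the dense one by expanding the sparse matrix with its toarray() method:

```python
from sklearn.feature_extraction import DictVectorizer

data = [
    {"city": "北京", "temperature": 100},  # Beijing
    {"city": "上海", "temperature": 60},   # Shanghai
    {"city": "深圳", "temperature": 30},   # Shenzhen
]

# Default (sparse=True): only the non-zero entries are stored.
sparse_result = DictVectorizer().fit_transform(data)

# sparse=False: every entry, including the zeros, is stored.
dense_result = DictVectorizer(sparse=False).fit_transform(data)

# toarray() expands the sparse matrix into an ordinary dense array.
same = (sparse_result.toarray() == dense_result).all()

print(sparse_result.shape, sparse_result.nnz)
print(same)
```

The sparse form stores only the non-zero entries, which saves memory when one-hot encoding produces many columns that are mostly zero.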
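NOTE: the purpose of feature extraction is to let the computer understand the data better.
Vocabulary: feature = a characteristic of the data; extraction = pulling out; vectorizer = something that turns data into vectors; sparse = mostly empty (mostly zeros); fit = to suit or adapt to (here, learning the mapping from the data)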