04_Definition of feature engineering

1. Feature extraction: converting data such as text strings and dictionaries into numerical features.

2. Feature extraction API: sklearn.feature_extraction

3. Dictionary feature extraction: converts dictionary data into feature values using sklearn.feature_extraction.DictVectorizer

   DictVectorizer syntax:

       DictVectorizer.fit_transform(X): X is a dictionary or an iterable of dictionaries; the return value is a sparse matrix by default

       DictVectorizer.inverse_transform(X): X is an array or a sparse matrix; the return value is the data in its format before conversion

       DictVectorizer.get_feature_names(): returns the feature (category) names; scikit-learn 1.2 and later provide get_feature_names_out() instead

       DictVectorizer.transform(X): converts X using the features learned during the earlier fit

   Dictionary data extraction: the categorical values in the dictionary data are converted into feature values
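The difference between fit_transform, transform and inverse_transform is easier to see on a tiny input. The following is a minimal sketch, assuming scikit-learn is installed; the variable names (train, vec, X_train, X_new, restored) are made up for illustration and sparse=False is chosen only to make the arrays easy to read.

```python
from sklearn.feature_extraction import DictVectorizer

# Illustrative training data: one categorical field, one numeric field.
train = [
    {"city": "Beijing", "temperature": 100},
    {"city": "Shanghai", "temperature": 60},
]

vec = DictVectorizer(sparse=False)   # dense output, easier to read
X_train = vec.fit_transform(train)   # learns the columns, then converts

# transform() reuses the columns learned above; it does not re-fit.
# "Shenzhen" was never seen during fit, so it gets no column and is ignored.
X_new = vec.transform([{"city": "Shenzhen", "temperature": 30}])

# inverse_transform() maps each row back to a dict keyed by feature name,
# keeping only the non-zero entries.
restored = vec.inverse_transform(X_train)

print(vec.feature_names_)  # ['city=Beijing', 'city=Shanghai', 'temperature']
print(X_train)
print(X_new)
print(restored)
```

Note that inverse_transform returns keys such as 'city=Beijing' rather than the original {"city": "Beijing"} pair, so the round trip recovers the features, not the exact input dictionaries.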

4. One-hot encoding: a Boolean column is generated for each category (each category can be thought of as a distinct value in a column of the data table), and for each sample exactly one of these columns has the value 1.

One-hot encoding, in short: if categories were stored as plain numbers, the size of those numbers could be mistaken for an order or priority among them. Using one column per category avoids this problem and makes data analysis and machine learning easier.
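The idea can be sketched in a few lines of plain Python. This is a hand-written illustration, not how scikit-learn implements it; the helper name one_hot and the sample list are hypothetical.

```python
def one_hot(values):
    """Return (categories, rows): one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1   # exactly one column is set per sample
        rows.append(row)
    return categories, rows


cities = ["Beijing", "Shanghai", "Shenzhen", "Beijing"]
categories, rows = one_hot(cities)

print(categories)  # ['Beijing', 'Shanghai', 'Shenzhen']
for city, row in zip(cities, rows):
    print(city, row)
```

Had the cities been coded as 0, 1, 2 in a single column, a model could read Shenzhen as "larger" than Beijing; with one column per city, no such ordering is implied.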

   Dictionary feature extraction example: [{"city": "Beijing", "temperature": 100},
                                           {"city": "Shanghai", "temperature": 60},
                                           {"city": "Shenzhen", "temperature": 30}]

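from sklearn.feature_extraction import DictVectorizer


def dictvec():
    # Pass sparse=False to get a dense array instead of a sparse matrix
    vectorizer = DictVectorizer()

    # Call fit_transform to learn the feature names and convert the data
    data = vectorizer.fit_transform([{"city": "Beijing", "temperature": 100},
                                     {"city": "Shanghai", "temperature": 60},
                                     {"city": "Shenzhen", "temperature": 30}])

    # get_feature_names() was removed in scikit-learn 1.2;
    # get_feature_names_out() returns an array, converted to a list here
    print(vectorizer.get_feature_names_out().tolist())

    print(data)

    return None


if __name__ == '__main__':
    dictvec()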

Results: # sparse matrix, printed as (row, column) value

      (0, 0)	1.0
      (0, 3)	100.0
      (1, 1)	1.0
      (1, 3)	60.0
      (2, 2)	1.0
      (2, 3)	30.0


    # With sparse=False, a dense matrix (ndarray) is returned

    ['city=Beijing', 'city=Shanghai', 'city=Shenzhen', 'temperature']
    [[  1.   0.   0. 100.]
     [  0.   1.   0.  60.]
     [  0.   0.   1.  30.]]
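The sparse listing and the dense matrix carry the same information: each "(row, column) value" line names one non-zero cell, and every cell not listed is 0. A rough sketch of that expansion in plain Python follows; the helper to_dense and the entries list are illustrative values for a 3 x 4 matrix (three one-hot city columns plus a temperature column), not a scipy API.

```python
def to_dense(entries, shape):
    """Expand ((row, col), value) pairs into a dense list-of-lists matrix."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]  # unlisted cells stay 0
    for (row, col), value in entries:
        dense[row][col] = value
    return dense


# Illustrative entries: columns 0-2 are one-hot city columns,
# column 3 holds the temperature.
entries = [
    ((0, 0), 1.0), ((0, 3), 100.0),
    ((1, 1), 1.0), ((1, 3), 60.0),
    ((2, 2), 1.0), ((2, 3), 30.0),
]

dense = to_dense(entries, shape=(3, 4))
for row in dense:
    print(row)
```

The sparse form stores only 6 of the 12 cells here; with many categories most cells are 0, which is why a sparse matrix is the default return type.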


NOTE: the purpose of feature extraction is to let the computer understand the data better.


Vocabulary: feature (a characteristic of the data), extraction (pulling out), vectorizer (converts data into vectors), sparse (mostly empty), fit (to fit, to adapt)
