get_dummies() in pandas

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False)

Parameter description:

data: array-like, Series, or DataFrame. The input data.
prefix: string, list of strings, or dict of strings, default None. The prefix added to the column names that get_dummies produces.
columns: list-like, default None. The names of the columns to encode. If None, all columns with object or category dtype are encoded.
dummy_na: bool, default False. If True, add a column that marks missing values (NaN); if False, missing values are ignored.
drop_first: bool, default False. If True, return k-1 dummy columns for k category levels by dropping the first level.
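
The effect of dummy_na and drop_first is easiest to see on a small example. The following is a minimal sketch, not taken from the original article: the sample Series is made up for illustration, and dtype=int is passed (assuming a pandas version that supports the dtype argument, 0.23 or later) so that the result holds 0/1 integers.

```python
import pandas as pd

# Hypothetical sample data with one missing value
colors = pd.Series(['red', 'blue', None, 'red'], name='color')

# Default: the missing value is ignored and becomes an all-zero row
basic = pd.get_dummies(colors, prefix='color', dtype=int)

# dummy_na=True adds an extra indicator column for the missing value
with_na = pd.get_dummies(colors, prefix='color', dummy_na=True, dtype=int)

# drop_first=True drops the first level ('blue'), leaving k-1 columns
reduced = pd.get_dummies(colors, prefix='color', drop_first=True, dtype=int)

print(basic)
print(with_na)
print(reduced)
```

Dropping the first level loses no information here: a row with 0 in color_red must be blue (or missing).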

get_dummies performs one-hot encoding: it converts a variable with several distinct values into a set of 0/1 indicator columns. For example, suppose we use 1, 2, and 3 to label three colors: yellow, red, and blue. The numbers only tell the colors apart; they carry no numerical meaning.

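import pandas as pd

# Hat colors labeled 1, 2, 3; the labels have no numerical meaning
xiaoming = pd.DataFrame([1, 2, 3], index=['yellow', 'red', 'blue'], columns=['hat'])
print(xiaoming)

# One-hot encode the 'hat' column; the new columns are named hat_<value>
hat_ranks = pd.get_dummies(xiaoming['hat'], prefix='hat')
print(hat_ranks.head())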

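Output (pandas versions before 2.0 print 0/1 as shown; from 2.0 on, the dummy columns are booleans and print as True/False unless dtype=int is passed):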

        hat
yellow    1
red       2
blue      3
        hat_1  hat_2  hat_3
yellow      1      0      0
red         0      1      0
blue        0      0      1

As another example, here we do not assign numbers ourselves. We provide only the categories (color and class) and let get_dummies derive the indicator columns from the distinct values in each column:

import pandas as pd

df = pd.DataFrame([
    ['green', 'A'],
    ['red', 'B'],
    ['blue', 'A']])

df.columns = ['color', 'class']
# Each text column is expanded into one indicator column per distinct value
pd.get_dummies(df)
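
To make the result of this call visible, here is a small sketch that rebuilds the same DataFrame and prints the generated column names. The dtype=int argument is an addition for illustration (it requests 0/1 integers; without it, recent pandas versions return True/False), and the second call shows the columns parameter restricting the encoding to a single column.

```python
import pandas as pd

df = pd.DataFrame([['green', 'A'], ['red', 'B'], ['blue', 'A']],
                  columns=['color', 'class'])

# Encode every text column; levels within each column are sorted
dummies = pd.get_dummies(df, dtype=int)
print(dummies.columns.tolist())
# ['color_blue', 'color_green', 'color_red', 'class_A', 'class_B']

# Encode only 'color'; the untouched 'class' column is kept and comes first
color_only = pd.get_dummies(df, columns=['color'], dtype=int)
print(color_only.columns.tolist())
# ['class', 'color_blue', 'color_green', 'color_red']
```

When called on a whole DataFrame, get_dummies uses each source column name as the prefix, so the prefix argument is not needed here.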

 

Also note that the encoding of discrete features falls into two cases:

1. The values have no inherent order, such as color: [red, blue, green]. In this case, use one-hot encoding.

2. The values have an inherent order, such as size: [X, XL, XXL]. In this case, use a numerical mapping such as {X: 1, XL: 2, XXL: 3}.
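
The two cases can be combined in one DataFrame. The sketch below is illustrative only: the sample data and the name size_mapping are assumptions, and Series.map is just one straightforward way to apply the numerical mapping.

```python
import pandas as pd

# Hypothetical sample data: one unordered feature, one ordered feature
df = pd.DataFrame({'color': ['red', 'blue', 'green'],
                   'size': ['X', 'XL', 'XXL']})

# Case 2: ordered feature, map each label to a number that keeps the order
size_mapping = {'X': 1, 'XL': 2, 'XXL': 3}
df['size'] = df['size'].map(size_mapping)

# Case 1: unordered feature, one-hot encode it
encoded = pd.get_dummies(df, columns=['color'], dtype=int)
print(encoded)
```

After this, size is a single numeric column whose values can be compared, while color is spread over three indicator columns that imply no ordering.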


Origin blog.csdn.net/weixin_40244676/article/details/105964720