In big data analysis we often work with pandas DataFrames. Data usually has to be converted before any training, and map is one of the most commonly used methods for that. I recently ran into a problem with it that puzzled me for a while before I finally found the cause. I am writing it down for anyone who runs into the same problem.
Problem Description
I used the map function to replace values in a column. The original data contains no NaN values, but NaN values appear after the replacement, which breaks subsequent operations.
The goal is to replace 'buy' in column y of the original data with 1, 'sell' with -1, and NaN with 0. The map statement is as follows:
import numpy as np

df['y'] = df['y'].map({'buy': 1, 'sell': -1, np.nan: 0}, na_action=None)
Data before replacement:
1228 0
1229 0
1230 0
1231 0
1232 0
1233 0
1234 0
1235 0
1236 0
1237 0
1238 0
1239 0
1240 0
1241 0
1242 0
1243 buy
1244 0
1245 0
1246 0
1247 0
Name: y, dtype: object
Data after replacement:
1228 NaN
1229 NaN
1230 NaN
1231 NaN
1232 NaN
1233 NaN
1234 NaN
1235 NaN
1236 NaN
1237 NaN
1238 NaN
1239 NaN
1240 NaN
1241 NaN
1242 NaN
1243 1.0
1244 NaN
1245 NaN
1246 NaN
1247 NaN
Name: y, dtype: float64
Cause Analysis:
Mapping with a dictionary produced NaN in the output column. map returns NaN for every value that has no matching key in the dictionary, and this kind of mismatch is hard to spot by printing the data, because leading and trailing spaces are invisible in the printed output. If you run into this problem, first strip the leading and trailing spaces from the string column and then apply the mapping. A value with stray spaces does not match any key, so the lookup fails and NaN is output.
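To make the cause concrete, here is a minimal sketch of how a stray space leads to NaN and how to reveal it. The sample Series and the names s, mapping, and unmatched are made up for illustration and are not the original data.

```python
import pandas as pd

# Hypothetical sample: the second value carries a trailing space.
s = pd.Series(['buy', 'sell ', 'buy'])
mapping = {'buy': 1, 'sell': -1}

# 'sell ' is not a key in the dictionary, so map returns NaN for it.
mapped = s.map(mapping)
print(mapped.isna().tolist())  # [False, True, False]

# print(s) hides the trailing space; the raw Python values show it.
print(s.tolist())  # ['buy', 'sell ', 'buy']

# List the distinct values that have no matching key.
unmatched = s[~s.isin(list(mapping))].unique()
print(list(unmatched))  # ['sell ']
```

Checking the unmatched values before mapping is usually quicker than reading through the printed column.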
Solution:
Call .str.strip() on column y first to remove leading and trailing spaces, then perform the mapping replacement.
df['y'] = df['y'].str.strip().map({'buy': 1, 'sell': -1, np.nan: 0}, na_action=None)
Data before replacement:
1228 0
1229 0
1230 0
1231 0
1232 0
1233 0
1234 0
1235 0
1236 0
1237 0
1238 0
1239 0
1240 0
1241 0
1242 0
1243 buy
1244 0
1245 0
1246 0
1247 0
Name: y, dtype: object
Data after replacement:
1228 0
1229 0
1230 0
1231 0
1232 0
1233 0
1234 0
1235 0
1236 0
1237 0
1238 0
1239 0
1240 0
1241 0
1242 0
1243 1
1244 0
1245 0
1246 0
1247 0
Name: y, dtype: int64
As you can see, the replacement now works as expected: no NaN values remain and the dtype is int64.
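For columns that mix strings with other values, a more defensive variant may be useful. The sketch below is only one possible approach, not the method used above: the helper map_labels, its default parameter, and the sample data are all hypothetical. It relies on the fact that .str.strip() returns NaN for elements that are not strings, so it strips only real strings and then fills everything without a matching key with a default value.

```python
import numpy as np
import pandas as pd


def map_labels(series, mapping, default=0):
    """Strip whitespace from string values, map them, fill the rest with `default`."""
    # Strip only real strings; .str.strip() would turn non-string values into NaN.
    cleaned = series.map(lambda v: v.strip() if isinstance(v, str) else v)
    # Values without a matching key become NaN here ...
    mapped = cleaned.map(mapping)
    # ... and are replaced by the default, so the result can be cast to int.
    return mapped.fillna(default).astype(int)


# Hypothetical sample mixing integers, padded strings, and NaN.
s = pd.Series([0, ' buy', 'sell ', np.nan, 0], dtype=object)
result = map_labels(s, {'buy': 1, 'sell': -1})
print(result.tolist())  # [0, 1, -1, 0, 0]
```

Note that this treats every unmapped value as the default, which can hide typos in the labels; inspect the unmatched values first if that matters.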