[Python tips] Before using the map function, call .str.strip() to remove leading and trailing spaces

When analyzing large datasets, we often work with pandas DataFrames, and the data usually needs to be converted before any model training. The map method is one of the most commonly used tools for this. I recently ran into a problem with it that puzzled me for a while before I found the cause. I am writing it down for anyone who runs into the same problem.


Problem Description

The map function is used to replace values in a column. The original data contains no NaN values, but NaN values appear after the replacement, which breaks subsequent operations.
The goal is to replace 'buy' in column y of the original data with 1, 'sell' with -1, and NaN with 0. The map statement is as follows:

df['y'] = df['y'].map({'buy': 1, 'sell': -1, np.nan: 0}, na_action=None)

Data before replacement:

1228      0
1229      0
1230      0
1231      0
1232      0
1233      0
1234      0
1235      0
1236      0
1237      0
1238      0
1239      0
1240      0
1241      0
1242      0
1243    buy
1244      0
1245      0
1246      0
1247      0
Name: y, dtype: object

Data after replacement:

1228    NaN
1229    NaN
1230    NaN
1231    NaN
1232    NaN
1233    NaN
1234    NaN
1235    NaN
1236    NaN
1237    NaN
1238    NaN
1239    NaN
1240    NaN
1241    NaN
1242    NaN
1243    1.0
1244    NaN
1245    NaN
1246    NaN
1247    NaN
Name: y, dtype: float64

Cause Analysis:

The dictionary mapping runs without raising an error, yet the output column contains NaN. The cause is hard to spot by printing the column, because print does not show leading or trailing spaces. A dictionary lookup is an exact match, so a value such as ' buy' does not match the key 'buy'. Any value that is not found among the dictionary keys is output as NaN. If you run into this problem, first strip the leading and trailing spaces from the string column, then perform the mapping.
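
Since print hides the padding, it helps to make it visible before mapping. The following is a minimal sketch of one way to do that, not the only one. The sample Series is hypothetical and its values are made up for illustration. Wrapping each value in repr() adds quotes around it, and comparing the column with its stripped version picks out the padded rows.

```python
import numpy as np
import pandas as pd

mapping = {"buy": 1, "sell": -1, np.nan: 0}

# Hypothetical sample data: the first two values carry invisible padding.
s = pd.Series([" buy", "sell ", "buy"], dtype=object)

# repr() adds quotes, so leading/trailing spaces become visible.
quoted = s.map(repr)
print(quoted.tolist())

# Rows whose value changes when stripped are the padded ones.
padded = s[s != s.str.strip()]
print(padded.tolist())

# Padded strings are not keys of the dictionary, so they map to NaN.
raw = s.map(mapping)

# After stripping, every value matches a key.
clean = s.str.strip().map(mapping)
print(raw.tolist(), clean.tolist())
```

If padded is empty, whitespace is probably not the cause and the keys of the dictionary are worth a second look.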


Solution:

Call .str.strip() on column y first to remove the leading and trailing spaces, then perform the mapping.

df['y'] = df['y'].str.strip().map({'buy': 1, 'sell': -1, np.nan: 0}, na_action=None)

Data before replacement:

1228      0
1229      0
1230      0
1231      0
1232      0
1233      0
1234      0
1235      0
1236      0
1237      0
1238      0
1239      0
1240      0
1241      0
1242      0
1243    buy
1244      0
1245      0
1246      0
1247      0
Name: y, dtype: object

Data after replacement:

1228    0
1229    0
1230    0
1231    0
1232    0
1233    0
1234    0
1235    0
1236    0
1237    0
1238    0
1239    0
1240    0
1241    0
1242    0
1243    1
1244    0
1245    0
1246    0
1247    0
Name: y, dtype: int64

As you can see, the replacement now produces the expected result.
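
One more detail may be worth checking on your own data. In the output above, 'buy' was already mapped to 1.0 before stripping, and it was the 0 entries that became NaN. You would see the same pattern if the zeros were stored as integers rather than strings. In that case .str.strip() returns NaN for the non-string elements, and the np.nan: 0 entry then maps them to 0. The sketch below assumes integer zeros, which the original data does not confirm, and the sample Series is hypothetical.

```python
import numpy as np
import pandas as pd

mapping = {"buy": 1, "sell": -1, np.nan: 0}

# Hypothetical column: integer zeros mixed with a string (an assumption).
s = pd.Series([0, 0, "buy", 0], dtype=object)

# Integer 0 is not a key of the dictionary, so it maps to NaN.
direct = s.map(mapping)

# .str.strip() keeps "buy" and turns the non-string zeros into NaN...
stripped = s.str.strip()

# ...which the np.nan key then maps to 0.
fixed = stripped.map(mapping)

print(direct.tolist())
print(fixed.tolist())
```

If this is the situation, s.map(type).value_counts() shows the mix of element types in the column.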

Origin blog.csdn.net/popboy29/article/details/132014975