Fill np.nan condition within for/if-else loop

TheMattdavis17 :

I have been working on this for a while and just can't seem to find the answer to what I need. Suppose I have a dataframe as below.

What I would like to do is fill the last three rows of df['gender'] based on the value in df['home_work'] column, specifically if home_work > 9, then m, if not, then f. Please keep in mind this is just a made up dataset and I don't mean to offend anyone, I promise!

enr = pd.DataFrame({'name_id':[1254, 1359, 1254, 1296, 1353, 2656], 
                   'enrollment_term':['spring 2018', 'spring 2018', 'fall 2018', 'spring 2018', 'spring 2018', 'fall 2020'],
                   'gpa_term': [2.93, np.nan, 1.65, 4.00, 3.95, 2.92],
                   'dog_owner':[0,1,1,1, 1, 0],
                   'salary':[50657, 90658, np.nan, 104352, np.nan, 102043],
                   'home_work':[34, np.nan, 12, 9, 8, 27],
                   'gender':['m','f','f',np.nan, np.nan, np.nan]})

enr

Below is the code that I attempted but it presented the error down below:

for i in df['gender'].isna():
    if df['home_work'][i] > 9:
        df['gender'][i].fillna('m')
    else:
        df['gender'][i].fillna('f')
KeyError: False

Any help would be greatly appreciated as I have been working on this for a while. I have a dataset of 90K + that I want to adapt this work to and would like to create a function that streamlines this process but have hit a speedbump!

The issue I am running into is that np.nan defaults and fills in a value for gender if it doesn't meet the requirement. Thoughts?


# Edited

Suppose I have the following df:

enr = pd.DataFrame({'name_id':[1254, 1359, 1254, 1296, 1353, 2656], 
                   'enrollment_term':['spring 2018', 'spring 2018', 'fall 2018', 'spring 2018', 'spring 2018', 'fall 2020'],
                   'gpa_term': [2.93, np.nan, 1.65, 4.00, 3.95, 2.92],
                   'dog_owner':[0,1,1,1, 1, 0],
                   'salary':[50657, 90658, np.nan, 104352, np.nan, 102043],
                   'home_work':[np.nan, np.nan, 0.7, 0.3, 0.64, 0.49],
                   'gender':[0, 1, 1,np.nan, np.nan, np.nan]})

enter image description here

I would like to impute enr['gender'] based on home_work. If enr['home_work'] >= 0.5, then enr['gender'] == 0, else (as long as enr['home_work'] != np.nan), enr['gender'] == 1.

What I don't want is imputation of values in enr[gender] where their enr['home_work'] is np.nanI have tried many different techniques but all seem to impute a 1. Thoughts?

jezrael :

Use numpy.where with Series.fillna:

enr['gender'] = np.where(enr['home_work'] > 9,  
                         enr['gender'].fillna('m'),
                         enr['gender'].fillna('f'))

Or filter separately by 2 masks:

m = enr['gender'].isna()
enr.loc[m, 'gender'] = np.where(enr['home_work'] > 9,  'm',  'f')[m]

print (enr)
   name_id enrollment_term  gpa_term  dog_owner    salary  home_work gender
0     1254     spring 2018      2.93          0   50657.0         34      m
1     1359     spring 2018       NaN          1   90658.0         42      f
2     1254       fall 2018      1.65          1       NaN         12      f
3     1296     spring 2018      4.00          1  104352.0          9      f
4     1353     spring 2018      3.95          1       NaN          8      f
5     2656       fall 2020      2.92          0  102043.0         27      m

EDIT:

m = enr['gender'].isna() & enr['home_work'].notna()
enr.loc[m, 'gender'] = np.where(enr['home_work'] >= 0.5, 0, 1)[m]
print (enr)
   name_id enrollment_term  gpa_term  dog_owner    salary  home_work  gender
0     1254     spring 2018      2.93          0   50657.0        NaN     0.0
1     1359     spring 2018       NaN          1   90658.0        NaN     1.0
2     1254       fall 2018      1.65          1       NaN       0.70     1.0
3     1296     spring 2018      4.00          1  104352.0       0.30     1.0
4     1353     spring 2018      3.95          1       NaN       0.64     0.0
5     2656       fall 2020      2.92          0  102043.0       0.49     1.0

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=17853&siteId=1