Apply a function to two fields of a dataframe

babeyh :

I have a dataframe like this (code to create a sample dataframe):

import numpy as np
import pandas as pd

# Sample data: one language per row plus two list-valued columns (NaN = missing)
df = pd.DataFrame({'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
                   'top_lang_owned': [['ruby', 'javascript', 'go'],
                                      ['ruby', 'coffeescript'],
                                      ['javascript', 'coffeescript'],
                                      ['ruby', 'shell', 'go'],
                                      np.nan],
                   'top_lang_watched': [['ruby', 'go'],
                                        ['javascript'],
                                        np.nan,
                                        ['ruby', 'shell'],
                                        np.nan]})
df
  language    top_lang_owned          top_lang_watched
0 ruby     [ruby, javascript, go]     [ruby, go]
1 ruby     [ruby, coffeescript]       [javascript]
2 ruby     [javascript, coffeescript]  NaN
3 NaN      [ruby, shell, go]          [ruby, shell]
4 ruby      NaN                        NaN
dataframe.info();
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
language          4 non-null object
top_lang_owned    4 non-null object
dtypes: object(2)
memory usage: 208.0+ bytes

I want to add a field that compares the values of two fields (pseudo-code):

if ("language" is in "top_lang_owned")
then new_field = 1, otherwise new_field = 0.

For example, the desired output is below:

  language  top_lang_owned              top_lang_watched  is_owned  is_watched
0 ruby      [ruby, javascript, go]      [ruby, go]        1         1
1 ruby      [ruby, coffeescript]        [javascript]      1         0
2 ruby      [javascript, coffeescript]  NaN               0         0
3 NaN       [ruby, shell, go]           [ruby, shell]     NaN       NaN
4 ruby      NaN                         NaN               NaN       NaN
Shubham Sharma :

You can certainly do that. Here's the code you might want to try:

EDIT:

def func(x):
    if x.language in x.top_lang_owned:
        return 1
    return 0

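# Apply only to rows where both compared columns are present; the rest stay NaN
mask = df[['language', 'top_lang_owned']].notna().all(axis=1)
df['is_in_lang'] = df[mask].apply(func, axis=1)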

OUTPUT:

    id language                    top_lang_owned  is_in_lang
0   21     ruby            [ruby, javascript, go]           1
1   25     ruby  [javascript, ruby, coffeescript]           1
2   38     ruby        [javascript, coffeescript]           0
3  108      NaN                 [ruby, shell, go]           NaN
4  173     ruby                               NaN           NaN
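
The question also asks for an is_watched column, so the same check can be reused for both list columns. Below is a minimal sketch of one way to do that, not the only one. The helper name contains_flag is made up for illustration, and it assumes every non-missing cell in the list columns is a Python list. It zips the columns together instead of using a row-wise apply, and it returns NaN whenever either side is missing.

```python
import numpy as np
import pandas as pd

def contains_flag(value, candidates):
    """Hypothetical helper: 1/0 if value is in candidates, NaN if either is missing."""
    # A missing list cell is a float NaN, so anything that is not a list counts as missing.
    if not isinstance(candidates, list) or pd.isna(value):
        return np.nan
    return int(value in candidates)

# Same sample data as in the question
df = pd.DataFrame({
    'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
    'top_lang_owned': [['ruby', 'javascript', 'go'],
                       ['ruby', 'coffeescript'],
                       ['javascript', 'coffeescript'],
                       ['ruby', 'shell', 'go'],
                       np.nan],
    'top_lang_watched': [['ruby', 'go'],
                         ['javascript'],
                         np.nan,
                         ['ruby', 'shell'],
                         np.nan],
})

# Build one flag column per list column by pairing each language with its list
for source, target in [('top_lang_owned', 'is_owned'),
                       ('top_lang_watched', 'is_watched')]:
    df[target] = [contains_flag(lang, langs)
                  for lang, langs in zip(df['language'], df[source])]
```

One difference from the desired output in the question: row 2 has no top_lang_watched list, so this sketch gives NaN for is_watched there, while the question shows 0. If a missing list should count as "not found", the helper can return 0 in that case instead.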
