babeyh :
I have a dataframe like this (code to create a sample dataframe):
import numpy as np
import pandas as pd

df = pd.DataFrame({'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
                   'top_lang_owned': [['ruby', 'javascript', 'go'],
                                      ['ruby', 'coffeescript'],
                                      ['javascript', 'coffeescript'],
                                      ['ruby', 'shell', 'go'],
                                      np.nan],
                   'top_lang_watched': [['ruby', 'go'],
                                        ['javascript'],
                                        np.nan,
                                        ['ruby', 'shell'],
                                        np.nan]})
df
   language              top_lang_owned  top_lang_watched
0      ruby      [ruby, javascript, go]        [ruby, go]
1      ruby        [ruby, coffeescript]      [javascript]
2      ruby  [javascript, coffeescript]               NaN
3       NaN           [ruby, shell, go]     [ruby, shell]
4      ruby                         NaN               NaN
dataframe.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
language          4 non-null object
top_lang_owned    4 non-null object
dtypes: object(2)
memory usage: 208.0+ bytes
I want to add a field that compares the values of two other fields (pseudo-code):
if "language" is in "top_lang_owned"
then new_field = 1, otherwise new_field = 0.
For example, the desired output is shown below:
   language              top_lang_owned  top_lang_watched  is_owned  is_watched
0      ruby      [ruby, javascript, go]        [ruby, go]         1           1
1      ruby        [ruby, coffeescript]      [javascript]         1           0
2      ruby  [javascript, coffeescript]               NaN         0           0
3       NaN           [ruby, shell, go]     [ruby, shell]       NaN         NaN
4      ruby                         NaN               NaN       NaN         NaN
Shubham Sharma :
You can certainly do that. Here's the code you might want to try:
EDIT:
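def func(x):
    # Return 1 if the row's language appears in its top_lang_owned list, else 0.
    if x.language in x.top_lang_owned:
        return 1
    return 0

# Apply only to rows without missing values; the skipped rows get NaN in the new column.
df['is_in_lang'] = df[~df.isna().any(axis=1)].apply(func, axis=1)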
OUTPUT:
id language top_lang_owned is_in_lang
0 21 ruby [ruby, javascript, go] 1
1 25 ruby [javascript, ruby, coffeescript] 1
2 38 ruby [javascript, coffeescript] 0
3 108 NaN [ruby, shell, go] NaN
4 173 ruby NaN NaN
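To fill both is_owned and is_watched from the question, the same idea could be wrapped in a small helper that takes the list column as an argument. This is only a rough sketch: contains_language and its list_col parameter are hypothetical names, not part of pandas, and it assumes the result should be NaN whenever either the language or the compared list is missing. Under that assumption is_watched comes out as NaN for row 2 (its top_lang_watched is NaN), whereas the desired output in the question shows 0 there.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
                   'top_lang_owned': [['ruby', 'javascript', 'go'],
                                      ['ruby', 'coffeescript'],
                                      ['javascript', 'coffeescript'],
                                      ['ruby', 'shell', 'go'],
                                      np.nan],
                   'top_lang_watched': [['ruby', 'go'],
                                        ['javascript'],
                                        np.nan,
                                        ['ruby', 'shell'],
                                        np.nan]})

def contains_language(row, list_col):
    # Hypothetical helper: 1 if row['language'] is in row[list_col], 0 if not.
    # Assumption: return NaN when the language or the list itself is missing.
    language, langs = row['language'], row[list_col]
    if not isinstance(langs, list) or pd.isna(language):
        return np.nan
    return int(language in langs)

# Extra keyword arguments given to DataFrame.apply are forwarded to the function.
df['is_owned'] = df.apply(contains_language, axis=1, list_col='top_lang_owned')
df['is_watched'] = df.apply(contains_language, axis=1, list_col='top_lang_watched')
print(df)
```

Because the new columns contain NaN, pandas stores them as floats (1.0 / 0.0). If integer-looking values matter, converting with .astype('Int64') (the nullable integer dtype) is one option.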