ibarant :
I have a dataframe that looks like this:
id tag1 tag2 tag3 col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
id1 col3 col4 col7 0 0 0 0 0 0 0 0 0 0
id2 col1 col2 col9 0 0 0 0 0 0 0 0 0 0
id3 col2 col5 col6 0 0 0 0 0 0 0 0 0 0
id4 col3 col6 col10 0 0 0 0 0 0 0 0 0 0
id5 col1 col7 col8 0 0 0 0 0 0 0 0 0 0
id6 col4 col6 col9 0 0 0 0 0 0 0 0 0 0
id7 col5 col7 col10 0 0 0 0 0 0 0 0 0 0
id8 col2 col3 col6 0 0 0 0 0 0 0 0 0 0
id9 col5 col9 col10 0 0 0 0 0 0 0 0 0 0
id10 col4 col8 col9 0 0 0 0 0 0 0 0 0 0
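For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above. The tag values are copied from the table, and every col column starts at 0. The helper name `tag_rows` is only for illustration.

```python
import pandas as pd

# Tag values copied from the table above, one list per row (id1..id10)
tag_rows = [
    ["col3", "col4", "col7"],
    ["col1", "col2", "col9"],
    ["col2", "col5", "col6"],
    ["col3", "col6", "col10"],
    ["col1", "col7", "col8"],
    ["col4", "col6", "col9"],
    ["col5", "col7", "col10"],
    ["col2", "col3", "col6"],
    ["col5", "col9", "col10"],
    ["col4", "col8", "col9"],
]

df = pd.DataFrame(tag_rows, columns=["tag1", "tag2", "tag3"])
# put the id column first, as in the table
df.insert(0, "id", [f"id{i}" for i in range(1, 11)])
# col1..col10 all start at zero
for i in range(1, 11):
    df[f"col{i}"] = 0
print(df)
```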
What I need is a "Base" dataframe with a 1 in each column (col1 to col10) whose name appears among that row's tags:
id tag1 tag2 tag3 col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
id1 col3 col4 col7 0 0 1 1 0 0 1 0 0 0
id2 col1 col2 col9 1 1 0 0 0 0 0 0 1 0
id3 col2 col5 col6 0 1 0 0 1 1 0 0 0 0
id4 col3 col6 col10 0 0 1 0 0 1 0 0 0 1
id5 col1 col7 col8 1 0 0 0 0 0 1 1 0 0
id6 col4 col6 col9 0 0 0 1 0 1 0 0 1 0
id7 col5 col7 col10 0 0 0 0 1 0 1 0 0 1
id8 col2 col3 col6 0 1 1 0 0 1 0 0 0 0
id9 col5 col9 col10 0 0 0 0 1 0 0 0 1 1
id10 col4 col8 col9 0 0 0 1 0 0 0 1 1 0
I'd rather not use a triple loop like this:
cols = [el for el in df if el.startswith('col')]
tags = [el for el in df if el.startswith('tag')]
for index, row in df.iterrows():
    for col in cols:
        for tag in tags:
            if row[tag] == col:
                row[col] += 1
But even that doesn't work. What would be the best approach, and what is wrong with the code above?
Thank you very much!
Quang Hoang :
A combination of get_dummies and update would give you what you want:
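import pandas as pd

# stack the tag columns, one-hot encode them, then sum back to one row per original row
# (.groupby(level=0).sum() replaces .sum(level=0), which was removed in pandas 2.0)
df.update(pd.get_dummies(df.filter(like='tag')
                           .stack()
                        ).groupby(level=0).sum()
         )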
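To see what each step of that chain produces, here is a sketch on just the first two rows of the question's data (id1 and id2). It splits the chain into named intermediates. `dtype=int` is passed to `get_dummies` here so the sums are plain integers; that argument is an addition for this illustration, not part of the answer above.

```python
import pandas as pd

# Two rows copied from the question (id1 and id2); col columns start at 0
df = pd.DataFrame({"id": ["id1", "id2"],
                   "tag1": ["col3", "col1"],
                   "tag2": ["col4", "col2"],
                   "tag3": ["col7", "col9"]})
for i in range(1, 11):
    df[f"col{i}"] = 0

# 1) Keep only the tag columns and stack them into one long Series;
#    its index becomes (original row label, tag column name)
stacked = df.filter(like="tag").stack()

# 2) One-hot encode the tag values: one column per distinct value
dummies = pd.get_dummies(stacked, dtype=int)

# 3) Collapse back to one row per original row by summing over index level 0
per_row = dummies.groupby(level=0).sum()

# 4) Overwrite the matching col columns of df, aligning on index and column names;
#    columns that no tag mentions (col5, col6, ...) are left at 0
df.update(per_row)
print(df)
```

`update` only touches columns present in both frames, which is why the dummy columns can be written straight into col1 to col10 without any renaming.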
Output:
id tag1 tag2 tag3 col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
-- ---- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ -------
0 id1 col3 col4 col7 0 0 1 1 0 0 1 0 0 0
1 id2 col1 col2 col9 1 1 0 0 0 0 0 0 1 0
2 id3 col2 col5 col6 0 1 0 0 1 1 0 0 0 0
3 id4 col3 col6 col10 0 0 1 0 0 1 0 0 0 1
4 id5 col1 col7 col8 1 0 0 0 0 0 1 1 0 0
5 id6 col4 col6 col9 0 0 0 1 0 1 0 0 1 0
6 id7 col5 col7 col10 0 0 0 0 1 0 1 0 0 1
7 id8 col2 col3 col6 0 1 1 0 0 1 0 0 0 0
8 id9 col5 col9 col10 0 0 0 0 1 0 0 0 1 1
9 id10 col4 col8 col9 0 0 0 1 0 0 0 1 1 0
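As for why the triple loop in the question has no effect: `DataFrame.iterrows` yields each row as a new Series, and the pandas documentation warns that writing to it is not guaranteed to change the frame. Here each row mixes strings and integers, so it is built from a copy of the data, and `row[col] += 1` only changes that copy. The sketch below, on the first two rows of the question's data, shows the original loop leaving `df` untouched. It then shows a loop that does work by writing through `df.at`. The inner loop over `cols` is dropped because each tag value is already the name of the target column.

```python
import pandas as pd

# Two rows copied from the question (id1 and id2); col columns start at 0
df = pd.DataFrame({"id": ["id1", "id2"],
                   "tag1": ["col3", "col1"],
                   "tag2": ["col4", "col2"],
                   "tag3": ["col7", "col9"]})
for i in range(1, 11):
    df[f"col{i}"] = 0

cols = [el for el in df if el.startswith("col")]
tags = [el for el in df if el.startswith("tag")]

# The question's loop: it increments the temporary row Series, not df
for index, row in df.iterrows():
    for col in cols:
        for tag in tags:
            if row[tag] == col:
                row[col] += 1
unchanged = bool((df[cols] == 0).all().all())  # df was never modified

# Working version: write to df itself; the tag value names the column to set
for index, row in df.iterrows():
    for tag in tags:
        df.at[index, row[tag]] = 1
print(df)
```

This still loops in Python row by row, so the vectorized `get_dummies` version is likely to scale better on large frames.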