Lee Daniel :
I am very new to Python and have come across a problem that I could not solve.
I have two DataFrames, and only the columns shown below need to be considered. For example:
df1
Student ID Subjects
0 S1 Maths, Physics, Chemistry, Biology
1 S2 Maths, Chemistry, Computing
2 S3 Maths, Chemistry, Computing
3 S4 Biology, Chemistry, Maths
4 S5 English Literature, History, French
5 S6 Economics, Maths, Geography
6 S7 Further Mathematics, Maths, Physics
7 S8 Arts, Film Studies, Psychology
8 S9 English Literature, English Language, Classical
9 S10 Business, Computing, Maths
df2
Subject ID Subjects
58 Che13 Chemistry
59 Bio13 Biology
60 Mat13 Maths
61 FMat13 Further Mathematics
62 Phy13 Physics
63 Eco13 Economics
64 Geo13 Geography
65 His13 History
66 EngLang13 English Langauge
67 EngLit13 English Literature
For every subject in df2, how can I check whether any student is taking that subject, and build a dictionary with "Subject ID" as the key and a list of "Student ID" values as the value?
The desired output would be something like:
Che13:[S1, S2, S3, ...]
Bio13:[S1,S4,...]
cs95 :
Use explode and map, then you can do a little grouping to get your output:
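(df1.set_index('Student ID')['Subjects']
    .str.split(', ')                                # "Maths, Physics" -> ['Maths', 'Physics']
    .explode()                                      # one row per (student, subject)
    .map(df2.set_index('Subjects')['Subject ID'])   # subject name -> Subject ID
    .reset_index()
    .groupby('Subjects')['Student ID']              # 'Subjects' now holds the Subject IDs
    .agg(list))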
Subjects
Bio13 [S1, S4]
Che13 [S1, S2, S3, S4]
Eco13 [S6]
EngLit13 [S5, S9]
FMat13 [S7]
Geo13 [S6]
His13 [S5]
Mat13 [S1, S2, S3, S4, S6, S7, S10]
Phy13 [S1, S7]
Name: Student ID, dtype: object
From here, call .to_dict() if you want the result in a dictionary.
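For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same chain. It assumes a small subset of the question's data (three students and four subjects), and the names lookup and result are only illustrative. Series.explode needs pandas 0.25 or later.

```python
import pandas as pd

# Small subset of the question's data, rebuilt by hand for illustration.
df1 = pd.DataFrame({
    "Student ID": ["S1", "S2", "S4"],
    "Subjects": [
        "Maths, Physics, Chemistry, Biology",
        "Maths, Chemistry, Computing",
        "Biology, Chemistry, Maths",
    ],
})
df2 = pd.DataFrame({
    "Subject ID": ["Che13", "Bio13", "Mat13", "Phy13"],
    "Subjects": ["Chemistry", "Biology", "Maths", "Physics"],
})

# Series that maps a subject name to its Subject ID.
lookup = df2.set_index("Subjects")["Subject ID"]

result = (
    df1.set_index("Student ID")["Subjects"]
       .str.split(", ")                    # string -> list of subject names
       .explode()                          # one row per (student, subject)
       .map(lookup)                        # subject name -> Subject ID (NaN if absent)
       .reset_index()
       .groupby("Subjects")["Student ID"]  # NaN keys are dropped by default
       .agg(list)
       .to_dict()
)

print(result)
```

Subjects that have no exact match in df2 (such as "Computing" here) map to NaN and are dropped by the groupby, so they do not appear as keys. The match is on the exact string, so any difference in spelling or whitespace between the two frames has the same effect.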