Lookup values from one DataFrame to create a dict from another

Lee Daniel :

I am very new to Python and came across a problem that I could not solve.

I have two DataFrames, and only the following columns need to be considered, for example:

df1 
    Student ID                                          Subjects
0           S1                Maths, Physics, Chemistry, Biology
1           S2                       Maths, Chemistry, Computing
2           S3                       Maths, Chemistry, Computing
3           S4                         Biology, Chemistry, Maths
4           S5               English Literature, History, French
5           S6                       Economics, Maths, Geography
6           S7               Further Mathematics, Maths, Physics
7           S8                    Arts, Film Studies, Psychology
8           S9   English Literature, English Language, Classical
9          S10                        Business, Computing, Maths

df2
   Subject ID             Subjects
58      Che13            Chemistry
59      Bio13              Biology
60      Mat13                Maths
61     FMat13  Further Mathematics
62      Phy13              Physics
63      Eco13            Economics
64      Geo13            Geography
65      His13              History
66  EngLang13     English Langauge
67   EngLit13   English Literature

For every subject in df2, how can I check whether any student is taking it, and build a dictionary with "Subject ID" as the keys and the matching "Student ID" values as lists?

The desired output would be something like:

Che13:[S1, S2, S3, ...]
Bio13:[S1, S4, ...]

cs95 :

Use explode and map, then you can do a little grouping to get your output:

(df1.set_index('Student ID')['Subjects']
   .str.split(', ')
   .explode()
   .map(df2.set_index('Subjects')['Subject ID'])
   .reset_index()
   .groupby('Subjects')['Student ID']
   .agg(list))

Subjects
Bio13                            [S1, S4]
Che13                    [S1, S2, S3, S4]
Eco13                                [S6]
EngLit13                         [S5, S9]
FMat13                               [S7]
Geo13                                [S6]
His13                                [S5]
Mat13       [S1, S2, S3, S4, S6, S7, S10]
Phy13                            [S1, S7]
Name: Student ID, dtype: object

From here, call .to_dict() if you want the result in a dictionary.
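
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same chain with .to_dict() appended. The frames are a shortened copy of the data in the question, and the name subject_to_id is only an illustrative helper, not something from the original post. Subjects with no match in df2 (e.g. Computing) map to NaN, and groupby drops those by default.

```python
import pandas as pd

# Abbreviated sample data modelled on the question (not the full frames).
df1 = pd.DataFrame({
    "Student ID": ["S1", "S2", "S4", "S5"],
    "Subjects": [
        "Maths, Physics, Chemistry, Biology",
        "Maths, Chemistry, Computing",
        "Biology, Chemistry, Maths",
        "English Literature, History, French",
    ],
})
df2 = pd.DataFrame({
    "Subject ID": ["Che13", "Bio13", "Mat13", "Phy13", "His13", "EngLit13"],
    "Subjects": ["Chemistry", "Biology", "Maths", "Physics",
                 "History", "English Literature"],
})

# Hypothetical helper name: a lookup Series from subject name to subject ID.
subject_to_id = df2.set_index("Subjects")["Subject ID"]

result = (
    df1.set_index("Student ID")["Subjects"]
       .str.split(", ")                     # "Maths, Physics" -> ["Maths", "Physics"]
       .explode()                           # one row per (student, subject)
       .map(subject_to_id)                  # subject name -> subject ID, NaN if unmatched
       .reset_index()                       # bring "Student ID" back as a column
       .groupby("Subjects")["Student ID"]   # NaN keys are dropped by default
       .agg(list)
       .to_dict()
)

print(result)
```

With this sample data, result["Che13"] should be ['S1', 'S2', 'S4'], and Computing and French do not appear because they have no entry in df2.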

Origin http://43.154.161.224:23101/article/api/json?id=20492&siteId=1