Conditional maximum in Excel and in a Python DataFrame

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reproducing it, please include a link to the original source and this statement.
Link to this article:
https://blog.csdn.net/qq_41858657/article/details/103374630
Referenced document:
https://blog.csdn.net/weixin_37855575/article/details/82288011
--------- -------
The task in detail: students who retook the course have several exam records. For each student, keep the row with the highest score in the data frame and delete the others.
Below, the task is solved in both Excel and Python:

Excel:

=MAXIFS($G:$G,$A:$A,A2)

The table is laid out as follows. For data-security reasons only the relevant part is shown: column A (XH) is the student ID, column G (KCCJ) is the advanced-mathematics score, and the last column holds the highest advanced-mathematics score selected for each student.
Fill the formula down the last column to finish.
Keep the highest-score column; the original score column can then be deleted (paste the formula results as values first, since the formula refers to column G). Next, remove the rows with duplicate student IDs and save the deduplicated data. That completes the Excel part.
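For comparison, the two Excel steps above (a MAXIFS column filled down, then removing duplicate student IDs) can be sketched in pandas roughly as follows. The sample data and the column name `max_score` are made up for illustration; only `XH` and `KCCJ` come from the article.

```python
import pandas as pd

# Hypothetical sample data; only the column names XH and KCCJ come from the article.
df = pd.DataFrame({
    "XH": [1001, 1001, 1002, 1003, 1003, 1003],
    "KCCJ": [55, 72, 88, 40, 61, 59],
})

# Rough equivalent of =MAXIFS($G:$G,$A:$A,A2) filled down:
# every row receives the maximum KCCJ of its own student ID.
df["max_score"] = df.groupby("XH")["KCCJ"].transform("max")

# Rough equivalent of deleting the original score column
# and then removing rows with duplicate student IDs.
result = (
    df.drop(columns="KCCJ")
      .drop_duplicates(subset="XH")
      .reset_index(drop=True)
)
print(result)
```

Like the Excel approach, this keeps one row per student together with the maximum score, but any other columns on that row come from the student's first record, not necessarily from the attempt with the highest score.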

Python:

Import the CSV:

import pandas as pd
import numpy as np
df=pd.read_csv('E:\\项目\\高数分析\\高数分析\\数学公共课考试成绩\\高数成绩.csv',sep=',')

The whole operation in one line of code:

df3=df.loc[df.reset_index().groupby(['XH'])['KCCJ'].idxmax()]
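To see what this line returns, here is a minimal sketch that runs the same expression on a small made-up table. The data is hypothetical; it assumes, as in the article at this point, that `df` still has the default 0..n-1 index produced by `read_csv`.

```python
import pandas as pd

# Hypothetical sample data; only the column names XH and KCCJ come from the article.
df = pd.DataFrame({
    "XH": [1001, 1001, 1002, 1003, 1003, 1003],
    "KCCJ": [55, 72, 88, 40, 61, 59],
})

# For each student ID, idxmax() returns the row label of the highest KCCJ.
best_rows = df.reset_index().groupby(["XH"])["KCCJ"].idxmax()

# Selecting those labels keeps exactly one full row per student.
df3 = df.loc[best_rows]
print(df3)
```

With this sample the rows labelled 1, 2 and 4 are kept (scores 72, 88 and 61). If a student has the same top score more than once, `idxmax()` returns the first occurrence.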

Export the result:

## Save to CSV
df3.to_csv('E:\\项目\\高数分析\\高数分析\\数学公共课考试成绩\\高数成绩2.csv',index=False,header=False)
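The effect of `index=False` and `header=False` may be easier to see on a tiny frame. The sketch below writes to an in-memory buffer instead of a file; the data is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical result frame; the row labels 1 and 2 are left over from the selection step.
df3 = pd.DataFrame({"XH": [1001, 1002], "KCCJ": [72, 88]}, index=[1, 2])

# Write to an in-memory text buffer so that no file path is needed.
buf = io.StringIO()
df3.to_csv(buf, index=False, header=False)

# Only the data rows are written: no row labels and no column names.
lines = buf.getvalue().splitlines()
print(lines)
```

Because the header row is omitted, whoever reads the file later has to know the column order. Dropping `header=False` would keep the column names.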

What that line of code does, step by step:

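## Set XH as the index of df
df=df.set_index(keys=['XH'])
## Reset the index of df and group by XH
df1=df.reset_index().groupby(['XH'])
df1.groups
# Select one column within each group; idxmax() returns the index of that column's maximum
df2=df.reset_index().groupby(['XH'])['KCCJ'].idxmax()
## Use iloc to select the rows by position
df.iloc[df2]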

loc indexing can also be used:

## loc indexes by label value
df.reset_index().loc[df2]
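The difference between the two selections matters once `XH` has become the index. The following sketch, on made-up data, shows that positional `iloc` on the XH-indexed frame and label-based `loc` on the reset frame pick the same rows, while `loc` on the XH-indexed frame would look for student IDs 1, 2 and 4 and fail. The variable names `raw` and `flat` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names XH and KCCJ come from the article.
raw = pd.DataFrame({
    "XH": [1001, 1001, 1002, 1003, 1003, 1003],
    "KCCJ": [55, 72, 88, 40, 61, 59],
})

df = raw.set_index("XH")   # the index now holds student IDs
flat = df.reset_index()    # back to a 0..n-1 index, with XH as a column

# Labels of the best rows in `flat`; they coincide with row positions in `df`.
df2 = flat.groupby("XH")["KCCJ"].idxmax()

by_position = df.iloc[df2.to_numpy()]  # positional selection on the XH-indexed frame
by_label = flat.loc[df2]               # label selection on the reset frame

# Label selection on the XH-indexed frame treats 1, 2, 4 as student IDs.
try:
    df.loc[df2.tolist()]
    loc_on_xh_index_failed = False
except KeyError:
    loc_on_xh_index_failed = True

print(by_position)
print(by_label)
print(loc_on_xh_index_failed)
```

The call to `to_numpy()` is only there to make the positional intent explicit.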

## Finally, putting the code together
df3 = df3.reset_index()
df3.head()
