Copyright notice: all rights reserved by 诸葛老刘. https://blog.csdn.net/weixin_39791387/article/details/82851773
pandas.DataFrame
Removing duplicates
Ways to remove duplicates:
Sample data
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'colA': list('AABCA'), 'colB': list('AABDA'), 'colC': [100, 100, 30, 50, 20], 'colD': [100, 100, 60, 80, 50]})
In [3]: df
Out[3]:
  colA colB  colC  colD
0    A    A   100   100
1    A    A   100   100
2    B    B    30    60
3    C    D    50    80
4    A    A    20    50
1. Remove duplicates across all columns
In [4]: df2 = df.drop_duplicates()
In [5]: df2
Out[5]:
  colA colB  colC  colD
0    A    A   100   100
2    B    B    30    60
3    C    D    50    80
4    A    A    20    50
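To see why only row 1 was dropped, it can help to look at the boolean mask that `duplicated()` returns. The sketch below rebuilds the sample data and assumes the default `keep='first'` behaviour, so a row is flagged only when an identical row appears earlier. The names `mask` and `df2_manual` are only for illustration.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'colA': list('AABCA'),
    'colB': list('AABDA'),
    'colC': [100, 100, 30, 50, 20],
    'colD': [100, 100, 60, 80, 50],
})

# True marks a row whose values in every column repeat an earlier row
mask = df.duplicated()
print(mask.tolist())  # [False, True, False, False, False]

# Filtering out the flagged rows gives the same result as drop_duplicates()
df2_manual = df[~mask]
print(df2_manual.equals(df.drop_duplicates()))  # True
```

Row 4 survives because its colC and colD values differ from rows 0 and 1, even though colA and colB match.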
2. Remove duplicates based on specified columns
In [6]: df3 = df.drop_duplicates(['colA', 'colB']);df3
Out[6]:
  colA colB  colC  colD
0    A    A   100   100
2    B    B    30    60
3    C    D    50    80
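When deduplicating on a subset of columns, which of the matching rows survives depends on the `keep` parameter. The following is a minimal sketch on the same sample data. The variable names (`cols`, `first`, `last`, `unique_only`, `renumbered`) are made up for this example, and whether `'first'`, `'last'`, or `False` is the right choice depends on your data.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': list('AABCA'),
    'colB': list('AABDA'),
    'colC': [100, 100, 30, 50, 20],
    'colD': [100, 100, 60, 80, 50],
})

cols = ['colA', 'colB']

# Default: keep the first row of each (colA, colB) group
first = df.drop_duplicates(subset=cols)
print(list(first.index))        # [0, 2, 3]

# keep='last': keep the last row of each group instead
last = df.drop_duplicates(subset=cols, keep='last')
print(list(last.index))         # [2, 3, 4]

# keep=False: drop every row that has a duplicate at all
unique_only = df.drop_duplicates(subset=cols, keep=False)
print(list(unique_only.index))  # [2, 3]

# The original index labels are preserved; renumber them if needed
renumbered = first.reset_index(drop=True)
print(list(renumbered.index))   # [0, 1, 2]
```

Note that the result keeps the original index labels (0, 2, 3 above), so `reset_index(drop=True)` is one way to get a contiguous index afterwards.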
If anything here is not fully explained, please refer to the official documentation.