1. Build the test data
import pandas as pd
df = pd.DataFrame({'k1': ['a1', 'a2', 'a1', 'b1', 'b2'],
                   'k2': ['c1', 'd1', 'c1', 'c2', 'd2'],
                   'data': [10, 100, 20, 30, 300]})
print(df)
   k1  k2  data
0  a1  c1    10
1  a2  d1   100
2  a1  c1    20
3  b1  c2    30
4  b2  d2   300
2. Use drop_duplicates() to find the duplicated rows
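### Find the duplicated values in column k1
# Keep the first row for every k1 value (duplicated or not)
df_tmp1 = df.drop_duplicates(subset=['k1'])
# Keep only the rows whose k1 value occurs exactly once
df_tmp2 = df.drop_duplicates(subset=['k1'], keep=False)
# Stack both: k1 values that occur once in df now appear twice, duplicated ones appear once
df_tmp3 = pd.concat([df_tmp1, df_tmp2], axis=0)
# Drop everything that appears twice; what remains is the first row of each duplicated k1 value
df_tmp4 = df_tmp3.drop_duplicates(subset=['k1'], keep=False)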
print(df_tmp4)
   k1  k2  data
0  a1  c1    10
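At this point, drop_duplicates() has identified the values in column k1 that occur more than once, returning the first row for each such value. If you want rows that are duplicated in their entirety rather than in a single column, simply omit subset=['k1'] from the calls in step 2.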
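As a rough sketch of that last point, the four steps can be wrapped in a small function so that the subset is optional. The helper name first_of_duplicated is hypothetical, made up for this illustration and not part of pandas. The final line cross-checks the k1 case with pandas' built-in duplicated() method, which is a different technique from the one used above.

```python
import pandas as pd

df = pd.DataFrame({'k1': ['a1', 'a2', 'a1', 'b1', 'b2'],
                   'k2': ['c1', 'd1', 'c1', 'c2', 'd2'],
                   'data': [10, 100, 20, 30, 300]})

def first_of_duplicated(frame, subset=None):
    """Hypothetical helper: the four drop_duplicates steps as one function."""
    # First row of every key (subset=None means the whole row is the key)
    firsts = frame.drop_duplicates(subset=subset)
    # Rows whose key occurs exactly once
    uniques = frame.drop_duplicates(subset=subset, keep=False)
    # Keys that occur once now appear twice, so keep=False removes them
    combined = pd.concat([firsts, uniques], axis=0)
    return combined.drop_duplicates(subset=subset, keep=False)

by_k1 = first_of_duplicated(df, subset=['k1'])  # duplicates judged on k1 only
whole_row = first_of_duplicated(df)             # duplicates judged on every column

# Cross-check: duplicated(keep=False) marks every row whose k1 value repeats
all_dups = df[df.duplicated(subset=['k1'], keep=False)]
```

With the test data above, whole_row comes back empty, because rows 0 and 2 share k1 and k2 but differ in data. Note also that by_k1 holds only the first row per duplicated value, whereas all_dups holds every occurrence.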