[pandas] [5] Finding duplicated rows in a DataFrame with drop_duplicates()

1. Build the test data

import pandas as pd
df = pd.DataFrame({'k1' : ['a1','a2','a1','b1','b2'],
    'k2' : ['c1','d1','c1','c2','d2'],
    'data' : [10,100,20,30,300]})
print(df)
   k1  k2  data
0  a1  c1    10
1  a2  d1   100
2  a1  c1    20
3  b1  c2    30
4  b2  d2   300

2. Use drop_duplicates() to find the duplicated rows

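### Find the values that are duplicated in column k1
# Keep the first row for every distinct k1 value
df_tmp1 = df.drop_duplicates(subset=['k1'])
# Keep only the rows whose k1 value occurs exactly once
df_tmp2 = df.drop_duplicates(subset=['k1'], keep=False)
# Stack both results: unique k1 values now appear twice, duplicated ones once
df_tmp3 = pd.concat([df_tmp1, df_tmp2], axis=0)
# Dropping every k1 value that appears twice leaves one row per duplicated value
df_tmp4 = df_tmp3.drop_duplicates(subset=['k1'], keep=False)
print(df_tmp4)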
   k1  k2  data
0  a1  c1    10
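
As a cross-check on the result above, here is a minimal sketch that uses DataFrame.duplicated() instead of the concat approach. This is a different technique from the one in this post, and the names mask, all_dup_rows and first_of_each are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'k1': ['a1', 'a2', 'a1', 'b1', 'b2'],
                   'k2': ['c1', 'd1', 'c1', 'c2', 'd2'],
                   'data': [10, 100, 20, 30, 300]})

# duplicated(keep=False) marks every row whose k1 value occurs more than once
mask = df.duplicated(subset=['k1'], keep=False)
all_dup_rows = df[mask]

# Keeping one row per duplicated k1 value should match df_tmp4 from step 2
first_of_each = all_dup_rows.drop_duplicates(subset=['k1'])
print(first_of_each)
```

Here all_dup_rows holds every occurrence of a duplicated k1 value, which can be handy when all of the offending rows are needed rather than just the first one.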

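That is all: drop_duplicates() has found the values in column k1 that occur more than once, returning the first row for each such value. To find rows that are duplicated in their entirety rather than in a single column, drop the subset=['k1'] argument from the code in step 2. (In the test data above no two rows are identical in every column, so that variant returns an empty DataFrame.)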
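
The whole-row variant can be sketched as follows. The DataFrame df_full is hypothetical data, not from the original post, built so that row 2 repeats row 0 in every column. The variable names are also illustrative.

```python
import pandas as pd

# Hypothetical data: row 2 is identical to row 0 in every column
df_full = pd.DataFrame({'k1': ['a1', 'a2', 'a1', 'b1'],
                        'k2': ['c1', 'd1', 'c1', 'c2'],
                        'data': [10, 100, 10, 30]})

# Same steps as in step 2, but without subset, so entire rows are compared
first_seen = df_full.drop_duplicates()                # one copy of each distinct row
never_repeated = df_full.drop_duplicates(keep=False)  # rows that occur exactly once
combined = pd.concat([first_seen, never_repeated], axis=0)
repeated_rows = combined.drop_duplicates(keep=False)  # one copy of each repeated row
print(repeated_rows)
```

Because drop_duplicates() compares column values and ignores the index, the repeated index labels produced by pd.concat do not affect the result.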

Reposted from blog.csdn.net/xiezhen_zheng/article/details/105352913