Remove a row from a dataframe if any row value is in another dataframe

aryan singh :

I have two dataframes.

df1 has one column:

{0:[1,2,3,4,5,6,7,11]}

df2 has two columns:

{0:[100,4,6,7],1:[1,3,4,7]}

I have to remove the rows from df1 whose value appears in any column of df2.

Resulting dataframe:

df3 = [2,5,11]

kederrac :

You can use pandas.Series.isin on the column of df1, checking it against all values of df2:

df1[~df1[0].isin(df2.values.flatten())]

output: only the rows of df1 holding 2, 5 and 11 remain (index 1, 4 and 7).
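
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the dataframes are built directly from the dicts in the question; the names values_to_drop and df3 are only illustrative.

```python
import pandas as pd

# Data from the question; the column label 0 matches the dict keys used there.
df1 = pd.DataFrame({0: [1, 2, 3, 4, 5, 6, 7, 11]})
df2 = pd.DataFrame({0: [100, 4, 6, 7], 1: [1, 3, 4, 7]})

# Collect every value of df2, regardless of column, into one flat array.
values_to_drop = df2.values.flatten()

# Keep only the rows of df1 whose value does not appear anywhere in df2.
df3 = df1[~df1[0].isin(values_to_drop)]

print(df3[0].tolist())  # [2, 5, 11]
```

The filtered frame keeps the original index labels; call reset_index(drop=True) on it if you need a fresh 0..n-1 index.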


For large dataframes, I ran the following simple benchmark comparing the approaches suggested in the answers:

import numpy as np
import pandas as pd

from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()

@b.add_function()
def anky_91(t):
    df1, df2 = t
    # stack() collapses all df2 columns into a single Series
    df1[~df1[0].isin(df2.stack())]

@b.add_function()
def kederrac(t):
    df1, df2 = t
    # flatten the underlying NumPy array instead of stacking
    df1[~df1[0].isin(df2.values.flatten())]

@b.add_function()
def yatu(t):
    df1, df2 = t
    # squeeze() turns the one-column df1 into a Series
    df1[~df1.squeeze().isin(df2.stack())]


@b.add_arguments('Number of rows in df')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        # upper bound is size // 10, falling back to 10 when that is 0
        df1 = pd.DataFrame(np.random.randint(0, size // 10 or 10, size=(size, 1)))
        df2 = pd.DataFrame(np.random.randint(0, size // 10 or 10, size=(size, 2)))
        yield size, (df1, df2)

r = b.run()
r.plot()
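
Since the benchmark only measures speed, it may be worth checking separately that the three variants select the same rows. A small sketch, assuming the sample data from the question (the variable names via_flatten, via_stack and via_squeeze are only illustrative):

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({0: [1, 2, 3, 4, 5, 6, 7, 11]})
df2 = pd.DataFrame({0: [100, 4, 6, 7], 1: [1, 3, 4, 7]})

# The three variants timed in the benchmark above.
via_flatten = df1[~df1[0].isin(df2.values.flatten())]
via_stack = df1[~df1[0].isin(df2.stack())]
via_squeeze = df1[~df1.squeeze().isin(df2.stack())]

# All three should keep exactly the same rows.
same_rows = via_flatten.equals(via_stack) and via_flatten.equals(via_squeeze)
print(same_rows)
```

Note that squeeze() returns a scalar rather than a Series when df1 has a single row, so that variant relies on df1 having more than one row.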
