aryan singh :
I have two dataframes.
df1 has one column:
{0:[1,2,3,4,5,6,7,11]}
df2 has two columns:
{0:[100,4,6,7],1:[1,3,4,7]}
I need to remove every row from df1 whose value appears in any column of df2.
Resulting dataframe:
df3 = [2,5,11]
kederrac :
You can use pandas.Series.isin on the column of df1, after flattening df2 into a single array of values:
df1[~df1[0].isin(df2.values.flatten())]
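Spelled out with the frames from the question, that one-liner might look like the sketch below. The names to_remove and df3 are only illustrative, and the frames are rebuilt from the dicts the question gives.

```python
import pandas as pd

# Frames rebuilt from the dicts shown in the question.
df1 = pd.DataFrame({0: [1, 2, 3, 4, 5, 6, 7, 11]})
df2 = pd.DataFrame({0: [100, 4, 6, 7], 1: [1, 3, 4, 7]})

# Flatten every column of df2 into one 1-D array of values to exclude.
to_remove = df2.values.flatten()

# Boolean mask: True where the df1 value also occurs somewhere in df2;
# ~ inverts it so only the rows NOT found in df2 are kept.
df3 = df1[~df1[0].isin(to_remove)]
print(df3)
```

Note that the filtered frame keeps the original index labels of df1; add .reset_index(drop=True) if a fresh 0..n-1 index is wanted.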
output:
    0
1   2
4   5
7  11
For large dataframes, I ran the following simple benchmark:
import numpy as np
import pandas as pd
from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def anky_91(t):
    df1, df2 = t
    # stack() turns df2 into a single Series of all its values
    return df1[~df1[0].isin(df2.stack())]

@b.add_function()
def kederrac(t):
    df1, df2 = t
    # flatten the underlying NumPy array instead of building a Series
    return df1[~df1[0].isin(df2.values.flatten())]

@b.add_function()
def yatu(t):
    df1, df2 = t
    # squeeze() converts the one-column df1 into a Series
    return df1[~df1.squeeze().isin(df2.stack())]

@b.add_arguments('Number of rows in df')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        # values are drawn from [0, size // 10), or [0, 10) for very small sizes
        df1 = pd.DataFrame(np.random.randint(0, size // 10 or 10, size=(size, 1)))
        df2 = pd.DataFrame(np.random.randint(0, size // 10 or 10, size=(size, 2)))
        yield size, (df1, df2)

r = b.run()
r.plot()
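Before reading anything into the timings, it may be worth checking that the three variants really select the same rows. A rough sketch, assuming random integer frames similar to the ones in the benchmark; the seed, the sizes, the value range and the res_* names are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Arbitrary seed and sizes, chosen only so that the check is repeatable.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 5000, size=(1000, 1)))
df2 = pd.DataFrame(rng.integers(0, 5000, size=(1000, 2)))

# The three expressions compared in the benchmark.
res_flatten = df1[~df1[0].isin(df2.values.flatten())]
res_stack = df1[~df1[0].isin(df2.stack())]
res_squeeze = df1[~df1.squeeze().isin(df2.stack())]

# All three should keep exactly the same rows, index included.
print(res_flatten.equals(res_stack), res_flatten.equals(res_squeeze))
```

If both comparisons print True, the benchmark is only measuring how each variant builds the lookup values, not a difference in results.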