Lazloo Xp :
I am looking for an efficient way to merge two pandas data frames based on a function that takes columns from both data frames as input and returns True or False. For example, assume I have the following "tables":
import pandas as pd
df_1 = pd.DataFrame(data=[1, 2, 3])
df_2 = pd.DataFrame(data=[4, 5, 6])
def validation(a, b):
    return ((a + b) % 2) == 0
I would like to join df_1 and df_2 on every pair of rows for which the sum of their first columns is an even number. The resulting table would be
df_3 =
    1  5
    2  4
    2  6
    3  5
Please think of it as a general problem, not as a task to return just df_3. The solution should accept any function that validates a combination of columns and returns True or False.
THX Lazloo
Ayoub ZAROU :
This is a basic solution, but it is not very efficient on large dataframes, because it builds every combination of rows before filtering:
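# Give every row the same index label so that join() pairs each row of df_1 with each row of df_2
df_1.index *= 0
df_2.index *= 0
df = df_1.join(df_2, lsuffix='_2')
# Keep only the pairs whose row sum is even
df = df[df.sum(axis=1) % 2 == 0]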
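Since the question asks for something that accepts any validation function, here is a minimal sketch of the same cross-join-then-filter idea wrapped in a helper. The helper name conditional_join and the column names "a" and "b" are made up for illustration, and how="cross" assumes pandas 1.2 or newer. It also assumes the validation function is vectorized, i.e. it accepts whole columns and returns a boolean Series.

```python
import pandas as pd


def validation(a, b):
    # Same check as in the question, applied element-wise to two columns
    return ((a + b) % 2) == 0


def conditional_join(left, right, predicate):
    # Hypothetical helper: build every (left row, right row) pair ...
    pairs = left.merge(right, how="cross")
    # ... then keep the pairs for which the predicate is True
    mask = predicate(pairs)
    return pairs[mask].reset_index(drop=True)


# Named columns (an assumption) so the predicate can refer to them
df_1 = pd.DataFrame({"a": [1, 2, 3]})
df_2 = pd.DataFrame({"b": [4, 5, 6]})

df_3 = conditional_join(df_1, df_2, lambda p: validation(p["a"], p["b"]))
print(df_3)
```

Like the snippet above, this still materializes all len(df_1) * len(df_2) combinations, so it is only reasonable for small or medium inputs.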
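Edit: here is a better solution for this particular example. It indexes both frames by parity, so the join only pairs rows of equal parity, and those are exactly the pairs whose sum is even. Note that it relies on this specific validation and does not carry over to an arbitrary function: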
df_1.index = df_1.iloc[:,0] % 2
df_2.index = df_2.iloc[:,0] % 2
df = df_1.join(df_2, lsuffix='_2')
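The parity trick only works because this particular check boils down to matching keys. For an arbitrary vectorized validation function, one possible alternative (a sketch, not a benchmarked solution) is to evaluate the function on a NumPy grid via broadcasting and then pick the matching rows by position. That avoids building the full cross-joined frame, although the boolean grid still has len(df_1) * len(df_2) entries. The column names "a" and "b" are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def validation(a, b):
    # Same check as in the question; works on NumPy arrays via broadcasting
    return ((a + b) % 2) == 0


df_1 = pd.DataFrame({"a": [1, 2, 3]})
df_2 = pd.DataFrame({"b": [4, 5, 6]})

# Column vector against row vector gives a (len(df_1), len(df_2)) boolean grid
a = df_1["a"].to_numpy()[:, None]
b = df_2["b"].to_numpy()[None, :]
left_pos, right_pos = np.nonzero(validation(a, b))

# Select the matching rows by position and place them side by side
df_3 = pd.concat(
    [
        df_1.iloc[left_pos].reset_index(drop=True),
        df_2.iloc[right_pos].reset_index(drop=True),
    ],
    axis=1,
)
print(df_3)
```

If the validation can be rewritten as equality of some derived key, as with parity here, an ordinary join on that key will usually scale better than either grid-based approach.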