R compare with two dataframes by multiple conditions to extract rows

ahai :

I have two dataframes:

 x <- data.frame('a'=c(1,1,2,2,3,3),'b'=c(1,2,1,2,1,2),'c'=c(1,2,3,2,1,3))

    y <- data.frame('a'=c(1,2,3),'b'=c(1,1,1),'c'=c(1,2,3))

would like to extract rows from x as follow conditions:

 x$a==y$a & x$b==y$b+1 & (if y$c == 1, then x$c %in% c(2,3); if ...)

result should like: x: a b c 1 2 2 only row 2 in x match. I have tried:

 z <- y[which(y$a == x$a & y$b == x$b+1),] 

to match at lease the first two conditions, but not work, the result is null. What is the suitable way to deal with it? Please help.

Uwe :

Perhaps, you can try to run an SQL statement with help of sqldf:

sqldf("select x.* from x join y where 
      x.a = y.a and 
      x.b = y.b + 1 and
      y.c = 1 and
      x.c in (2, 3)")
  a b c
1 1 2 2

This returns the expected result for the given sample dataset. However, it needs to be tested with a production dataset.

The third and fourth condition are "fixed", e.g., y$c == 1 and x$c %in% c(2, 3). So, x and y can be filtered before joining:

sqldf("select u.* from 
      (select * from x where c in (2, 3)) as u
      join 
      (select * from y where c = 1) as v
      where u.a = v.a")
  a b c
1 1 2 2

Perhaps, this might result in a performance gain (untested!)

This approach can also be used in dplyr:

library(dplyr)
x %>% 
  filter(c %in% 2:3) %>% 
  inner_join(
    y %>% 
      filter(c == 1) %>% 
      mutate(bp1 = b + 1), 
    by = c("a", "b" = "bp1"))
  a b c.x b.y c.y
1 1 2   2   1   1

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=29045&siteId=1