Identify and mark duplicate rows in R
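The question's df.now isn't shown, but a data frame along these lines, reconstructed from the printed results below (cells not visible in any output are assumed NA; the quoting in the base-R output suggests the original object may even have been a character matrix), is enough to follow both answers:

df.now <- data.frame(
  fit    = c("it1", "it2", "it3", "it4", "it5", "it6", "it7", "it9"),
  sit    = c("it2", "it1", "it4", "it3", "it6", "it5", "it9", "it7"),
  value1 = c(1, NA, 2, NA, 5, NA, NA, NA),
  value2 = c(NA, 3, 3, NA, NA, NA, 4, NA),
  value3 = c(NA, 2, 4, NA, NA, 2, NA, NA)
)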
One dplyr option could be:
library(dplyr)

df.now %>%
  # one order-independent key per pair, whichever item is in fit or sit
  group_by(pair = paste(pmax(fit, sit), pmin(fit, sit), sep = "_")) %>%
  summarise_at(vars(starts_with("value")), ~ ifelse(all(is.na(.)),
                                                    NA,
                                                    first(na.omit(.))))
pair value1 value2 value3
<chr> <dbl> <dbl> <dbl>
1 it2_it1 1 3 2
2 it4_it3 2 3 4
3 it6_it5 5 NA 2
4 it9_it7 NA 4 NA
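summarise_at() still works but has been superseded; with dplyr 1.0 or later the same idea can be written with across(). A minimal sketch, not part of the original answer:

df.now %>%
  group_by(pair = paste(pmax(fit, sit), pmin(fit, sit), sep = "_")) %>%
  summarise(across(starts_with("value"),
                   ~ if (all(is.na(.x))) NA else first(na.omit(.x))))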
And if you also need the pair members in separate columns, then with the addition of tidyr you can do:
library(tidyr)

df.now %>%
  group_by(pair = paste(pmax(fit, sit), pmin(fit, sit), sep = "_")) %>%
  summarise_at(vars(starts_with("value")), ~ ifelse(all(is.na(.)),
                                                    NA,
                                                    first(na.omit(.)))) %>%
  separate(pair, into = c("fit", "hit"), sep = "_", remove = FALSE)
pair fit hit value1 value2 value3
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 it2_it1 it2 it1 1 3 2
2 it4_it3 it4 it3 2 3 4
3 it6_it5 it6 it5 5 NA 2
4 it9_it7 it9 it7 NA 4 NA
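If you don't actually need the combined pair string, a variation is to skip the paste()/separate() round trip and group on the reordered pair directly; hi and lo below are hypothetical column names, and the rest follows the same assumptions as above:

df.now %>%
  group_by(hi = pmax(fit, sit), lo = pmin(fit, sit)) %>%
  summarise_at(vars(starts_with("value")), ~ ifelse(all(is.na(.)),
                                                    NA,
                                                    first(na.omit(.))))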
Use !duplicated() after sorting the fit/sit values within each row.
df.now[!duplicated(t(apply(df.now[, c("fit", "sit")], 1, sort))), ]
# value1 value2 value3 fit sit
# [1,] "1" NA NA "it1" "it2"
# [2,] "2" "3" "4" "it3" "it4"
# [3,] "5" NA NA "it5" "it6"
# [4,] NA "4" NA "it7" "it9"
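To unpack the one-liner, and to merely mark the duplicate rows instead of dropping them, the same logic can be split into steps (a sketch assuming df.now is a data frame; dup_pair is a hypothetical column name):

# sort each row's fit/sit pair so the key is the same regardless of order
key <- t(apply(df.now[, c("fit", "sit")], 1, sort))

# TRUE for every later repeat of a pair that has already been seen
df.now$dup_pair <- duplicated(key)

df.now[!df.now$dup_pair, ]   # same rows as the one-liner above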