Remove *all* duplicate rows, unless there's a "similar" row
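For reproducibility, here is a hypothetical dataset consistent with the outputs shown in the answers below (the OP's actual `dt` is not shown, so this is an assumption):

```r
library(data.table)

# Assumed example data: group V1 == 2 has more than one distinct V2
# value (5, 6, 7, with one duplicated row), while groups 1 and 3
# contain only exact duplicates.
dt <- data.table(V1 = c(1, 1, 2, 2, 2, 2, 3),
                 V2 = c(4, 4, 5, 6, 7, 5, 8))
```

With this data, every approach below keeps the three unique rows of group 2 and drops the fully-duplicated groups 1 and 3.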
One option would be to group by 'V1', get the row indices (.I) of the groups that have more than one unique value in 'V2', subset the rows, and then take the unique rows:
unique(dt[dt[, .(i1 = .I[uniqueN(V2) > 1]), V1]$i1])
# V1 V2
#1: 2 5
#2: 2 6
#3: 2 7
Or, as @r2evans mentioned:
unique(dt[, .SD[(uniqueN(V2) > 1)], by = "V1"])
NOTE: The OP's dataset is a data.table, and data.table methods are the natural way of doing this.
If we need a tidyverse option, a comparable one to the data.table approach above is:
library(dplyr)
dt %>%
group_by(V1) %>%
filter(n_distinct(V2) > 1) %>%
distinct()
Another dplyr possibility:
dt %>%
group_by(V1) %>%
filter(n_distinct(V2) != 1 & !duplicated(V2))
V1 V2
<dbl> <dbl>
1 2 5
2 2 6
3 2 7
Or:
dt %>%
group_by(V1) %>%
filter(n_distinct(V2) != 1) %>%
group_by(V1, V2) %>%
slice(1)
In your case, with base R:
dt[ave(dt$V2, dt$V1, FUN = function(x) length(unique(x))) > 1 & !duplicated(dt)]
V1 V2
1: 2 5
2: 2 6
3: 2 7