unique() for more than one variable
This is an addition to Josh's answer.
You can also keep the values of other variables while filtering out duplicated rows in data.table
Example:
library(data.table)
#create data table
dt <- data.table(
V1=LETTERS[c(1,1,1,1,2,3,3,5,7,1)],
V2=LETTERS[c(2,3,4,2,1,4,4,6,7,2)],
V3=c(1),
V4=c(2) )
> dt
# V1 V2 V3 V4
# A B 1 2
# A C 1 2
# A D 1 2
# A B 1 2
# B A 1 2
# C D 1 2
# C D 1 2
# E F 1 2
# G G 1 2
# A B 1 2
# set the key to all columns
setkey(dt)
# Get Unique lines in the data table
unique( dt[list(V1, V2), nomatch = 0] )
# V1 V2 V3 V4
# A B 1 2
# A C 1 2
# A D 1 2
# B A 1 2
# C D 1 2
# E F 1 2
# G G 1 2
Alert: If there are different combinations of values in the other variables, then your result will be
unique combination of V1 and V2
How about using unique()
itself?
df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"),
hmm = 1:4)
df
# yad per hmm
# 1 BARBIE AYLIK 1
# 2 BARBIE AYLIK 2
# 3 BAKUGAN 2 AYLIK 3
# 4 BAKUGAN 2 AYLIK 4
unique(df[c("yad", "per")])
# yad per
# 1 BARBIE AYLIK
# 3 BAKUGAN 2 AYLIK
unique based on any columns and keep all other columns using dplyr
.
df <- df %>%
distinct(col1, col2, .keep_all = TRUE)