Deleting rows that are duplicated in one column based on the conditions of another column
Let's say you have your data in df:
df = df[order(df[,'Date'], -df[,'Depth']),]  # sort by Date, then by decreasing Depth
df = df[!duplicated(df$Date),]               # keep the first (deepest) row per Date
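As a quick check, here is the base-R approach run on a small hypothetical data frame (the Date/Depth column names and values are made up for illustration):

```r
# Hypothetical sample data: two measurements per date at different depths
df <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  Depth = c(5, 10, 3, 8)
)

df <- df[order(df[, "Date"], -df[, "Depth"]), ]  # Date ascending, Depth descending
df <- df[!duplicated(df$Date), ]                 # first row per Date is now the deepest

df
#         Date Depth
# 2 2020-01-01    10
# 4 2020-01-02     8
```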
A data.table solution, which is likely the fastest way to solve this (assuming data is your data set):
library(data.table)
unique(setDT(data)[order(Date, -Depth)], by = "Date")
Just another way:
# .I[which.max(Depth)] returns the row number of the deepest row within each Date group
setDT(data)[data[, .I[which.max(Depth)], by=Date]$V1]
This might not be the fastest approach if your data frame is large, but it is a fairly straightforward one. Note that it may change the order of your data frame, so you might need to reorder by e.g. Date afterwards. Instead of deleting rows, we split the data by Date, pick the row with the maximum Depth in each chunk, and finally join the result back into a data frame:
data = split(data, data$Date)
data = lapply(data, function(x) x[which.max(x$Depth), , drop=FALSE])
data = do.call("rbind", data)
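To see the split/apply/rbind chain in action, here is a sketch on a small hypothetical data set (the Date/Depth columns are assumed to match your data):

```r
# Hypothetical sample data
data <- data.frame(
  Date  = c("2020-01-01", "2020-01-01", "2020-01-02"),
  Depth = c(5, 10, 8)
)

data <- split(data, data$Date)                  # one chunk per Date
data <- lapply(data, function(x) x[which.max(x$Depth), , drop = FALSE])  # deepest row in each
data <- do.call("rbind", data)                  # stitch the chunks back together

data  # one row per Date, carrying the maximum Depth
```

Note that which.max returns the first maximum, so ties are broken in favour of the earlier row.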