Delete entries with only one observation in a group

With your sample data

DG <- read.csv(text="day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")

you could use dplyr

library(dplyr)
DG %>% group_by(day,City) %>% filter(n()>1)
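Note that filter() on a grouped tibble leaves the result grouped by day and City. A variant that never groups the data is sketched below. It assumes dplyr 0.7 or later, where add_count() is available. The code attaches each group's size as a column n, filters on that column, and then drops it again:

```r
library(dplyr)

DG <- read.csv(text = "day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")

# add_count() appends a column n holding the size of each day/City group,
# without leaving the result grouped
res <- DG %>%
  add_count(day, City) %>%
  filter(n > 1) %>%
  select(-n)   # drop the helper count column again
res
```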

or base R

DG[ave(rep(1, nrow(DG)), DG$day, DG$City, FUN=length)>1,]
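Another base R option uses duplicated(). This is offered as a sketch, not as the answer's own method. The idea is that a row belongs to a group with more than one member exactly when its day/City pair also occurs earlier or later in the data:

```r
DG <- read.csv(text = "day,City,age
4-10,Miami,30
4-10,Miami,23
4-11,New York,24
4-12,San Francisco,30")

key <- DG[c("day", "City")]
# A row's key is repeated either earlier (duplicated) or later
# (fromLast = TRUE) exactly when its group has more than one row
keep <- duplicated(key) | duplicated(key, fromLast = TRUE)
res <- DG[keep, ]
res
```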

both return

   day  City age
1 4-10 Miami  30
2 4-10 Miami  23

Or you could use data.table (as suggested by @Frank)

library(data.table)
setDT(DG)[,if (.N>1) .SD, by=.(City,day)]
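The if (.N>1) .SD form returns the grouping columns City and day first. If you want to keep DG's original column order, one alternative is to collect the qualifying row numbers with .I and then subset by them. This is a sketch, and idx and res are just illustrative names. Because setDT() converts DG to a data.table in place, the sketch rebuilds DG with data.table() so that it is self-contained:

```r
library(data.table)

DG <- data.table(day  = c("4-10", "4-10", "4-11", "4-12"),
                 City = c("Miami", "Miami", "New York", "San Francisco"),
                 age  = c(30, 23, 24, 30))

# .I holds the row numbers of the current group; keep them only when the
# group has more than one row (.N > 1), then subset DG by those rows
idx <- DG[, .I[.N > 1], by = .(City, day)]$V1
res <- DG[idx]
res
```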

MrFlick's answer is (as usual) hard to top, but here's a longer version that doubles as good practice with dplyr.

Here's the dataframe:

DG <- data.frame(day=c('4-10', '4-10', '4-11', '4-12'), City=c('Miami', 'Miami', 'New York', 'San Francisco'), age=c(30, 23, 23, 30))

Using group_by(), we group the rows by City and day. We then pipe the groups into summarize(), where n() (a handy dplyr function) counts the rows in each group.

library(dplyr)

DG1 <- DG %>%
  group_by(City, day) %>%
  summarize(n=n())
#          City  day n
#         Miami 4-10 2
#      New York 4-11 1
# San Francisco 4-12 1
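dplyr also provides count(), which its documentation describes as roughly equivalent to group_by() followed by summarize(n = n()). Below is a sketch using the same DG. It should give the same counts as DG1, and for an ungrouped input the result has no leftover grouping:

```r
library(dplyr)

DG <- data.frame(day  = c("4-10", "4-10", "4-11", "4-12"),
                 City = c("Miami", "Miami", "New York", "San Francisco"),
                 age  = c(30, 23, 23, 30))

# count(City, day) tallies the rows in each City/day combination
DG1 <- count(DG, City, day)
DG1
```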

Turn DG1 into a regular data frame, just to be on the safe side (summarize() returns a tibble that is still grouped by City):

DG2 <- data.frame(DG1)

...and then we get rid of the unwanted rows with filter(), keeping only the City/day combinations that appear more than once.

DG3 <- filter(DG2, n>1)
#City  day  n
#Miami 4-10 2

Next, use select() to pick columns, just as filter() picked rows. Here it simply drops the n column.

DG4 <- select(DG3, City, day)
#City  day
#Miami 4-10

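Finally, we match the original data frame against DG4 to get all the rows whose City and day occur more than once. These combinations now live in DG4. A comparison like City==DG4$City only works here because DG4 has a single row. In general, == compares element by element (recycling the shorter vector) and ignores day, so semi_join() is the safer choice:

DG5 <- semi_join(DG, DG4, by = c("City", "day"))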
#day  City   age
#4-10 Miami  30
#4-10 Miami  23
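The intermediate objects DG1 to DG4 are useful for learning, but the same steps also fit in one pipeline. The sketch below assumes dplyr 1.0.0 or later, which is needed for summarize()'s .groups argument. It finishes with semi_join() so that rows are matched on both City and day. The name multi is illustrative:

```r
library(dplyr)

DG <- data.frame(day  = c("4-10", "4-10", "4-11", "4-12"),
                 City = c("Miami", "Miami", "New York", "San Francisco"),
                 age  = c(30, 23, 23, 30))

# City/day combinations that occur more than once
multi <- DG %>%
  group_by(City, day) %>%
  summarize(n = n(), .groups = "drop") %>%
  filter(n > 1) %>%
  select(City, day)

# semi_join() keeps the rows of DG whose City AND day both match a row of multi
DG5 <- semi_join(DG, multi, by = c("City", "day"))
DG5
```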

Again, I'd go with MrFlick's answer, but if you feel like a more circuitous route with a few more dplyr functions, you might want to give this a quick look.
