How to retrieve the most repeated value in a column present in a data frame

Another way with the data.table package, which is faster for large data sets:

library(data.table)  # needed for method 2 below

set.seed(1)
x <- sample(seq(1, 100), 5000000, replace = TRUE)

Method 1 (base R table(), the solution proposed in another answer)

start.time <- Sys.time()
tt <- table(x)
names(tt[tt==max(tt)])
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 4.883488 secs

Method 2 (data.table)

start.time <- Sys.time()
ds <- data.table( x )
setkey(ds, x)
sorted <- ds[,.N,by=list(x)]

most_repeated_value <- sorted[order(-N)]$x[1]
most_repeated_value

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 0.328033 secs
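For positive integer data like the sample above, base R's tabulate() is another option worth trying. This is a sketch of my own and not one of the two methods timed above; it assumes x contains only positive integers, and it uses system.time() instead of subtracting two Sys.time() calls.

```r
# Sketch: assumes x holds positive integers only (as produced by sample(seq(1, 100), ...)).
set.seed(1)
x <- sample(seq(1, 100), 5000000, replace = TRUE)

timing <- system.time({
  counts <- tabulate(x)                     # counts[i] = number of times i occurs in x
  most_repeated_value <- which.max(counts)  # first value that reaches the highest count
})

most_repeated_value
timing["elapsed"]
```

Unlike table(), tabulate() does not work on character or factor labels directly, so this only fits columns that are already small positive integers.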

tail(names(sort(table(Forbes2000$category))), 1)
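To make the one-liner above easier to follow, here is a small sketch that splits it into steps. The data frame df is a hypothetical stand-in for Forbes2000, and the names(which.max(...)) variant is an alternative I am adding for comparison, not part of the original answer.

```r
# Hypothetical stand-in for Forbes2000$category.
df <- data.frame(category = c("Insurance", "Insurance", "Capital Goods",
                              "Food markets", "Food markets"))

counts <- table(df$category)

# sort() orders the counts ascending, so the last name has the highest count.
via_sort <- tail(names(sort(counts)), 1)

# which.max() skips the sort and returns the first level with the maximum count.
via_which_max <- names(which.max(counts))

via_sort
via_which_max
```

Both forms return a single value, so when two categories are tied (as here) they may pick different ones and neither reports the tie.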

I know my answer is a little late, but I built the following function, which does the job in less than a second on my data frame of more than 50,000 rows:

print_count_of_unique_values <- function(df, column_name, remove_items_with_freq_equal_or_lower_than = 0, return_df = FALSE,
                                         sort_desc = TRUE, return_most_frequent_value = FALSE)
{
  temp <- df[column_name]
  output <- as.data.frame(table(temp))
  names(output) <- c("Item","Frequency")
  output_df <- output[  output[[2]] > remove_items_with_freq_equal_or_lower_than,  ]

  if (sort_desc){
    output_df <- output_df[order(output_df[[2]], decreasing = TRUE), ]
  }

  cat("\nThis is the (head) count of the unique values in dataframe column '", column_name,"':\n")
  print(head(output_df))

  if (return_df){
    return(output_df)
  }

  if (return_most_frequent_value){
      output_df$Item <- as.character(output_df$Item)
      output_df$Frequency <- as.numeric(output_df$Frequency)
      most_freq_item <- output_df[1, "Item"]
      cat("\nReturning most frequent item: ", most_freq_item)
      return(most_freq_item)
  }
}

So if you have a data frame called "df" with a column called "name" and you want to know the most common value in the "name" column, you could run:

most_common_name <- print_count_of_unique_values(df = df, column_name = "name", return_most_frequent_value = TRUE)

In case two or more categories may be tied for most frequent, use something like this:

x <- c("Insurance", "Insurance", "Capital Goods", "Food markets", "Food markets")
tt <- table(x)
names(tt[tt==max(tt)])
[1] "Food markets" "Insurance"
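The same tie-aware idea can be wrapped in a small helper and applied to a data frame column. This is only a sketch: most_frequent_values() and the example df are hypothetical names introduced here for illustration.

```r
# most_frequent_values() is a hypothetical helper name, not a base R function.
most_frequent_values <- function(v) {
  tt <- table(v)              # NA values are dropped by table() by default
  names(tt)[tt == max(tt)]    # every value tied for the highest count, as character
}

# Hypothetical example data frame.
df <- data.frame(name = c("Insurance", "Insurance", "Capital Goods",
                          "Food markets", "Food markets"))

most_frequent_values(df$name)
```

Because table() stores the values as names, the result is always a character vector; convert it back (for example with as.numeric()) if the column was numeric.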


