Remove duplicate records based on multiple columns?
class Model
def self.dedupe
# find all models and group them on keys which should be common
grouped = all.group_by{|model| [model.name,model.year,model.trim,model.make_id] }
grouped.values.each do |duplicates|
# the first one we want to keep right?
first_one = duplicates.shift # or pop for last one
# if there are any more left, they are duplicates
# so delete all of them
duplicates.each{|double| double.destroy} # duplicates can now be destroyed
end
end
end
Model.dedupe
- Find All
- Group them on keys which you need for uniqueness
- Loop on the grouped model's values of the hash
- remove the first value because you want to retain one copy
- delete the rest
If your User table data like below
User.all =>
[
#<User id: 15, name: "a", email: "[email protected]", created_at: "2013-08-06 08:57:09", updated_at: "2013-08-06 08:57:09">,
#<User id: 16, name: "a1", email: "[email protected]", created_at: "2013-08-06 08:57:20", updated_at: "2013-08-06 08:57:20">,
#<User id: 17, name: "b", email: "[email protected]", created_at: "2013-08-06 08:57:28", updated_at: "2013-08-06 08:57:28">,
#<User id: 18, name: "b1", email: "[email protected]", created_at: "2013-08-06 08:57:35", updated_at: "2013-08-06 08:57:35">,
#<User id: 19, name: "b11", email: "[email protected]", created_at: "2013-08-06 09:01:30", updated_at: "2013-08-06 09:01:30">,
#<User id: 20, name: "b11", email: "[email protected]", created_at: "2013-08-06 09:07:58", updated_at: "2013-08-06 09:07:58">]
1.9.2p290 :099 >
Email id's are duplicate, so our aim is to remove all duplicate email ids from user table.
Step 1:
To get all distinct email records id.
ids = User.select("MIN(id) as id").group(:email,:name).collect(&:id)
=> [15, 16, 18, 19, 17]
Step 2:
To remove duplicate id's from user table with distinct email records id.
Now the ids array holds the following ids.
[15, 16, 18, 19, 17]
User.where("id NOT IN (?)",ids) # To get all duplicate records
User.where("id NOT IN (?)",ids).destroy_all
** RAILS 4 **
ActiveRecord 4 introduces the .not
method which allows you to write the following in Step 2:
User.where.not(id: ids).destroy_all