Deleting all duplicates
Try this:
DELETE FROM emailTable WHERE NOT EXISTS (
    SELECT * FROM (
        SELECT MIN(id) minID FROM emailTable
        GROUP BY email
    ) AS q
    WHERE minID=id
);
The above worked for my test of 50 emails (5 different emails duplicated 10 times).
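Before running a DELETE like that on real data, it can help to look at what it would remove. The following is a minimal sketch of a dry run: it reuses the same NOT EXISTS predicate in a SELECT, so nothing is changed. The ORDER BY is my own addition for readability and is not part of the delete.

```sql
-- Dry run: same predicate as the DELETE above, but read-only.
-- Every row listed here is a duplicate that the DELETE would remove.
SELECT id, email
FROM emailTable
WHERE NOT EXISTS (
    SELECT * FROM (
        SELECT MIN(id) AS minID FROM emailTable
        GROUP BY email
    ) AS q
    WHERE minID = id
)
ORDER BY email, id;
```

If the list contains a row you expected to keep, stop and check the grouping column before deleting anything.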
You might need to add an index on the 'email' column:
ALTER TABLE emailTable ADD INDEX ind_email (email);
It might be a bit slow for 250,000 rows. It was slow for me on a table that had 1.5 million rows (properly indexed), which is how I came up with this strategy:
/* CREATE MEMORY TABLE TO HOUSE IDs of the MIN */
CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;
/* INSERT THE MINIMUM ID FOR EACH EMAIL */
INSERT INTO email_min SELECT MIN(id) FROM email
GROUP BY email;
/* MAKE SURE YOU HAVE RIGHT INFO */
SELECT * FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);
/* DELETE FROM EMAIL */
DELETE FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);
/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;
The benefit of the memory table is that the lookup on minID uses an index (the primary key), which speeds up the process compared with a normal temporary table.
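The same delete can also be written as an anti-join, which some people find easier to read than NOT EXISTS. This is only a sketch of that alternative, assuming the `email` and `email_min` tables from the steps above; the aliases `e` and `m` are my own. Whether it is faster depends on your data and MySQL version, so compare both with EXPLAIN before relying on it.

```sql
-- Alternative to the NOT EXISTS delete: multi-table DELETE with an anti-join.
-- Run it in place of that delete, before email_min is dropped.
DELETE e
FROM email AS e
LEFT JOIN email_min AS m ON m.minID = e.id   -- probes the primary key on minID
WHERE m.minID IS NULL;                       -- no match means the row is not a kept minimum
```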
Here is a more streamlined deletion process:
CREATE TABLE emailUnique LIKE emailTable;
ALTER TABLE emailUnique ADD UNIQUE INDEX (email);
INSERT IGNORE INTO emailUnique SELECT * FROM emailTable;
SELECT * FROM emailUnique;
ALTER TABLE emailTable RENAME emailTable_old;
ALTER TABLE emailUnique RENAME emailTable;
DROP TABLE emailTable_old;
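As a variation on the swap above, here is a sketch of two extra precautions: comparing row counts before the swap, and renaming both tables in a single RENAME TABLE statement so that no moment exists where emailTable is missing. The column aliases `kept_rows` and `distinct_emails` are just labels I chose.

```sql
-- Sanity check before the swap: these two counts should match.
SELECT COUNT(*) AS kept_rows FROM emailUnique;
SELECT COUNT(DISTINCT email) AS distinct_emails FROM emailTable;

-- Swap both names in one statement instead of two ALTER TABLE ... RENAME calls.
RENAME TABLE emailTable  TO emailTable_old,
             emailUnique TO emailTable;
```

You would still drop emailTable_old afterwards, once you are happy with the result.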
Here is some sample data:
use test
DROP TABLE IF EXISTS emailTable;
CREATE TABLE `emailTable` (
`id` mediumint(9) NOT NULL auto_increment,
`email` varchar(200) NOT NULL default '',
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
INSERT INTO emailTable (email) VALUES
('redwards@gmail.com'),
('redwards@gmail.com'),
('redwards@gmail.com'),
('redwards@gmail.com'),
('rolandoedwards@gmail.com'),
('rolandoedwards@gmail.com'),
('rolandoedwards@gmail.com'),
('red@gmail.com'),
('red@gmail.com'),
('red@gmail.com'),
('rolandoedwards@gmail.com'),
('rolandoedwards@gmail.com'),
('rolandoedwards@comcast.net'),
('rolandoedwards@comcast.net'),
('rolandoedwards@comcast.net');
SELECT * FROM emailTable;
I ran them. Here are the results:
mysql> use test
Database changed
mysql> DROP TABLE IF EXISTS emailTable;
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE `emailTable` (
-> `id` mediumint(9) NOT NULL auto_increment,
-> `email` varchar(200) NOT NULL default '',
-> PRIMARY KEY (`id`)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.05 sec)
mysql> INSERT INTO emailTable (email) VALUES
-> ('redwards@gmail.com'),
-> ('redwards@gmail.com'),
-> ('redwards@gmail.com'),
-> ('redwards@gmail.com'),
-> ('rolandoedwards@gmail.com'),
-> ('rolandoedwards@gmail.com'),
-> ('rolandoedwards@gmail.com'),
-> ('red@gmail.com'),
-> ('red@gmail.com'),
-> ('red@gmail.com'),
-> ('rolandoedwards@gmail.com'),
-> ('rolandoedwards@gmail.com'),
-> ('rolandoedwards@comcast.net'),
-> ('rolandoedwards@comcast.net'),
-> ('rolandoedwards@comcast.net');
Query OK, 15 rows affected (0.00 sec)
Records: 15 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM emailTable;
+----+----------------------------+
| id | email                      |
+----+----------------------------+
|  1 | redwards@gmail.com         |
|  2 | redwards@gmail.com         |
|  3 | redwards@gmail.com         |
|  4 | redwards@gmail.com         |
|  5 | rolandoedwards@gmail.com   |
|  6 | rolandoedwards@gmail.com   |
|  7 | rolandoedwards@gmail.com   |
|  8 | red@gmail.com              |
|  9 | red@gmail.com              |
| 10 | red@gmail.com              |
| 11 | rolandoedwards@gmail.com   |
| 12 | rolandoedwards@gmail.com   |
| 13 | rolandoedwards@comcast.net |
| 14 | rolandoedwards@comcast.net |
| 15 | rolandoedwards@comcast.net |
+----+----------------------------+
15 rows in set (0.00 sec)
mysql> CREATE TABLE emailUnique LIKE emailTable;
Query OK, 0 rows affected (0.04 sec)
mysql> ALTER TABLE emailUnique ADD UNIQUE INDEX (email);
Query OK, 0 rows affected (0.06 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> INSERT IGNORE INTO emailUnique SELECT * FROM emailTable;
Query OK, 4 rows affected (0.01 sec)
Records: 15 Duplicates: 11 Warnings: 0
mysql> SELECT * FROM emailUnique;
+----+----------------------------+
| id | email                      |
+----+----------------------------+
|  1 | redwards@gmail.com         |
|  5 | rolandoedwards@gmail.com   |
|  8 | red@gmail.com              |
| 13 | rolandoedwards@comcast.net |
+----+----------------------------+
4 rows in set (0.00 sec)
mysql> ALTER TABLE emailTable RENAME emailTable_old;
Query OK, 0 rows affected (0.03 sec)
mysql> ALTER TABLE emailUnique RENAME emailTable;
Query OK, 0 rows affected (0.00 sec)
mysql> DROP TABLE emailTable_old;
Query OK, 0 rows affected (0.00 sec)
mysql>
As shown, emailTable now contains the first occurrence of each email address along with its original id. For this example:
- IDs 1-4 have redwards@gmail.com, but only 1 was preserved.
- IDs 5-7,11,12 have rolandoedwards@gmail.com, but only 5 was preserved.
- IDs 8-10 have red@gmail.com, but only 8 was preserved.
- IDs 13-15 have rolandoedwards@comcast.net, but only 13 was preserved.
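To double-check the result, a quick sketch like the one below can be run against the new emailTable. The alias `copies` is my own. Because the renamed table keeps the UNIQUE index that was added to emailUnique, the final INSERT is expected to fail with a duplicate-key error rather than add a row.

```sql
-- Should return an empty set once the swap is done.
SELECT email, COUNT(*) AS copies
FROM emailTable
GROUP BY email
HAVING COUNT(*) > 1;

-- The UNIQUE index on email came along with the rename,
-- so re-inserting an existing address should be rejected.
INSERT INTO emailTable (email) VALUES ('red@gmail.com');
```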
CAVEAT: I answered a similar question concerning table deletion by means of a temp table approach.
Give it a Try !!!