MySQL remove duplicates (large database + fast)

In the end, this was for me the fastest method to remove dups from the big db I’m working on:

  1. Duplicate the structure of the database (ie, “db”) to a new database (ie, “db2”, setting the fields you don’t want to be dups to UNIQUE.
  2. Copy db to db2 using INSERT IGNORE:

INSERT IGNORE INTO
db2
(`xxx`,`yyy`,`zzz`)
SELECT `xxx`,`yyy`,`zzz`
FROM `db`

This reduced the amount of time to minutes, while any other method could take many hours, appart from creating large temporary files which could fill the disk holding the temp dir and finally abort the process.

If you don’t need in the end to have those UNIQUE fields (the indexes takes many HD), you should re-dump db2 to a new db without that UNIQUE fields => again some more minutes, but still worth against 12 hours of CPU/RAM intensive working (in a highly mysql responsive profile) with other kind of filtering methods, such as DISTINCT or GROUP BY.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s