Is there a SQL statement that I can use to delete duplicate entries from a data store while leaving a single distinct copy, that is, remove all duplicates except one?
From your question, it is unclear whether your table has a unique key or not. Since you refer to this as a "data store", I'm guessing that your duplicates might be true duplicates, meaning that every value in every column is identical. Let me first address the case where the table does not have a unique key.
NO UNIQUE KEY
In this case, the problem is difficult to solve with a single SQL statement. I recommend one of the following approaches:
1.) Add a unique key to the table
This is easy. Add a column called ID as an integer, and make it an identity column by checking the identity box in the table design window. Set the Identity Seed to 1 and the Identity Increment to 1. The column will automatically be populated with unique values for each row in the table. Proceed to the UNIQUE KEY section below.
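If you prefer a script to the table designer, the same change can be sketched in T-SQL. This assumes SQL Server and a hypothetical table named Contacts; substitute your own table name.

```sql
-- Hypothetical table name: Contacts.
-- Adds an identity column seeded at 1 and incremented by 1;
-- existing rows are numbered automatically when the column is added.
ALTER TABLE Contacts
    ADD ID INT IDENTITY(1, 1) NOT NULL;
```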
2.) Write a stored procedure
The strategy here is to write a query that returns one row for each set of duplicates, for example by grouping on every column and keeping only the groups that contain more than one row.
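A query of this kind might look like the sketch below. The table Contacts and its columns FirstName, LastName, and Email are hypothetical stand-ins; list every column of your own table in both the SELECT and the GROUP BY.

```sql
-- One row per set of true duplicates, with the size of each set.
-- Contacts, FirstName, LastName, and Email are illustrative names.
SELECT FirstName, LastName, Email, COUNT(*) AS DuplicateCount
FROM Contacts
GROUP BY FirstName, LastName, Email
HAVING COUNT(*) > 1;
```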
Use a cursor to loop through the returned rows. For each set of duplicates, read all rows in that set, then delete every row except the first one returned.
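The cursor approach can be sketched roughly as follows. This is only one possible shape, assuming SQL Server 2005 or later and the hypothetical Contacts table with non-null FirstName, LastName, and Email columns; the procedure name and column types are also assumptions. Note one substitution: instead of reading each row of a set and deleting them individually, the sketch uses DELETE TOP (n - 1) to remove all but one row of the set in a single step.

```sql
-- Hypothetical table: Contacts(FirstName, LastName, Email), no unique key.
-- Columns are assumed NOT NULL; the equality tests below would not match NULLs.
CREATE PROCEDURE RemoveDuplicateContacts
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @FirstName      VARCHAR(50),
            @LastName       VARCHAR(50),
            @Email          VARCHAR(100),
            @DuplicateCount INT;

    -- STATIC cursor: works on a snapshot, so the deletes below
    -- do not change the list of duplicate sets being walked.
    DECLARE dup_sets CURSOR LOCAL STATIC FOR
        SELECT FirstName, LastName, Email, COUNT(*)
        FROM Contacts
        GROUP BY FirstName, LastName, Email
        HAVING COUNT(*) > 1;

    OPEN dup_sets;
    FETCH NEXT FROM dup_sets
        INTO @FirstName, @LastName, @Email, @DuplicateCount;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Remove all but one row of the current set.
        DELETE TOP (@DuplicateCount - 1)
        FROM Contacts
        WHERE FirstName = @FirstName
          AND LastName  = @LastName
          AND Email     = @Email;

        FETCH NEXT FROM dup_sets
            INTO @FirstName, @LastName, @Email, @DuplicateCount;
    END

    CLOSE dup_sets;
    DEALLOCATE dup_sets;
END;
```

Because the rows in each set are identical, it does not matter which copy survives. Running the procedure inside a transaction on a copy of the data first is a sensible precaution.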
UNIQUE KEY
If the table does have a unique key, removing duplicates is much easier and can be accomplished with a single SQL DELETE statement built from two subqueries.
Since this may appear complicated, let me explain. My strategy is to build two lists: List 1 contains every row that has duplicates, and List 2 contains one row from each set of duplicates. The query simply deletes all rows that are in List 1 but not in List 2.
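The two-list strategy can be sketched as a single statement. As before, Contacts, ID, FirstName, LastName, and Email are hypothetical names, the compared columns are assumed to be non-null, and the syntax is written with SQL Server in mind; some other databases do not allow a DELETE to reference its own target table in a subquery.

```sql
DELETE FROM Contacts
WHERE ID IN (
        -- List 1: every row that belongs to a set of duplicates
        SELECT c.ID
        FROM Contacts AS c
        JOIN (
            SELECT FirstName, LastName, Email
            FROM Contacts
            GROUP BY FirstName, LastName, Email
            HAVING COUNT(*) > 1
        ) AS d
          ON  c.FirstName = d.FirstName
          AND c.LastName  = d.LastName
          AND c.Email     = d.Email
      )
  AND ID NOT IN (
        -- List 2: one row (the lowest ID) from each set of duplicates
        SELECT MIN(ID)
        FROM Contacts
        GROUP BY FirstName, LastName, Email
        HAVING COUNT(*) > 1
      );
```

Keeping the lowest ID is an arbitrary choice; MAX(ID) would keep the most recently inserted copy instead. Changing DELETE to SELECT * lets you preview the rows that would be removed.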