The 4 Guys Present: ASPFAQs.com



Question:

I have many duplicate records in my database table. I want to delete all of them except one. How can I do this?



Answer: The situation: you have a database table with a number of duplicate rows (that is, rows that match each other in every column). You'd like to delete all but one row from each set of duplicates, essentially removing all duplicates.

There are two scenarios we must consider. In the first, the table has no unique key; in the second, it does. (The second scenario is "better," since we can perform the DELETE with a single SQL statement, whereas the first requires either adding a unique key or looping through the rows with a cursor... bleh.)

NO UNIQUE KEY
In this case, the problem is difficult to solve with a single SQL statement, so I recommend one of the following approaches:

1.) Add a unique key to the table. This is easy: add an integer column called ID and make it an identity column by checking the Identity box in the table design window. Set the Identity Seed to 1 and the Identity Increment to 1. The column will automatically be populated with a unique value for each row in the table. Then proceed to the UNIQUE KEY section below.

2.) Write a stored procedure. The strategy here would be to write a query that returns a row for each set of duplicates, using a query such as the following:

SELECT Field1, Field2, COUNT(*)
FROM Foo1
GROUP BY Field1, Field2
HAVING COUNT(*) > 1

Use a cursor to loop through the returned rows, then for each set of duplicates, read all rows for that set:

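SELECT Field1, Field2
FROM Foo1
WHERE Field1 = @FIELD1 AND Field2 = @FIELD2

Then, for each set of duplicates, delete every row except the first one returned. Because the table has no key, the cursor itself has to identify the row being deleted (in T-SQL, for example, with DELETE ... WHERE CURRENT OF on an updatable cursor).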
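The two options above can be sketched with Python's built-in sqlite3 module standing in for SQL Server; this is an illustration under stated assumptions, not the article's T-SQL. It assumes a hypothetical Foo1 that has only the columns Field1 and Field2. SQLite's ALTER TABLE ADD COLUMN cannot add a PRIMARY KEY column, so option 1 is sketched by rebuilding the table with an auto-numbered INTEGER PRIMARY KEY. SQLite also has no T-SQL-style updatable cursor, so option 2 is sketched as a plain loop over the duplicate sets that swaps in a different per-set step: delete every copy, then insert one copy back.

```python
import sqlite3

SAMPLE = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("c", "z"), ("c", "z")]


def make_keyless_table():
    """Hypothetical Foo1 with no unique key: only Field1 and Field2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Foo1 (Field1 TEXT, Field2 TEXT)")
    conn.executemany("INSERT INTO Foo1 (Field1, Field2) VALUES (?, ?)", SAMPLE)
    conn.commit()
    return conn


def add_identity_key(conn):
    """Option 1 (SQLite analogue): rebuild Foo1 with an auto-numbered ID."""
    with conn:  # the context manager commits if the block finishes normally
        # An INTEGER PRIMARY KEY column is filled in automatically (1, 2, 3, ...).
        conn.execute(
            "CREATE TABLE Foo1_new (ID INTEGER PRIMARY KEY, Field1 TEXT, Field2 TEXT)"
        )
        conn.execute(
            "INSERT INTO Foo1_new (Field1, Field2) SELECT Field1, Field2 FROM Foo1"
        )
        conn.execute("DROP TABLE Foo1")
        conn.execute("ALTER TABLE Foo1_new RENAME TO Foo1")


def dedupe_with_loop(conn):
    """Option 2 (variant): visit each set of duplicates and leave one copy."""
    # One row per set of duplicates, like the GROUP BY ... HAVING query above.
    groups = conn.execute(
        "SELECT Field1, Field2, COUNT(*) FROM Foo1 "
        "GROUP BY Field1, Field2 HAVING COUNT(*) > 1"
    ).fetchall()
    with conn:
        for field1, field2, _count in groups:
            # With no key there is no way to name a single copy, so this
            # variant removes every copy in the set (IS is SQLite's NULL-safe =)...
            conn.execute(
                "DELETE FROM Foo1 WHERE Field1 IS ? AND Field2 IS ?",
                (field1, field2),
            )
            # ...and then inserts exactly one copy back.
            conn.execute(
                "INSERT INTO Foo1 (Field1, Field2) VALUES (?, ?)", (field1, field2)
            )


loop_conn = make_keyless_table()
dedupe_with_loop(loop_conn)
deduped = sorted(loop_conn.execute("SELECT Field1, Field2 FROM Foo1"))
print(deduped)  # [('a', 'x'), ('b', 'y'), ('c', 'z')]

key_conn = make_keyless_table()
add_identity_key(key_conn)
ids = [row[0] for row in key_conn.execute("SELECT ID FROM Foo1 ORDER BY ID")]
print(ids)  # [1, 2, 3, 4, 5, 6]
```

After add_identity_key, every row can be addressed by ID, so the single-statement DELETE from the UNIQUE KEY section applies.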

UNIQUE KEY
If the table does have a unique key, removing duplicates is much easier and can be done in a single SQL statement such as the following:

DELETE
FROM Foo1
WHERE Foo1.ID IN
    -- List 1: all rows that have duplicates
    (SELECT F.ID
     FROM Foo1 AS F
     WHERE EXISTS (SELECT Field1, Field2, COUNT(ID)
                   FROM Foo1
                   WHERE Foo1.Field1 = F.Field1
                     AND Foo1.Field2 = F.Field2
                   GROUP BY Foo1.Field1, Foo1.Field2
                   HAVING COUNT(Foo1.ID) > 1))
  AND Foo1.ID NOT IN
    -- List 2: one row from each set of duplicates
    (SELECT MIN(ID)
     FROM Foo1 AS F
     WHERE EXISTS (SELECT Field1, Field2, COUNT(ID)
                   FROM Foo1
                   WHERE Foo1.Field1 = F.Field1
                     AND Foo1.Field2 = F.Field2
                   GROUP BY Foo1.Field1, Foo1.Field2
                   HAVING COUNT(Foo1.ID) > 1)
     GROUP BY Field1, Field2);

Since this may appear complicated, let me explain. My strategy is to build two lists: List 1 contains every row that has duplicates, and List 2 contains one row (the one with the lowest ID) from each set of duplicates. The query then deletes every row that is in List 1 but not in List 2.
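To see the two lists at work, here is a sketch that runs the statement above against Python's sqlite3 module instead of SQL Server, on a hypothetical Foo1 with an INTEGER PRIMARY KEY named ID and made-up sample rows. It prints List 1, List 2, and the IDs left after the DELETE. It also tries a shorter statement that is often used for the same job (DELETE every row whose ID is not the MIN(ID) of its Field1/Field2 group); that form is a swapped-in alternative, not part of the original FAQ. On this sample both give the same result, because a row without duplicates is its own MIN(ID). They can differ when Field1 or Field2 contains NULLs: GROUP BY puts NULLs in one group, while the = comparisons in the original never match a NULL.

```python
import sqlite3

# The two subqueries from the DELETE statement above, unchanged apart from layout.
LIST_1 = """
    SELECT F.ID
    FROM Foo1 AS F
    WHERE EXISTS (SELECT Field1, Field2, COUNT(ID)
                  FROM Foo1
                  WHERE Foo1.Field1 = F.Field1
                    AND Foo1.Field2 = F.Field2
                  GROUP BY Foo1.Field1, Foo1.Field2
                  HAVING COUNT(Foo1.ID) > 1)"""

LIST_2 = """
    SELECT MIN(ID)
    FROM Foo1 AS F
    WHERE EXISTS (SELECT Field1, Field2, COUNT(ID)
                  FROM Foo1
                  WHERE Foo1.Field1 = F.Field1
                    AND Foo1.Field2 = F.Field2
                  GROUP BY Foo1.Field1, Foo1.Field2
                  HAVING COUNT(Foo1.ID) > 1)
    GROUP BY Field1, Field2"""

ROWS = [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"),
        (4, "a", "x"), (5, "c", "z"), (6, "c", "z")]


def make_keyed_table():
    """Hypothetical Foo1 with a unique ID column and some duplicate rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Foo1 (ID INTEGER PRIMARY KEY, Field1 TEXT, Field2 TEXT)")
    conn.executemany("INSERT INTO Foo1 VALUES (?, ?, ?)", ROWS)
    conn.commit()
    return conn


def remaining_ids(conn):
    return [row[0] for row in conn.execute("SELECT ID FROM Foo1 ORDER BY ID")]


conn = make_keyed_table()
list_1 = sorted(row[0] for row in conn.execute(LIST_1))  # rows that have duplicates
list_2 = sorted(row[0] for row in conn.execute(LIST_2))  # one survivor per set
with conn:
    conn.execute(
        f"DELETE FROM Foo1 WHERE Foo1.ID IN ({LIST_1}) AND Foo1.ID NOT IN ({LIST_2})"
    )
remaining = remaining_ids(conn)
print(list_1, list_2, remaining)  # [1, 2, 4, 5, 6] [1, 5] [1, 3, 5]

# Shorter alternative: keep the MIN(ID) of every (Field1, Field2) group.
simple = make_keyed_table()
with simple:
    simple.execute(
        "DELETE FROM Foo1 WHERE ID NOT IN "
        "(SELECT MIN(ID) FROM Foo1 GROUP BY Field1, Field2)"
    )
simple_remaining = remaining_ids(simple)
print(simple_remaining)  # [1, 3, 5]
```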

This FAQ was taken from a previous SQL Guru question...

Alert 4Guys reader Paul Davallou wrote in to share another way...

Another way to remove all duplicate records in a database table save one is to use the following approach:

1.) Capture one copy of each distinct row using a SELECT DISTINCT ..., dumping the results into a temp table.

2.) Delete all of the rows from the original table.

3.) Insert the rows from the temp table back into the original table.
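The three steps above can be sketched with Python's sqlite3 module, again as a stand-in for SQL Server and on a hypothetical Foo1 whose duplicates match in every column (in SQL Server, step 1 would typically be written as SELECT DISTINCT * INTO a #temp table). This only removes duplicates if the table has no column, such as an identity column, that differs between otherwise identical rows; if it does, DISTINCT * will not collapse them. The temp table name Foo1_distinct is made up for the example.

```python
import sqlite3


def dedupe_via_temp_table(conn):
    """The three-step temp-table approach, sketched against SQLite."""
    with conn:  # the context manager commits if all steps succeed
        # 1.) One copy of each distinct row goes into a temp table.
        conn.execute("CREATE TEMP TABLE Foo1_distinct AS SELECT DISTINCT * FROM Foo1")
        # 2.) Empty the original table.
        conn.execute("DELETE FROM Foo1")
        # 3.) Copy the distinct rows back, then drop the temp table.
        conn.execute("INSERT INTO Foo1 SELECT * FROM Foo1_distinct")
        conn.execute("DROP TABLE Foo1_distinct")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo1 (Field1 TEXT, Field2 TEXT)")
conn.executemany(
    "INSERT INTO Foo1 VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("c", "z"), ("c", "z")],
)
conn.commit()
dedupe_via_temp_table(conn)
result = sorted(conn.execute("SELECT Field1, Field2 FROM Foo1"))
print(result)  # [('a', 'x'), ('b', 'y'), ('c', 'z')]
```

Because step 2 empties the real table, running the steps as one unit of work (as the `with conn:` block does for the DELETE and INSERT here) keeps a failure in step 3 from leaving Foo1 empty.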

And there you have it!


FAQ posted by Scott Mitchell at 2/8/2002 12:52:37 PM to the Databases, Queries category. This FAQ has been viewed 63,368 times.
