Is it possible to delete duplicate rows in a table without using a temporary table (i.e., just do it with a single SQL statement)?



All you need to do is compare the table to itself to find out which candidates are duplicates. Do this by assigning aliases to the table so you can use it twice, once as A and again as B, like this:

delete

from jobs

where

job_desc in

(

select

a.job_desc

from

jobs a,

jobs b

where

a.job_desc = b.job_desc

group by

a.job_desc

having

count(a.job_desc) >1

)

When you do this you’ll get a count based on the column value you think is duplicated. I used “desc” because the IDs will be different, so the description is the thing that is the candidate for repetition. Join the table to itself on that candidate to find matches of it. Everything will match to itself at least once that’s why you group by the thing you think is a duplicate. Applying the HAVING clause to it squeezes out all the “ones” or singletons, leaving only the rows that have counts that are more than one in other words, your duplicate rows.

By the way, this code trashes all the records that are duplicates. If you want to save one, add a comparison for the IDs to be different in the WHERE clause.



Leave a Reply