Support Forums - View Thread

View Thread

The following is the text of the current message along with any replies.

Messages 1 to 10 of 15 total

Fri, Apr 2 2010 12:23 PM	Permanent Link
Hershcu Sorin	Hello It's not a specific ebd question but since I have to do it with edb I try here for a clever solution. I have a table with Field1, Field2 Some records have a equal values on the first 2 fields. Rec1.Field1 = Rec2.Field1; Rec1.Field2 = Rec2.Field2; I want to remain only with the first record and delete the others. I know that I can iterate on the records, one by one, looking for equals record and delete them but since it's a very big table 500,000 records it take hours. Any faster solution will be appreciated Thanks Sorin
Sat, Apr 3 2010 4:44 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Sorin >I have a table with Field1, Field2 >Some records have a equal values on the first 2 fields. >Rec1.Field1 = Rec2.Field1; >Rec1.Field2 = Rec2.Field2; > >I want to remain only with the first record and delete the others. >I know that I can iterate on the records, one by one, looking for equals >record and delete them >but since it's a very big table 500,000 records it take hours. It should be doable, the big problem is making sure you keep the first record and delete the second. The core SQL is something like SELECT id FROM table GROUP BY Field1, Field2 HAVING COUNT() > 1 That will give you a list if ids where there are duplicates (or triplicates etc) and it will probably be the last one. However, subject to correction by Tim, you can't guarantee this. The SQL can then be extended to DELETE FROM table WHERE id IN ( SELECT id FROM table GROUP BY Field1, Field2 HAVING COUNT() > 1 ) which will probably do what you want for duplicates, but if there are any triplicates or better you would need to run the code several times. Roy Lambert [Team Elevate]
Sat, Apr 3 2010 5:14 AM	Permanent Link
Malcolm Taylor	Roy Lambert wrote: > > DELETE FROM table WHERE id IN > ( > SELECT id FROM table > GROUP BY Field1, Field2 > HAVING COUNT() > 1 > ) > > which will probably do what you want for duplicates, but if there > are any triplicates or better you would need to run the code > several times. > > Roy Lambert [Team Elevate] Roy, are you sure that will not delete all* the records with matching f1 and f2 .. the original and any number of 'matching' records? --
Sat, Apr 3 2010 7:47 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Malcolm >Roy, are you sure that will not delete all the records with >matching f1 and f2 .. the original and any number of 'matching' >records? Pretty certain. The GROUP BY will mean that only one row is returned for each case where there are two or more rows that have the same f1 & f2. The way to check is to see what the sub query returns and cross check with the actual data. Roy Lambert [Team Elevate]
Sat, Apr 3 2010 8:55 AM	Permanent Link
Michael Baytalsky	Just a quick note: the SQL below will not work on 99% of databases. I know it will probably work on DBISAM, not tested with EDB. This will return ALL pairs with one of their id - probably first one. On 4/3/2010 11:44 AM, Roy Lambert wrote: > SELECT id FROM table > GROUP BY Field1, Field2 > HAVING COUNT()> 1 Instead you should write, which will work on most databases: SELECT Min(id) FROM table GROUP BY Field1, Field2 HAVING COUNT() > 1 The fastest way still is to sort this table by Field1, Field2 and iterate it deleting duplicate ocurrences. You will not need to search for each one, because you just remember your current pair and compare to next until it does not match. If you do this in stored proc that will be the fastest solution one could imagine for any database. -- Regards, Michael Baytalsky Context Software LLC. http://www.contextsoft.com the makers of Context Database Designer
Sat, Apr 3 2010 11:25 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Michael Or in this case since xx wants to delete the second of the matching pairs DELETE FROM table WHERE id IN ( SELECT Max(id) FROM table GROUP BY GROUP BY Field1, Field2 HAVING COUNT(*)>1 ) Thanks for that little tip - very useful. >The fastest way still is to sort this table by Field1, Field2 and iterate >it deleting duplicate ocurrences. You will not need to search for each one, >because you just remember your current pair and compare to next until >it does not match. If you do this in stored proc that will be the fastest >solution one could imagine for any database. Without timing it I'll have to take your word about speed. However, if you're right, I wouldn't expect much difference just using Delphi with no controls attached to the table. Roy Lambert
Sun, Apr 4 2010 3:26 AM	Permanent Link
Hershcu Sorin	Thanks Roy You are right. It delete the last occurrence and and needs to run the procedure again for cases with more than double occurrences. Thanks Sorin "Roy Lambert" <roy.lambert@skynet.co.uk> wrote in message >>Roy, are you sure that will not delete all the records with >>matching f1 and f2 .. the original and any number of 'matching' >>records? > > Pretty certain. The GROUP BY will mean that only one row is returned for > each case where there are two or more rows that have the same f1 & f2. > > The way to check is to see what the sub query returns and cross check with > the actual data.
Sun, Apr 4 2010 4:33 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Sorin Two points: 1. don't forget that useful tip from Michael Baytalsky using max(id) 2. You can determine the number of iterations you need by using another query SELECT MAX(DupNos)-1 AS Iterations FROM (SELECT COUNT() AS DupNosFROM table GROUP BY GROUP BY Field1, Field2 HAVING COUNT()>1) AS T1 If you have many triplicates and quadruplicates etc it may be faster as Michael suggests just to run through the table in Delphi. Just make sure you have no controls apart from the table component connected. Roy Lambert [Team Elevate]
Sun, Apr 4 2010 4:47 AM	Permanent Link
Hershcu Sorin	"Roy Lambert" <roy.lambert@skynet.co.uk> wrote in message > Two points: > > 1. don't forget that useful tip from Michael Baytalsky using max(id) > 2. You can determine the number of iterations you need by using another > query > > SELECT MAX(DupNos)-1 AS Iterations FROM > (SELECT COUNT() AS DupNosFROM table > GROUP BY GROUP BY Field1, Field2 > HAVING COUNT()>1) AS T1 > > If you have many triplicates and quadruplicates etc it may be faster as > Michael suggests just to run through the table in Delphi. Just make sure > you have no controls apart from the table component connected. Yes the iteration on Delphi table is faster. Thanks All Sorin
Mon, Apr 5 2010 3:25 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Roy, << Without timing it I'll have to take your word about speed. However, if you're right, I wouldn't expect much difference just using Delphi with no controls attached to the table. >> There's still a lot more going on in a TDataSet than there is in a raw EDB SQL cursor. The EDB SQL cursors are bare-bones and are very, very fast. -- Tim Young Elevate Software www.elevatesoft.com

	Page 1 of 2	Next Page »
Jump to Page: 1 2