Login ProductsSalesSupportDownloadsAbout |
Home » Technical Support » ElevateDB Technical Support » Support Forums » ElevateDB SQL » View Thread |
Messages 1 to 10 of 12 total |
Spotting duplicates |
Tue, Mar 9 2010 11:32 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | I do hope John Hay reads the ElevateDB ngs as well as the DBISAM ones.
I have four tables Contacts, Career, Companies, Sites Contacts is a one to many with Career Companies is a one to many with Sites Career is one to one with Sites and one to many with Companies With some data loading from external DBs I now have multiple occurrences of the same person (sites as well but I've sussed that and companies can wait). I can almost get the SQL to identify duplicate contacts BUT I can only get one of them. My logic is if two people with the same forename and same surname exist in the same site of a company its a good guess they are the same person so SELECT _fkContacts, _fkCompanies, _fkSites, _Forename, _Surname FROM Career JOIN Contacts C ON _fkContacts = C._ID GROUP BY _Surname,_Forename,_fkCompanies,_fkSites HAVING COUNT(_Surname + _Forename + CAST(_fkCompanies AS VARCHAR(16)) + CAST(_fkSites AS VARCHAR(16))) > 1 The top one on the list is Anthony Allison - ID 1008199. What I need is the other ID (1000095) as well. Any ideas? Roy Lambert |
Tue, Mar 9 2010 1:19 PM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | I do enjoy feeling like a moron
SELECT _ID FROM Contacts WHERE _ID IN (SELECT..... Roy Lambert |
Tue, Mar 9 2010 1:24 PM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | I now feel an even bigger buffoon. If the original query doesn't produce the full list adding the extra bit won't help.
Roy Lambert |
Tue, Mar 9 2010 10:00 PM | Permanent Link |
Raul Team Elevate | Roy,
I think it's the group by that's limiting your output. How about starting with getting duplicate entries from one table first ? Something like this : select C1._fkContacts,C1._Surname,C1._Forename,C2._fkContacts as _fkContacts2, C2._Surname,C2._Forename from Career C1,Career C2 where c1._fkContacts<c2._fkContacts and c1._Surname=c2._Surname and c1._Forename=c2._Forename and C1._fkCompanies = C2._fkCompanies and C1._fkSites=C2._fkSites Then you should be able to use one of the _fk fields to join to other table. Raul |
Wed, Mar 10 2010 3:01 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Raul
>I think it's the group by that's limiting your output. Me to. >How about starting with getting duplicate entries from one table first ? I'd love to, unfortunately 1) _Forename, _Surname are in Contacts, _fkCompanies, _fkSites are in Career so I have to use two tables and 2) Career contains c20k records (Contacts a bit less) and a cartesian join on that takes forever Thanks for the suggestion though, The C1._fkContacts < C2._fkContacts is something I never thought of but feels as if it should help. I did come up with a solution but it doesn't work (I suspect a bug) Roy Lambert |
Wed, Mar 10 2010 3:16 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Raul
Altering the SQL to select _Surname, _Forename FROM Contacts C1, Contacts C2 where c1._ID<c2._ID and c1._Surname=c2._Surname and c1._Forename=c2._Forename just to try and Forever = 1274.887 secs Roy Lambert |
Wed, Mar 10 2010 9:07 AM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Roy,
<< I did come up with a solution but it doesn't work (I suspect a bug) >> What is it that you suspect is a bug ? -- Tim Young Elevate Software www.elevatesoft.com |
Wed, Mar 10 2010 9:08 AM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Roy,
<< just to try and Forever = 1274.887 secs >> http://www.elevatesoft.com/supportfaq?action=view&category=edb&question=joins_where_clause -- Tim Young Elevate Software www.elevatesoft.com |
Wed, Mar 10 2010 10:12 AM | Permanent Link |
Raul Team Elevate | Roy,
Hopefully you get your solution working (bug and all) but question : is this a one time operation or do you need to do this regularly. If one time then creating a temp intermediary tables (do all joins so you're down to single table and then eliminate non-duplicates using your group by logic) might be faster as you'd have all the relevant IDs at the end that just need to be merged. Raul |
Wed, Mar 10 2010 11:11 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Tim
I started a new thread for it: Is this In comparison a bug Roy Lambert |
Page 1 of 2 | Next Page » | |
Jump to Page: 1 2 |
This web page was last updated on Saturday, April 27, 2024 at 08:52 PM | Privacy PolicySite Map © 2024 Elevate Software, Inc. All Rights Reserved Questions or comments ? E-mail us at info@elevatesoft.com |