Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 13 total
Thread de-duping a file
Mon, Sep 8 2014 10:24 AMPermanent Link

Ben Sprei

CustomEDP

Is there a simple way in sql to present records that have the same value in
one or 2 fields

Ben Sprei
Mon, Sep 8 2014 10:50 AMPermanent Link

Raul

Team Elevate Team Elevate

On 9/8/2014 10:24 AM, Ben wrote:
> Is there a simple way in sql to present records that have the same value
> in one or 2 fields

Simplest is probably just joining to itself with the fields you want as
compare equal condition

Raul
Mon, Sep 8 2014 3:07 PMPermanent Link

Ben Sprei

CustomEDP

Any Example please
Ben Sprei

"Raul"  wrote in message
news:1D806B46-C991-4169-A0F5-A6F696429EB8@news.elevatesoft.com...

On 9/8/2014 10:24 AM, Ben wrote:
> Is there a simple way in sql to present records that have the same value
> in one or 2 fields

Simplest is probably just joining to itself with the fields you want as
compare equal condition

Raul
Mon, Sep 8 2014 3:34 PMPermanent Link

Raul

Team Elevate Team Elevate

On 9/8/2014 3:07 PM, Ben wrote:
> Any Example please

OK - here's one that's made up but should work.

Assume table of users with id, first and last name

Something like this will give you a basic list of combination of duplicates:

select A.*,B.* from users A,users  B
where A.first=b.first AND A.last=B.last AND A.id<B.id


Raul
Mon, Sep 8 2014 10:04 PMPermanent Link

Ben Sprei

CustomEDP

I tried running this code and I also tried it with a left Join in EDB
Manager.  We are talking about a file with approx 10,000 records.
After half an hour I aborted it.  This does not seem like a solution.

Ben

"Raul"  wrote in message
news:1339460C-0334-4A0F-A20E-59CC68A8A174@news.elevatesoft.com...

On 9/8/2014 3:07 PM, Ben wrote:
> Any Example please

OK - here's one that's made up but should work.

Assume table of users with id, first and last name

Something like this will give you a basic list of combination of duplicates:

select A.*,B.* from users A,users  B
where A.first=b.first AND A.last=B.last AND A.id<B.id


Raul
Tue, Sep 9 2014 4:23 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Ben


Something along the lines of

SELECT * FROM Contacts A
WHERE COALESCE(_Surname,'')+COALESCE(_Forename,'') IN (SELECT DISTINCT COALESCE(X._Surname,'')+COALESCE(X._Forename,'') FROM Contacts X GROUP BY COALESCE(X._Surname,'')+COALESCE(X._Forename,'') HAVING COUNT(*) > 1)
ORDER BY COALESCE(_Surname,'')+COALESCE(_Forename,'')

Should work. It does here on an 11k row table

The problem with Raul's code is the nature of the union it creates (I can never remember what its called but I think its cartesian) which means that you end up with 10k x 10k rows

Roy Lambert
Wed, Sep 10 2014 1:49 PMPermanent Link

Ben Sprei

CustomEDP

Thanks this works.

"Roy Lambert"  wrote in message
news:49488A25-D491-4BDC-90EF-1469EDEFFEEA@news.elevatesoft.com...

Ben


Something along the lines of

SELECT * FROM Contacts A
WHERE COALESCE(_Surname,'')+COALESCE(_Forename,'') IN (SELECT DISTINCT
COALESCE(X._Surname,'')+COALESCE(X._Forename,'') FROM Contacts X GROUP BY
COALESCE(X._Surname,'')+COALESCE(X._Forename,'') HAVING COUNT(*) > 1)
ORDER BY COALESCE(_Surname,'')+COALESCE(_Forename,'')

Should work. It does here on an 11k row table

The problem with Raul's code is the nature of the union it creates (I can
never remember what its called but I think its cartesian) which means that
you end up with 10k x 10k rows

Roy Lambert
Thu, Sep 11 2014 2:57 PMPermanent Link

Ben Sprei

CustomEDP

Now comes the finale.  I need to delete all duplicate records EXCEPT for
one - the first or the last.
How do I do it.

Ben

"Ben"  wrote in message
news:31DFFEC3-6866-4B61-9E75-F9B213F117E5@news.elevatesoft.com...

Thanks this works.

"Roy Lambert"  wrote in message
news:49488A25-D491-4BDC-90EF-1469EDEFFEEA@news.elevatesoft.com...

Ben


Something along the lines of

SELECT * FROM Contacts A
WHERE COALESCE(_Surname,'')+COALESCE(_Forename,'') IN (SELECT DISTINCT
COALESCE(X._Surname,'')+COALESCE(X._Forename,'') FROM Contacts X GROUP BY
COALESCE(X._Surname,'')+COALESCE(X._Forename,'') HAVING COUNT(*) > 1)
ORDER BY COALESCE(_Surname,'')+COALESCE(_Forename,'')

Should work. It does here on an 11k row table

The problem with Raul's code is the nature of the union it creates (I can
never remember what its called but I think its cartesian) which means that
you end up with 10k x 10k rows

Roy Lambert
Thu, Sep 11 2014 3:10 PMPermanent Link

Aage J.

Den 11.09.2014 20:57, skrev Ben:
> Now comes the finale.  I need to delete all duplicate records EXCEPT for
> one - the first or the last.
> How do I do it.
>
> Ben
>


Off the top of my head, and totally untested.  Something along the lines of:

delete from THETABLE T1
where exists (select *
              from THETABLE T2
              where T2.SOMEFIELD = T1.SOMEFIELD
               and  T2.PK > T1.PK)

The PK is the PrimaryKey (or something unique).  I hope this leaves all
the unique records (with the lowest PK).
Handle with care!

--
Aage J.
Fri, Sep 12 2014 4:44 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Ben

>Now comes the finale. I need to delete all duplicate records EXCEPT for
>one - the first or the last.

I don't know if Aage's suggestion will work or not, but before I have a look at it I need a question answered - what do you mean by first or last. Is it a random choice - could be either at the program's choice or is the user allowed to specify which is to be left, or should both the first and last duplicates be left or something entirely different?

Roy Lambert
Page 1 of 2Next Page »
Jump to Page:  1 2
Image