Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 10 total
Thread Listing Duplicate Record Rows
Wed, Dec 2 2009 12:38 PMPermanent Link

Lance Ras
I'm having an SQL brain flatuation.   

I need to do a query that gives me either directly or via a couple queries using a temporary table, that will contain a list of possible duplicate people that
includes all rows with the duplicates.

For example.

ID, FName, MName, LName, Vintage
1 | John | W | Doe | JR
2 | Mary | A | Doe |
3 | Fred |    | Marn |
4 | Mary | A | Doe |
5 | Jeff | T | Maynard |
6 | John | W | Doe | JR


I would like a query result that would either be:
ID, FName, MName, LName, Vintage, NumberOfDupes
1 | John | W | Doe | JR | 2
6 | John | W | Doe | JR | 2
2 | Mary | A | Doe |     | 2
4 | Mary | A | Doe |     | 2

or just:

ID, FName, MName, LName, Vintage
1 | John | W | Doe | JR
6 | John | W | Doe | JR
2 | Mary | A | Doe |     
4 | Mary | A | Doe |     


Queries I've tried seem to only get me the below, which doesn't tell me the other rows that are duped, just how many:
ID, FName, MName, LName, Vintage, NumberOfDupes
1 | John | W | Doe | JR | 2
2 | Mary | A | Doe |     | 2


Thanks in advance of any assistance.

Lance
Wed, Dec 2 2009 2:36 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Lance


Not using your fields cos that way I can test against my data

select
_Forename,
_Surname,
Count(*) AS Qty
FROM Contacts
GROUP BY _Surname,_Forename  
HAVING  Qty > 1

Roy Lambert [Team Elevate]
Wed, Dec 2 2009 3:05 PMPermanent Link

Lance Ras
Roy,

That gave me the list of uniques that had duplication, but didn't include each row of all duplicated records.

I.E.

ID, FName, MName, LName, Vintage, NumberOfDupes
1 | John | W | Doe | JR | 2
2 | Mary | A | Doe |     | 2

when I need to see:

ID, FName, MName, LName, Vintage, NumberOfDupes
1 | John | W | Doe | JR | 2
6 | John | W | Doe | JR | 2
2 | Mary | A | Doe |     | 2
4 | Mary | A | Doe |     | 2


I think I found a query that will work though:

SELECT
p.ID, p.FName, p.MName, p.LName, P.Vintage, s.FirstName, s.MiddleName, s.LastName, s.Vintage, s.Duplicates
FROM
PEOPLE p
INNER JOIN
(
SELECT
P2.FName as FName,
p2.MName as MName,
p2.LName as LName,
P2.Vintage as Vintage,
COUNT (*) as Duplicates
FROM
PEOPLE p2
GROUP BY
P2.FName,
p2.MName,
p2.LName,
P2.Vintage
HAVING Duplicates > 1
) S
ON
(s.FName = p.FName AND s.LName = p.LName AND s.MName=p.MName AND s.Vintage=p.Vintage)
WHERE
s.Duplicates > 1
ORDER BY
p.LName,
P.Vintage,
P.FName,
P.MName






Roy Lambert wrote:

Lance


Not using your fields cos that way I can test against my data

select
_Forename,
_Surname,
Count(*) AS Qty
FROM Contacts
GROUP BY _Surname,_Forename  
HAVING  Qty > 1

Roy Lambert [Team Elevate]
Thu, Dec 3 2009 3:12 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Lance


Sorry, I missed the point that you wanted all rows. Glad you sorted it.

Roy Lambert
Thu, Dec 3 2009 9:35 PMPermanent Link

Richard Harding
Lance / Tim

I thought the first query below should give these results.

ID, FName, MName, LName
1 | John | W | Doe | JR
6 | John | W | Doe | JR
2 | Mary | A | Doe |     
4 | Mary | A | Doe |     


SELECT ID, FName, MName, LName FROM Contacts C1
  WHERE  C1.FName ||C1.MName || C1.LName IN
     (SELECT C2.FName ||C2.MName || C2.LName
      FROM Contacts C2
      GROUP BY  FName, MName, LName
      HAVING COUNT(*) > 1)

However, it gives the result:

ID, FName, MName, LName
2 | Mary | A | Doe |     
4 | Mary | A | Doe |   

Add the word DISTINCT and the correct results appear.

SELECT DISTINCT ID, FName, MName, LName FROM Contacts C1
  WHERE  C1.FName ||C1.MName || C1.LName IN
     (SELECT C2.FName ||C2.MName || C2.LName
      FROM Contacts C2
      GROUP BY  FName, MName, LName
      HAVING COUNT(*) > 1)

Does this make any sense?

Richard Harding
Thu, Dec 3 2009 10:22 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Richard,

<< Does this make any sense? >>

Not really.  Send me the database catalog and table files, and I'll take a
look.  I usually break these things down into their constituent parts first
to see what the IN is looking at, and then proceed from there.

--
Tim Young
Elevate Software
www.elevatesoft.com

Fri, Dec 4 2009 3:58 PMPermanent Link

Richard Harding
Time,
>>Send me the database catalog and table files, and I'll take a
look.  I usually break these things down into their constituent parts first
to see what the IN is looking at, and then proceed from there.

I have posted the database backup file to the binaries group.

SELECT C2.FName ||C2.MName || C2.LName
     FROM Contacts C2
     GROUP BY  FName, MName, LName
     HAVING COUNT(*) > 1

gives the expected results.

JohnWDoe
MaryADoe    

Richard Harding
Sun, Dec 6 2009 7:42 AMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Richard,

<< I have posted the database backup file to the binaries group. >>

The DISTINCT version is what you need to remove the duplicates, and the SQL
is giving the correct results without it due to the fact that there are
multiple rows with the same FName, MName, LName combo in the Contacts table
and IN won't cause them to be removed.

--
Tim Young
Elevate Software
www.elevatesoft.com
Tue, Dec 8 2009 12:57 AMPermanent Link

Richard Harding
Hi Tim,

>>>The DISTINCT version is what you need to remove the duplicates, and the SQL is giving
the correct results without it due to the fact that there are multiple rows with the same
FName, MName, LName combo in the Contacts table  and IN won't cause them to be removed.

I still do not get it.

The original file looks like this:

ID, FName, MName, LName
1 | John | W | Doe
2 | Mary | A | Doe
3 | Fred |    | Marn
4 | Mary | A | Doe
5 | Jeff | T | Maynard
6 | John | W | Doe

The SELECT that the IN clause refers to is:

SELECT C2.FName ||C2.MName || C2.LName
     FROM Contacts C2
     GROUP BY  FName, MName, LName
     HAVING COUNT(*) > 1

which returns:

JohnWDoe
MaryADoe

So far everything is OK.


----------------------------

Now run the following query.

SELECT ID, FName, MName, LName FROM Contacts C1
 WHERE  C1.FName ||C1.MName || C1.LName IN
    (SELECT C2.FName ||C2.MName || C2.LName
     FROM Contacts C2
     GROUP BY  FName, MName, LName
     HAVING COUNT(*) > 1)

It gives the results:

ID, FName, MName, LName
2 | Mary | A | Doe |     
4 | Mary | A | Doe |   

It incorrectly ignores the rows for John W Doe despite the fact that "JohnWDoe" is in
result set for the sub-query.

----------------------------------------

However, if you add DISTINCT we get 2 more rows.  (The DISTINCT should remove duplicate
rows in the result set so it should not be possible to get additional rows by adding
DISTINCT.)


SELECT DISTINCT ID, FName, MName, LName FROM Contacts C1
 WHERE  C1.FName ||C1.MName || C1.LName IN
    (SELECT C2.FName ||C2.MName || C2.LName
     FROM Contacts C2
     GROUP BY  FName, MName, LName
     HAVING COUNT(*) > 1)

ID, FName, MName, LName
1 | John | W | Doe
6 | John | W | Doe
2 | Mary | A | Doe
4 | Mary | A | Doe

This is the correct results.  I think we should get the same result set without the DISTINCT.


----------------------------------------------------------------------------

Note that the query
SELECT  ID, FName, MName, LName FROM Contacts C1
 WHERE   C1.FName ||C1.MName || C1.LName IN
    ('JohnWDoe', 'MaryADoe')

gives the correct results.

ID, FName, MName, LName
1 | John | W | Doe
6 | John | W | Doe
2 | Mary | A | Doe
4 | Mary | A | Doe

-----------------------------------------------------------------------------

I have tested this in Interbase & Access.  They both give the same 4 rows as above with
and without DISTINCT.

Richard Harding
Tue, Dec 8 2009 4:12 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Richard,

<< I still do not get it. >>

I don't either.   Now that I look at it again, I'm getting the same results
as you, so I'm not sure what I was seeing before.  Something is wonky with
the COUNT(*) or the IN, but I'm not sure what it is yet.   I'll let you know
as soon as I figure out what is going on.

--
Tim Young
Elevate Software
www.elevatesoft.com
Image