Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 7 of 7 total
Thread Depuplicating - Calling SQL Guru
Thu, May 17 2012 2:33 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

I'm running the SQL below and I'd like to remove duplicates of _fkContacts. Currently I'm loading it into a stringgrid and deleting as loading.

I can do this easily in SQL by adding a GROUP BY _fkContacts. Unfortunately it retains the first entry and I need the last (ie the most recent). I can't think of any way of doing it so I'm hoping one of you lot can.

 SELECT
_ID,
_fkContacts,
_fkCompanies,
_fkSites,
CAST(RCF(_JobTitle,RCF(W._Surname,W._Forename,' '),' - ') AS VARCHAR(85)) AS Contact,
CAST(RCF(RCF(Site._PostCode,RCF(Site._County,RCF(Site._Town,RCF(Site._Address2,Site._Address1,', '),' ,'),', '),', '),_Name, ': ') AS VARCHAR(255)) AS Company,
CAST(RCF('T/O: ' +  T._Turnover,'Staff: ' + S._Staff,', ') AS VARCHAR(75)) AS Size,
CAST(RCF(IFNULL(M._MktDesc,IFNULL(SC._Description,SG._Description,SC._Description),M._MktDesc),O._Description,', ') AS VARCHAR(102)) AS CompanyType,
IFNULL(_Left,99999,_left) AS SortLeft
FROM Career
JOIN Contacts W ON _fkContacts = W._ID
JOIN Sites Site ON _fkCompanies = Site._fkCompanies AND _fkSites = Site._SiteID
JOIN Companies C ON _fkCompanies = C._ID
LEFT OUTER JOIN SandT S ON C._fkSandT_Staffing = S._Band
LEFT OUTER JOIN SandT T ON C._fkSandT_Turnover = T._Band
LEFT OUTER JOIN Markets M ON C._fkMarkets = M._ID
LEFT OUTER JOIN OrgType O ON C._fkOrgType = O._ID
LEFT OUTER JOIN SICGroup SG ON C._fkSICGroup = SG._ID
LEFT OUTER JOIN SICCode SC ON C._fkSICCode = SC._ID
WHERE
W._LatestCV IS NOT NULL
ORDER BY _Surname, _Forename, SortLeft DESC, _fkCompanies


Roy Lambert
Fri, May 18 2012 9:39 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

I had a brilliant idea .... but it didn't work


I added

AND (_Left IS NULL OR _Left = (SELECT MAX(_LEFT) FROM Career X WHERE X._fkContacts = _fkContacts))

Unfortunately testing it the subselect always returns 2012, and testing further shows its selecting all the rows selected by W._LatestCV IS NOT NULL

Help is needed Frown

Roy Lambert
Fri, May 18 2012 10:04 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Sused it


AND (_Left IS NULL OR _Left = (SELECT MAX(_LEFT) FROM Career X WHERE X._fkContacts = Career._fkContacts))


Roy Lambert
Mon, May 21 2012 9:41 AMPermanent Link

Adam Brett

Orixa Systems

Roy

Not sure I understand your problem but if I do:

I would have started by defining a NewView which returned IDs of most recent entered row for each customer, without all the joins etc. and then written:

SELECT
 <big long statement with complicated joins>

WHERE ID IN (

SELECT ID FROM <NewView>
)

In this way I could test the 2 processes (one returning the fields I want, the other returning the rows) separately, which is less confusing.

Also, the IDs in NewView might well be useful in other parts of the application.

Adam
Mon, May 21 2012 10:38 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Adam


>Not sure I understand your problem but if I do:
>
>I would have started by defining a NewView which returned IDs of most recent entered row for each customer, without all the joins etc. and then written:

Unless I'm missing something this would simply move the problem into NewView, and possibly make it worse since (unless I'm missing something) views can't be parameterised.

>
>SELECT
> <big long statement with complicated joins>

<bragging> its not very big - you should see some of my other ones </bragging>
>
>WHERE ID IN (
>
>SELECT ID FROM <NewView>
>)
>
>In this way I could test the 2 processes (one returning the fields I want, the other returning the rows) separately, which is less confusing.

But, assuming I can get it to work, wouldn't it be a fair bit slower?

>Also, the IDs in NewView might well be useful in other parts of the application.

Its possible, but off the top of my head, I can't think of anywhere.

If you want I can describe in mind numbingly boring detail what its being used for Smiley

Roy Lambert
Tue, May 22 2012 3:01 AMPermanent Link

Adam Brett

Orixa Systems

Roy

My "Help" my be totally off course ... but here's another try, just so you feel someone else is trying to help.

--

I think I have a similar problem to yours. I have a bunch of Cocoa co-ops who deliver cocoa to depots. I need to get a dataset which includes the "most recently purchases delivery" from each co-op. If I just use a MAX(Date) grouping I do get the max date, but the other columns are just the first one the SQL comes to.

I fooled around with this a lot & the solution I have is not fast.

First I have a FUNCTION:

CREATE FUNCTION "P_MaxDate" (IN SocietiesID INTEGER)
RETURNS DATE
BEGIN
 DECLARE ResultStmt STATEMENT;
 DECLARE Result DATE;

 PREPARE ResultStmt FROM
 'SELECT MAX(DateDone) INTO ? FROM PrimaryEvacuations WHERE SocietiesID = ?';
 EXECUTE ResultStmt USING Result, SocietiesID;

 RETURN Result;
END

Then I can write SQL:

SELECT
 SocietiesID,
 BagsSent

FROM PrimaryEvacuations
WHERE DateDone = P_MaxDate(SocietiesID)


--

This is slow to execute as the FUNCTION is called on every request of the WHERE. There is probably a better way to do it, but speed is not important in this case, as it run infrequently.

Adam
Tue, May 22 2012 6:37 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Adam


I'm always grateful for any help, even when its way off the mark, because it may either actually solve my problem, or as has happened previously, sparked a thought in my feeble brain which does solve the problem, or start me down a totally different track which prevents the existence of the problem in the first place - a bit like parallel universes or as they'd say on Discworld "its quantum" Smiley

This particular problem is one which seems to come up every now and then. The solution I've implemented works in ElevateDB but wouldn't have worked in DBISAM (since it dodn't have sub selects) and is dependent on the fact that I have a date column in there which I can use.

My current approach is pretty fast and returns just the rows wanted. Wether its faster than your function I can't say since to my simple way of thinking they are pretty much doing the same.

If you want to give it a try

SELECT
SocietiesID,
BagsSent
FROM PrimaryEvacuations
WHERE DateDone = P_MaxDate(SocietiesID)
DateDone = (SELECT MAX(DateDone) FROM PrimaryEvacuations X WHERE X.SocietiesID = PrimaryEvacuations.SocietiesID)


should work (I think). I'd be interested in comparative timings.

Roy Lambert
Image