Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 8 of 8 total
Thread Joins, Group By and Indexes
Wed, Oct 3 2012 4:07 PMPermanent Link

David

Hello.

I have had an issue with joining 2 tables, first table contains 1 ID that matches multiple records in the joined table.  So as expected I have lots of rows due to the join.  I order the table so that it is based on decending date order from a date field in the second table and then do a group by, I was hoping this would result in the last record for each unique group.

What is actually happening is that the first record that is a match from the joined table is being returned.  I tried putting Max(ActionDate) in the join, but I still could not get it to work and finally figured out this was due to the indexing of the table.

Since then I have just put the result set into a memory table ordered in descending date order, then do an SQL optimize on the memory table to re-index in the new order, and finally run a group by on the memory table.  This allows me get the first or last record within a group by simply changing the date sort to Ascending or Decending before optimizing and then doing the group by.

Probably not the most elegent solution, but it works.  Is there a problem with doing things this way, and also, does anyone have any idea how I would resolve it without having to re-optimize each time.

Cheers
David.
Wed, Oct 3 2012 4:23 PMPermanent Link

Jose Eduardo Helminsky

HPro Informatica

David

Could you post the SQL and/or an example code ? It will help us to provide
support or comment things better.
And if possible a little sample of data.

Eduardo

Wed, Oct 3 2012 4:24 PMPermanent Link

Raul

Team Elevate Team Elevate

That is correct behavior of DBISAM - sort is generally applied at the
very end so all the joining and grouping takes place first and then your
results is sorted.

When using Group By you need to use the aggregate functions and in this
case "max" is what you want so you are on the right track.

I don't know what your tables look like but assuming you have 2 tables
like this :

master record
- masterid
- label

detail records table
- masterid
- ActionDate

you could use something like this

select M.masterid, max(D.ActionDate) from
master M join detail D on M.masterid=D.masterid
GROUP BY M.masterid
order by M.masterid

Raul



On 10/3/2012 4:07 PM, David wrote:
> Hello.
>
> I have had an issue with joining 2 tables, first table contains 1 ID that matches multiple records in the joined table.  So as expected I have lots of rows due to the join.  I order the table so that it is based on decending date order from a date field in the second table and then do a group by, I was hoping this would result in the last record for each unique group.
>
> What is actually happening is that the first record that is a match from the joined table is being returned.  I tried putting Max(ActionDate) in the join, but I still could not get it to work and finally figured out this was due to the indexing of the table.
>
> Since then I have just put the result set into a memory table ordered in descending date order, then do an SQL optimize on the memory table to re-index in the new order, and finally run a group by on the memory table.  This allows me get the first or last record within a group by simply changing the date sort to Ascending or Decending before optimizing and then doing the group by.
>
> Probably not the most elegent solution, but it works.  Is there a problem with doing things this way, and also, does anyone have any idea how I would resolve it without having to re-optimize each time.
>
> Cheers
> David.
>
Wed, Oct 3 2012 4:59 PMPermanent Link

David

Hi Raul.

Thanks for the fast reply.  I dont have the code to hand, had enough for the day Smile what I was finding that by using the Max(ActionDate) in the SQL, it looked like it was working but only ActionDate had the correct data in it, the rest of the fields were based on the first match in the table.

My solution works, do you see any problem with it, if rather cumbersome.

Cheers
David.

Raul wrote:

That is correct behavior of DBISAM - sort is generally applied at the
very end so all the joining and grouping takes place first and then your
results is sorted.

When using Group By you need to use the aggregate functions and in this
case "max" is what you want so you are on the right track.

I don't know what your tables look like but assuming you have 2 tables
like this :

master record
- masterid
- label

detail records table
- masterid
- ActionDate

you could use something like this

select M.masterid, max(D.ActionDate) from
master M join detail D on M.masterid=D.masterid
GROUP BY M.masterid
order by M.masterid

Raul



On 10/3/2012 4:07 PM, David wrote:
> Hello.
>
> I have had an issue with joining 2 tables, first table contains 1 ID that matches multiple records in the joined table.  So as expected I have lots of rows due to the join.  I order the table so that it is based on decending date order from a date field in the second table and then do a group by, I was hoping this would result in the last record for each unique group.
>
> What is actually happening is that the first record that is a match from the joined table is being returned.  I tried putting Max(ActionDate) in the join, but I still could not get it to work and finally figured out this was due to the indexing of the table.
>
> Since then I have just put the result set into a memory table ordered in descending date order, then do an SQL optimize on the memory table to re-index in the new order, and finally run a group by on the memory table.  This allows me get the first or last record within a group by simply changing the date sort to Ascending or Decending before optimizing and then doing the group by.
>
> Probably not the most elegent solution, but it works.  Is there a problem with doing things this way, and also, does anyone have any idea how I would resolve it without having to re-optimize each time.
>
> Cheers
> David.
>
Wed, Oct 3 2012 5:00 PMPermanent Link

David

Hi Jose.

I will post some more info tomorrow. shut down for the night as my head was bursting.

Thanks
David.

"Eduardo [HPro]" wrote:

David

Could you post the SQL and/or an example code ? It will help us to provide
support or comment things better.
And if possible a little sample of data.

Eduardo
Thu, Oct 4 2012 5:14 AMPermanent Link

John Hay

David

I am guessing your problem is that you are using non aggreagate fields from the second table eg

Master Table
Id
CustomerName

Detail Table
Id
ActionDate
ActionPerson
Notes
etc

The query is like

SELECT Id,CustomerName,MAX(ActionDate),ActionPerson,Notes FROM Table1
JOIN Table2 on Table1.Id=Table2.Id
GROUP BY Id

Your query with Max(ActionDate) get the right date bit not the ActionPerson or Notes.

Presuming there is only 1 ActionDate for each customer I think the most reliable way to do it is using a script.

SELECT Id,CustomerName,MAX(ActionDate) as ActionDate INTO Memory\Temp FROM Table1
JOIN Table2 on Table1.Id=Table2.Id
GROUP BY Id;
SELECT Id,CustomerName,ActionDate,ActionPerson,Notes FROM Memory\Temp
JOIN Table2 ON Temp.Id=Table2.Id and Temp.ActionDate=Table2.ActionDate;

John


Thu, Oct 4 2012 10:21 AMPermanent Link

David

Hi John.

I was just coming back to provide some sample sql and data but you have saved me the bother.  Yes it is pretty much like what you have written and I now understand where I was going wrong.

I was doing almost everything in your query, but when I was joining the memory table I was only using the ID columns of the two tables and not including the ActionDAte as well.  I can now see how this can be resolved.

Many thanks guys for the help.

David.

"John Hay" wrote:

David

I am guessing your problem is that you are using non aggreagate fields from the second table eg

Master Table
Id
CustomerName

Detail Table
Id
ActionDate
ActionPerson
Notes
etc

The query is like

SELECT Id,CustomerName,MAX(ActionDate),ActionPerson,Notes FROM Table1
JOIN Table2 on Table1.Id=Table2.Id
GROUP BY Id

Your query with Max(ActionDate) get the right date bit not the ActionPerson or Notes.

Presuming there is only 1 ActionDate for each customer I think the most reliable way to do it is using a script.

SELECT Id,CustomerName,MAX(ActionDate) as ActionDate INTO Memory\Temp FROM Table1
JOIN Table2 on Table1.Id=Table2.Id
GROUP BY Id;
SELECT Id,CustomerName,ActionDate,ActionPerson,Notes FROM Memory\Temp
JOIN Table2 ON Temp.Id=Table2.Id and Temp.ActionDate=Table2.ActionDate;

John
Thu, Oct 4 2012 10:21 AMPermanent Link

David

Hi John.

I was just coming back to provide some sample sql and data but you have saved me the bother.  Yes it is pretty much like what you have written and I now understand where I was going wrong.

I was doing almost everything in your query, but when I was joining the memory table I was only using the ID columns of the two tables and not including the ActionDAte as well.  I can now see how this can be resolved.

Many thanks guys for the help.

David.

"John Hay" wrote:

David

I am guessing your problem is that you are using non aggreagate fields from the second table eg

Master Table
Id
CustomerName

Detail Table
Id
ActionDate
ActionPerson
Notes
etc

The query is like

SELECT Id,CustomerName,MAX(ActionDate),ActionPerson,Notes FROM Table1
JOIN Table2 on Table1.Id=Table2.Id
GROUP BY Id

Your query with Max(ActionDate) get the right date bit not the ActionPerson or Notes.

Presuming there is only 1 ActionDate for each customer I think the most reliable way to do it is using a script.

SELECT Id,CustomerName,MAX(ActionDate) as ActionDate INTO Memory\Temp FROM Table1
JOIN Table2 on Table1.Id=Table2.Id
GROUP BY Id;
SELECT Id,CustomerName,ActionDate,ActionPerson,Notes FROM Memory\Temp
JOIN Table2 ON Temp.Id=Table2.Id and Temp.ActionDate=Table2.ActionDate;

John
Image