Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 7 of 7 total
Thread How to create records with individual months from date range.
Wed, Jan 14 2009 10:11 PMPermanent Link

"Adam H."
Hi,

I've got a table with records in it as follows:

ID : String (Key field)
StartDate : Date
EndDate : Date
Hrs : Integer;

What I would like to do is create a result set that divides and assigns the
value from Hrs across for each month between the startdate and enddate
fields (with the remainder going to the last month).

To try and clarify, let's say I have a record with the following:

ID: X
StartDate: 1st Jan, 2009
EndDate : 1st Mar, 2009
Hrs: 10

I would like to get a result set that shows:

ID    Date    Hrs
X     Jan 09      3
X     Feb 09     3
X     Mar 09     4


(In reality you could have 100 records in the orginal table with a result
set into the thousands, but this is just as a simple example).

Just wondering what would be the best approach to do this via SQL?

Cheers

Adam.
Thu, Jan 15 2009 12:50 AMPermanent Link

"Adam H."
I've come up with something, but I'm not sure if it's the most streamlined
way to achieve this. Would anyone care to comment (or let me know if you see
a problem with the following)?  Smiley

Firstly, I created a table with a list of dates from the beginning of the
century until the next century for my SQL to work with / link to (as I
couldn't see a way of creating such data on the fly)...

Then, I've run a query like the following (cut down for simplicity and to
match the data example I posted on previous post)...

==============================================
/*First get a result set showing a list of all valid dates linked with the
original records within the date range*/
Select ID, StartDate, EndDate, LM.Date, Hrs OrigHrs
into Memory\AMDateSet
from MyData AM
left outer join ListOfMonths LM on (LM.Date >= (AM.StartDate - extract(day
from AM.Startdate)+1)) and (LM.Date <=AM.EndDate);

/*Now calculate how many records each ID has (to divide with later)*/
select ID, count(Date) DateCount
into Memory\AMDateCount
from Memory\AMDateSet
group by ID;

/*Now calculate the total hours and divide them by the number of valid dates
obtained above*/
Select ID, StartDate, EndDate, Date, OrigHrs, DateCount, trunc(OrigHrs /
DateCount) DivHrs
into Memory\AMDateGroup1
from Memory\AMDateSet M1
left outer join Memory\AMDateCount M2 on (M2.ID = M1.ID)
;

/*Now calculate the "remainder hours" (considering we can't split hours into
decimals*/
select ID, Max(Date) MaxDate, OrigHrs, Sum(DivHrs) Totalhrs
into Memory\AMDateGroup2
from Memory\AMDateGroup1
group by ID, OrigHrs;

/*Now update the last record for each ID group to have the correct number of
hours so everything adds up correctly*/
update Memory\AMDateGroup1 M1
set M1.DivHrs = M1.DivHrs + (M2.OrigHrs - M2.TotalHrs)
From Memory\AMDateGroup1 M1
inner join Memory\AMDateGroup2 M2 on (M2.ID = M1.ID) and (M2.MaxDate =
M1.Date);

/*Now show the result */
select ID, Date, StartDate, EndDate, OrigHrs, DivHrs
From Memory\AMDateGroup1

==============================================

It's a little complex, but get's the job done. The only other way I could
think of this is with a loop in Delphi, but would imagine it would be much
slower to generate a table with the same results (considering we could be
talking tens or hundreds of thousands of records in the result set).

Cheers

Adam.
Thu, Jan 15 2009 9:57 AMPermanent Link

"Robert"

"Adam H." <ahairsub5@jvxp_removeme.com> wrote in message
news:F727EC0F-C80F-4330-A264-6B8379C1F16C@news.elevatesoft.com...
> I've come up with something, but I'm not sure if it's the most streamlined
> way to achieve this. Would anyone care to comment (or let me know if you
> see a problem with the following)?  Smiley

Hard to tell if there is a problem. I suspect, though, that it will be
veryyyyy sloooooow.

I would try to turn the dates into integers as soon as possible, and get rid
of the days. Since you don't need the day, (Year * 12) + Month would give
you a unique "bucket number". So your list of dates table would be something
like

Date        Bucket
2008-01   24097
2008-02   24098
etc

and index it on bucket. Then you can do all your links on that number,
instead of using the date. Otherwise, something like this

> left outer join ListOfMonths LM on (LM.Date >= (AM.StartDate - extract(day
> from AM.Startdate)+1)) and (LM.Date <=AM.EndDate);

could be pretty disastrous with many thousands of records. In general, as I
said, I would try to get rid of the day ASAP and compare integers
representing a "month number"

between (extract(startdate(year) * 12) + extract(startmonth) and same for
end date.

Bunch of other advantages, I think, to calculating those "bucket numbers" as
soon as possible and storing them in the extracted data, such as the fact
that end bucket - start bucket + 1 is the number of months, etc.

Another couple of comments

1. Remember that you can use the mod function to get the remainder to add to
the last bucket
2. Make sure you index the temporary tables. Adding an index in DBISAM is
super fast, and the speed benefits on the joins are huge.

Robert


Thu, Jan 15 2009 5:08 PMPermanent Link

"Adam H."
Hi Robert,

Thank you very much for your reply and suggestions. I shall give your
suggestions a try and see who we go!

Best Regards

Adam.
Thu, Jan 15 2009 5:43 PMPermanent Link

"Adam H."
Good Morning Robert,

Forgive me - I've had 2 nights bad sleep, so aren't completely awake at the
moment. Smiley

I've modified the start of my SQL to the following below, but was wondering
if it would make that much of a performance issue. My reasoning is that you
still have the line:

> left outer join Memory\M8_Months LM on (LM.Monthbucket >= AM.StartBucket)
> and (LM.MonthBucket <=AM.EndBucket)

but instead of working on a TDate field, it's working on a TInteger field.
(Which I was under the opinion a TDatefield is very similar to an integer
field anyway).

I can see that it's a little 'cleaner' since we're comparing months as an
integer instead of a date, but the comparison I need to do is still between
a start and end range.

(Although I can see your point on creating indexes on the memory tables!)

The Modified code:

===============================================================
/*First assign 'Bucket' Dates to the dates we're working with...
 1) For the ApprenticeModules table, and
 2) For the list of valid dates from the Month List table */

select AM.AppCourseID, AM.ModuleID, AM.StartDate, AM.EndDate, (extract(Year
from AM.StartDate) * 12) + (extract(Month from AM.StartDate)) as
StartBucket, (extract(Year from AM.EndDate) * 12) + (extract(Month from
AM.EndDate)) as EndBucket, AM.Hrs
into Memory\M8_ApprenticeModule
From ApprenticeModules AM
;
Select LM.Date, (extract(Year from Date) * 12) + (extract(Month from Date))
as MonthBucket
into Memory\M8_Months
From ListofMonths LM;

CREATE INDEX IF NOT EXISTS  "StartBucket" on Memory\M8_ApprenticeModule
("StartBucket");
CREATE INDEX IF NOT EXISTS  "EndBucket" on Memory\M8_ApprenticeModule
("EndBucket");
CREATE INDEX IF NOT EXISTS  "MonthBucket" on Memory\M8_Months
("MonthBucket");

/*Grab list of all valid available months per record*/
select AM.AppCourseID, AM.ModuleID, AM.StartDate, AM.EndDate,
AM.StartBucket, AM.EndBucket, LM.Date, LM.MonthBucket, AM.Hrs OrigHrs
into Memory\M8_AM1
from Memory\M8_ApprenticeModule AM
left outer join Memory\M8_Months LM on (LM.Monthbucket >= AM.StartBucket)
and (LM.MonthBucket <=AM.EndBucket)
===========================================================================;

Thanks & Regards

Adam.
Thu, Jan 15 2009 6:07 PMPermanent Link

"Robert"

"Adam H." <ahairsub5@jvxp_removeme.com> wrote in message
news:31CE52A4-B02E-4C88-AF4B-F253DB0A5584@news.elevatesoft.com...
> Good Morning Robert,
>
> Forgive me - I've had 2 nights bad sleep, so aren't completely awake at
> the moment. Smiley
>
> I've modified the start of my SQL to the following below, but was
> wondering if it would make that much of a performance issue. My reasoning
> is that you still have the line:
>
>> left outer join Memory\M8_Months LM on (LM.Monthbucket >= AM.StartBucket)
>> and (LM.MonthBucket <=AM.EndBucket)
>
> but instead of working on a TDate field, it's working on a TInteger field.
> (Which I was under the opinion a TDatefield is very similar to an integer
> field anyway).
>

True enough, but the integer fields are now indexed, whereas before one of
the date fields was constructed on the fly so it would require a brute force
search (I believe, not even DBISAM is THAT smart). Dates are floats, BTW.

A minor suggestion, pull the month and year when you build the M8_Months
table, so you'll have them indexed by bucket and ready to go when you need
to build your final report / display, where obviously you will not want a
bucket number but a real month and year.

Why are you using a left outer join? Could not figure that one out.

Robert

Thu, Jan 15 2009 9:02 PMPermanent Link

"Adam H."
Hi Robert,

Thanks again for your help!

> True enough, but the integer fields are now indexed, whereas before one of
> the date fields was constructed on the fly so it would require a brute
> force search (I believe, not even DBISAM is THAT smart). Dates are floats,
> BTW.

Aaah - I was thinking since timestamps were float (like) then dates might
have been integers. Was just a bad guess. Thanks for the correction. Smile

> Why are you using a left outer join? Could not figure that one out.

Because I've had 2 bad nights sleep.  Wink

Thanks for pointing that out too!

Have a great weekend!

Adam.
Image