Thread: Query Speed issues on casting a join
Mon, Jan 21 2013 12:48 AM

Adam H.

Hi,

I'm just looking for some suggestions as to the best way to handle this
challenge I face. I have a query in DBISam that looks like the following:

Select M.DeltaStudentID, Max(AC.Item) Item
into SVTImportMaxItemM
From SVTImport M
inner join ApprenticeCourse AC on
    (AC.DeltaStudentNo = M.DeltaStudentID)
group by DeltaStudentID;

The initial problem I faced is that DeltaStudentNo / ID are string
fields, but in one of the tables they may contain leading zeros, and in
the other they may not. As this data comes from an import routine, I don't
really have the option of forcing people to use integers for this, so I
match them up by casting both sides to large integer values, as follows:



Select M.DeltaStudentID, Max(AC.Item) Item
into SVTImportMaxItemM
From SVTImport M
inner join ApprenticeCourse AC on
    (cast(AC.DeltaStudentNo as LargeInt) = cast(M.DeltaStudentID as LargeInt))
group by DeltaStudentID;

This works fine in DBSys, running quite fast, as I would expect (just
over 1 second). However, when I run the same SQL through my application
in Delphi it takes a significant amount of time (seemingly more than
10 minutes).

Now one thing I notice is that if I run it through the IDE I get a
significant number of error messages (such as exception class
EConvertError with message 'sdash  zbn csvvvvvvv' is not a valid integer
value).

I'm suspicious that maybe there are different compiler options turned
on (debug?) that might be making the try/except block at line 4572 in
dbisamlb.pas take longer to execute - but I'm not sure, nor am I sure
whether there's a better way to get around this problem.

Does anyone have any thoughts?

Best Regards

Adam.
Mon, Jan 21 2013 5:16 AM

Roy Lambert

NLH Associates

Team Elevate

Adam


1) DBSys is "just" a Delphi application to which you have the source code. You can have a look inside and see how it's compiled, what switches are used, etc.

2) You don't mention which version of Delphi you're using - I think DBSys is compiled in D7, so later versions of Delphi could have an impact, but I seriously doubt it would be of the magnitude you're getting.

3) Are there any visual controls connected - that's always a good way to slow things down.

4) Even though the fields you're using are strings, can you not format them on import, or pre- or post-process them, to ensure compatibility?

5) If you can't reformat the fields, can you add an extra field which is an integer (see the rough sketch below)?
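
A rough sketch for 5) - untested, the exact syntax is from memory so check the
manual, and DeltaStudentKey is just a name I've invented:

/* one-off: add an integer key, fill it from the string column, and index it */
ALTER TABLE ApprenticeCourse ADD COLUMN DeltaStudentKey LARGEINT;
UPDATE ApprenticeCourse SET DeltaStudentKey = CAST(DeltaStudentNo AS LARGEINT);
CREATE INDEX ByDeltaStudentKey ON ApprenticeCourse (DeltaStudentKey);

Do the same for SVTImport, and you can then join on the two integer columns
directly, with no casts in the ON clause.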

That's all the general stuff

This

<<Now one thing I notice is that if I run it through the IDE I get a
significant amount of error messages (such as exception class
EConvertError with message 'sdash  zbn csvvvvvvv' is not a valid integer
value).>>

I'm not sure exactly what it means. My first guess would be that you have some data that is not formed purely of digits. My second is that even with a largeint you have some numbers that are too big to convert. My third is that you have a corrupt table / index.

A quick and dirty test would be

SELECT cast(DeltaStudentNo as LargeInt),DeltaStudentNo FROM ApprenticeCourse WHERE cast(DeltaStudentNo as LargeInt) IS NULL

and then the same for the other table
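
i.e. for SVTImport something along the lines of:

SELECT cast(DeltaStudentID as LargeInt),DeltaStudentID FROM SVTImport WHERE cast(DeltaStudentID as LargeInt) IS NULL

Any rows that come back are values that won't cast cleanly.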

I'd also be interested in seeing the execution plan both for running in DBSys and in your app

Roy Lambert [Team Elevate]
Mon, Jan 21 2013 10:37 AM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

Adam,

<< This works fine in DBSys, running quite fast as I would expect. (Just
over 1 second). However, when I run it through my application in Delphi (the
same SQL) it takes a significant amount of time. (Seems like more than 10
minutes). >>

I would expect DBSYS to take a long time also, because the join is not
optimizable with the cast in there. Are you sure that you're using the same
query?
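
For comparison, the join written directly on the columns - i.e. your original
query, below - is the form that the engine can optimize, assuming there is a
usable index on ApprenticeCourse.DeltaStudentNo:

Select M.DeltaStudentID, Max(AC.Item) Item
into SVTImportMaxItemM
From SVTImport M
inner join ApprenticeCourse AC on
    (AC.DeltaStudentNo = M.DeltaStudentID)
group by DeltaStudentID;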

Thanks,

Tim Young
Elevate Software
www.elevatesoft.com
Mon, Jan 21 2013 11:07 AM

Roy Lambert

NLH Associates

Team Elevate

Adam


Tim got here before I could, but I will add - are you looking at the same data?

Roy Lambert [Team Elevate]

Mon, Jan 21 2013 4:22 PM

Adam H.

Hi Roy,

Thanks for your reply...

I'm using Delphi 2007 at present, on 64-bit Windows. I'm currently running
DBISAM 4.34 b5, but I will rebuild with the latest version to see if it makes
a difference.

I have got a progress bar that goes across, but even when I disable that
(i.e. just run the query normally) it is still very slow.

I will try creating additional tables 'in memory', applying the casts
before doing any joins, to see if this helps - although the thing that's
got me beat is why DBSys works so fast while my application runs so slow.
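
Something like this is what I have in mind - untested, and I'd need to double
check the exact in-memory table naming in the manual (ACKeys is just a made up
name):

/* materialise the cast once into a memory table... */
Select cast(DeltaStudentNo as LargeInt) DeltaStudentKey, Item
into "\Memory\ACKeys"
From ApprenticeCourse;

/* ...then join against the pre-cast key */
Select M.DeltaStudentID, Max(AC.Item) Item
into SVTImportMaxItemM
From SVTImport M
inner join "\Memory\ACKeys" AC on
    (AC.DeltaStudentKey = cast(M.DeltaStudentID as LargeInt))
group by DeltaStudentID;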

I ran the process last night, and will reply to Tim's post with a copy
of the query plans. Last night's run took more than 10 minutes (in fact it
took 8hrs to complete!). I haven't known it to take that long before...

Cheers

Adam.

Mon, Jan 21 2013 4:27 PM

Adam H.

Hi Tim,

Thanks for your reply. Indeed I would have thought that DBSys would have
taken a little longer, but it runs very efficiently and quickly.

Last night I ran the two queries, one in DBSys, and the other in my
application, and generated plans for both of them, which I've posted below.

I can confirm that we're running on the same data. (Query plans show
same record counts for each table).

Last night, my application took 8hrs to run the same query that DBSys
ran in just over a second. (And my application ran after DBSys, so if
there was any disk caching involved, I would have thought it should
favour my application).

In all fairness, it has never taken this long before in my application,
so I don't know what I've done to make it so different to DBSys.

The query component doesn't have a dataset attached, and in fact within
my application it is called using ExecSQL as opposed to Open.

Best Regards

Adam.





The plan through DBSys looks like the following:

------------------------------------------------
================================================================================
SQL statement (Executed with 4.34 Build 5)
================================================================================

Select M.DeltaStudentID, Max(AC.Item) Item
into SVTImportMaxItemM
From SVTImport M
inner join ApprenticeCourse AC on
    (cast(AC.DeltaStudentNo as LargeInt) = cast(M.DeltaStudentID as LargeInt))
group by DeltaStudentID

Tables Involved
---------------

SVTImport (M) table opened shared, has 11 rows
ApprenticeCourse (AC) table opened shared, has 21228 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more grouped rows

Result set will be grouped by the following column(s) using a temporary index:

DeltaStudentID

Result set will be ordered by the temporary index created for the grouping

Join Ordering
-------------

The driver table is the SVTImport table (M)


The SVTImport table (M) is joined to the ApprenticeCourse table (AC) with the
INNER JOIN expression:

cast(M.DeltaStudentID,LARGEINT) = cast(AC.DeltaStudentNo,LARGEINT)

Optimizer will attempt to re-order the joins to a more optimal order
Use the NOJOINOPTIMIZE clause at the end of the SQL statement to force the
optimizer to leave the joins in their declared order

The joins are already in optimal order and cannot be optimized any further

Join Execution
--------------

Costs ARE NOT being taken into account when executing this join
Use the JOINOPTIMIZECOSTS clause at the end of the SQL statement to force the
optimizer to consider costs when optimizing this join

The expression:

cast(M.DeltaStudentID,LARGEINT) = cast(AC.DeltaStudentNo,LARGEINT)

is UN-OPTIMIZED

================================================================================
>>>>> 3 rows affected in 1.219 seconds
================================================================================











The plan through my application looks like the following:


Select M.DeltaStudentID, Max(AC.Item) Item
into SVTImportMaxItemM
From SVTImport M
inner join ApprenticeCourse AC on
    (cast(AC.DeltaStudentNo as LargeInt) = cast(M.DeltaStudentID as LargeInt))
group by DeltaStudentID;



-------------------------------------------
================================================================================
SQL statement (Executed with 4.34 Build 5)
================================================================================

Select M.DeltaStudentID, Max(AC.Item) Item
into SVTImportMaxItemM
From SVTImport M
inner join ApprenticeCourse AC on
    (cast(AC.DeltaStudentNo as LargeInt) = cast(M.DeltaStudentID as LargeInt))
group by DeltaStudentID

Tables Involved
---------------

SVTImport (M) table opened shared, has 11 rows
ApprenticeCourse (AC) table opened shared, has 21228 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more grouped rows

Result set will be grouped by the following column(s) using a temporary index:

DeltaStudentID

Result set will be ordered by the temporary index created for the grouping

Join Ordering
-------------

The driver table is the SVTImport table (M)


The SVTImport table (M) is joined to the ApprenticeCourse table (AC) with the
INNER JOIN expression:

cast(M.DeltaStudentID,LARGEINT) = cast(AC.DeltaStudentNo,LARGEINT)

Optimizer will attempt to re-order the joins to a more optimal order
Use the NOJOINOPTIMIZE clause at the end of the SQL statement to force the
optimizer to leave the joins in their declared order

The joins are already in optimal order and cannot be optimized any further

Join Execution
--------------

Costs ARE NOT being taken into account when executing this join
Use the JOINOPTIMIZECOSTS clause at the end of the SQL statement to force the
optimizer to consider costs when optimizing this join

The expression:

cast(M.DeltaStudentID,LARGEINT) = cast(AC.DeltaStudentNo,LARGEINT)

is UN-OPTIMIZED

================================================================================
>>>>> 3 rows affected in 29161.594 seconds
================================================================================



Mon, Jan 21 2013 5:51 PM

Adam H.

OK, I think I'm onto something here...

I have created a brand new application, with just the Engine, Session,
Database and Query components to run the SQL. If I open the query at
design time, it opens very quickly (same as DBSys). However, if I run
the application and open it at runtime, it takes a significant
amount of time to open. (I haven't allowed it to run long enough yet to
get an exact plan or length of time.)

I've then fiddled around with the compiler options, and found that if I
turn off DEBUGGING INFORMATION I can get the query to run within a
couple of seconds!

I haven't had this issue before, but I'm guessing it has to do with the
number of exceptions that are raised in the try/except section.

Interestingly, when I look at the help for Debug Information it
says that it increases the size of the unit file and takes up additional
memory when compiling programs that use the unit, but that it does not
affect the size or speed of the executable program.

I'll keep digging...
Tue, Jan 22 2013 5:28 AM

Roy Lambert

NLH Associates

Team Elevate

Adam

>I've then fiddled around with the compiler options. I've found that if I
>turn off DEBUGGING INFORMATION - I can get the query to run within a
>couple of seconds!!!!
>
>I haven't had this issue before, but I'm guessing it has to do with the
>number of errors that are raised in the try/except section.
>
>Interesting, because when I look at the help for Debug Information it
>says that it increases the size of unit file and takes up additional
>memory when compiling programs that use the unit, but it does not affect
>the size or speed of the executable program.

This makes me wonder. Are you using FastMM4 in debug mode to trace memory leaks? I had a problem with this in ElevateDB where queries were running faster in EDBManager than in my app. What was happening was that FastMM was keeping hold of things so that it could report on memory leaks and thus slowed everything down. I don't remember it being as extreme as your case though.

My "fix" was to prevent FastMM from carrying out its checks when not in the IDE. You add the following code into the dpr just after the begin.

if DebugHook <> 0 then begin  // DebugHook is non-zero only when running under the IDE/debugger
  ReportMemoryLeaksOnShutdown := True;
  SetMMLogFileName(PChar('C:\zap\Leaks.txt'));
end;


Roy Lambert [Team Elevate]
Tue, Jan 22 2013 9:11 PM

Adam H.

Hi Roy,

Thanks. I do have FastMM referenced in the uses clause, but am not using
the memory leak reporting feature of it.

However, I am using EurekaLog - and maybe that has something to do with
it. I was really blown away by the difference in speed between DBSys and
my app, so I'll continue to look around in that particular area.

In the meantime, I have added {$D-} and {$I-} to some of the DBISAM
units to force debugging to be turned off for those units.

Cheers

Adam.