Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 13 total
Thread Query performance that seems dependent on the number of rows returned
Wed, May 14 2008 3:03 AMPermanent Link

Karl Ross
Hi Tim

I have an application that uses a mixture of TEDBTable and TEDBQuery components. I'm finding that some of my TEDBQuery components run
really slowly, even when very sure that I have the right indexes in place.

In order to verify what was happening I created a database with the following structure
CREATE TABLE "TCustomer"
(
"ID" INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 0, INCREMENT BY 1),
"Name" VARCHAR(30) COLLATE "ANSI_CI" NOT NULL,
"Address1" VARCHAR(40) COLLATE "ANSI_CI",
"Address2" VARCHAR(40) COLLATE "ANSI_CI",
"City" VARCHAR(30) COLLATE "ANSI_CI",
"State" CHAR(2) COLLATE "ANSI_CI",
"Zip" CHAR(10) COLLATE "ANSI_CI",
"CreatedOn" TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT "ID_PrimaryKey" PRIMARY KEY ("ID")
)

CREATE TABLE "TCustomerSale"
(
"ID" INTEGER,
"SaleId" INTEGER,
"SaleDate" DATE,
"SalesAmount" FLOAT,
CONSTRAINT "PK_Sale" PRIMARY KEY ("ID", "SaleId"),
CONSTRAINT "fk_sale" FOREIGN KEY ("ID") REFERENCES "TCustomer" ("ID")
)

and populated it with several thousand records in each table.

Executing the following query in EDBManager was very slow (in the order or 10 seconds).  Yes, I wanted all rows from the left table, hence the left
join.
select a.id, b.saleid from tcustomer a left join tcustomersale b on (a.id = b.id)

Executing the same query using a test program was better (in the order of 3 seconds), but not good enough.

I populated the tables with even more records, and found that the same query took even longer.

I looked extensively through the newsgroups, and can't find a simple set of things to look out for in my code.  Sensitive results aren't going to
help because of the join.

Can someone point me to the right place in the EDB doco that'll tell me how to optimise these queries?  Bear in mind that I need a general
approach, rather than just a "Fix" for the query presented...

Here is the query plan as generated by EDB Manager.  It seems to indicate that I have all the required indexes, and my statement is set up to
make optimal use of them.  So there MUST be something I can do better with the query in Manager and in my code.

Thanks
Karl
================================================================================
SQL Query (Executed by ElevateDB 1.07 Build 3)

Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
same as the SQL that was originally entered.  However, none of the differences
alter the execution results in any way.
================================================================================

SELECT ALL
"a"."id" AS "id",
"b"."saleid" AS "saleid"
FROM "tcustomer" AS "a" LEFT OUTER JOIN "tcustomersale" AS "b" ON ("a"."id" =
"b"."id")

Source Tables
-------------

tcustomer (a): 11815 rows
tcustomersale (b): 58520 rows

Result Set
----------

The result set was insensitive and read-only
The result set consisted of zero or more rows

Joins
-----

The driver table was the tcustomer (a) table

The tcustomer (a) table was joined to the tcustomersale (b) table with the left
outer join expression:

("a"."id" = "b"."id")

The optimizer attempted to re-order the joins to a more optimal order

The joins were already in the most optimal order

The following join condition was applied to the tcustomersale (b) table:

("b"."id" = "a"."id" [Index scan])

================================================================================
70333 row(s) returned in 11.735 secs
================================================================================

Wed, May 14 2008 12:20 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Karl,

<< I have an application that uses a mixture of TEDBTable and TEDBQuery
components. I'm finding that some of my TEDBQuery components run really
slowly, even when very sure that I have the right indexes in place. >>

You're generating a 70+ thousand row result set.  How fast did you expect it
to be ?  You should look into using the RANGE clause to incrementally
populate the result set, especially if you're only interested in the first
few rows to start with:

http://www.elevatesoft.com/manual?action=mantopic&id=edb1sql&category=11&topic=171

See the section entitled "Incremental Result Set Population".

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, May 14 2008 10:00 PMPermanent Link

Karl Ross
"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote:

Karl,

<< I have an application that uses a mixture of TEDBTable and TEDBQuery
components. I'm finding that some of my TEDBQuery components run really
slowly, even when very sure that I have the right indexes in place. >>

You're generating a 70+ thousand row result set.  How fast did you expect it
to be ?  You should look into using the RANGE clause to incrementally
populate the result set, especially if you're only interested in the first
few rows to start with:

http://www.elevatesoft.com/manual?action=mantopic&id=edb1sql&category=11&topic=171



See the section entitled "Incremental Result Set Population".

--
Tim Young
Elevate Software
www.elevatesoft.com


Hang on a minute, we're measuring different things here

I'm measuring the time it takes to open the query ie


startTimeStamp...
pQuery->Active = true
endTimeStamp

At that point all the database engine should have done is worked out how to join up the tables, and generated a cursor to (perhaps) the start of the
result set.  It may or may not know how many records will be in the result set, it most certainly should NOT have sent any (many) records from
the server to the client.  

If I was measuring the time taken to do this AND iterate through the result set, it would make sense for the time taken to increase with the
resultset size, IE if I was measuring

startTimeStamp...
pQuery->Active = true
pQuery->First()
while (!pQuery->Eof)
{
 pQuery->Next();
}
endTimeStamp

then it would make sense.

So the questions for Tim then are

1.) Does EDB load the entire result set into memory when opening a query?  
2.) If not, why is query plan generation occupying all of that time?
3.) And finally, if the SQL layer of EDB is really just translating SQL->Native (which I read in either the doco or on one of these news groups, why
is it so much quicker to use two TEDBTable components and a Master/Detail relationship?  The only "real" difference at that point is that a
developer has hand picked the indexes to use (which are the same ones that the query plan generator will use anyway)
Thu, May 15 2008 2:39 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Karl


>I'm measuring the time it takes to open the query ie
>
>
>startTimeStamp...
>pQuery->Active = true
>endTimeStamp
>
>At that point all the database engine should have done is worked out how to join up the tables, and generated a cursor to (perhaps) the start of the
>result set. It may or may not know how many records will be in the result set, it most certainly should NOT have sent any (many) records from
>the server to the client.

Tim can correct me if I'm wrong but you are making invalid assumptions about how ElevateDB works. Open a query = produce the result set NOT just work out how to do it. If the query is linked to a data aware control it will have sent some information to them (enough to populate them). In your case since its an insensitive query then opening the query requires writing 70k+ records to a temporary file on the server.

>If I was measuring the time taken to do this AND iterate through the result set, it would make sense for the time taken to increase with the
>resultset size

Its the time taken to write the temporary file to the servers disk

>1.) Does EDB load the entire result set into memory when opening a query?

Depends what you call memory Smileyand wether the result set is sensitive / insensitive. For sensitive result sets ElevateDB (sort of) does the equivalent of setting a filter on the table, for insensitive result sets it writes a temporary table to disk (memory?)

>2.) If not, why is query plan generation occupying all of that time?

If you look at the bottom of the query plan you'll see it tells you how many rows and how long it took - that says its produced the result set.

>3.) And finally, if the SQL layer of EDB is really just translating SQL->Native (which I read in either the doco or on one of these news groups, why
>is it so much quicker to use two TEDBTable components and a Master/Detail relationship? The only "real" difference at that point is that a
>developer has hand picked the indexes to use (which are the same ones that the query plan generator will use anyway)

No idea about the first bit, but for the speed differences think extracting and writing 70k+ records Vs opening two tables and setting a range on one - neither of which require a disk write.

Roy Lambert [Team Elevate]
Thu, May 15 2008 2:28 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Karl,

<< At that point all the database engine should have done is worked out how
to join up the tables, and generated a cursor to (perhaps) the start of the
result set.  It may or may not know how many records will be in the result
set, it most certainly should NOT have sent any (many) records from the
server to the client. >>

Incorrect.  Opening a query *executes* the query, which in this case means
generating a 70+ thousand row result set, because that's what you're asking
it to do.  Joins in queries in ElevateDB are not "live", meaning that the
entire result set needs to be materialized before the query execution is
complete.  Using the RANGE clause allows you to say "I don't want the whole
result set materialized and I don't care to know the entire row count for
the result set".

<< 1.) Does EDB load the entire result set into memory when opening a query?
>>

No, it writes out the result set as a temporary table.

<< 3.) And finally, if the SQL layer of EDB is really just translating
SQL->Native (which I read in either the doco or on one of these news groups,
why is it so much quicker to use two TEDBTable components and a
Master/Detail relationship? >>

SQL isn't just translating SQL->native.  There's a *lot* more going on,
including re-ordering joins for optimal execution order, getting I/O costs
and rewriting expressions to make the execution more optimal, etc.

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, May 15 2008 8:02 PMPermanent Link

Karl Ross
Tim

Thanks for the response.

I don't want to get into a debate/discussion on how how the internals of the database engine work.  From a personal point of view I'd like to know,
but right now I'm writing an application which will require me to process lots and lots of records retreived from a variety of tables, and potentially
including some logic/calculations which are much better done on the server side than on the client PC.

I'm using this database engine, I will need to retrieve lots of records (for a variety of reasons, sometimes including display purposes), I don't want
to have to write code to limit the results of the queries (Range clause) all over the shop.  How can I get the best out of this engine?

Client side query was too slow (without using the range clause)

View was not much better (In fact I think it was comparable to a client side query)

The other option is Stored Procedure.  Will it be any better?

SQL is a generally a far better option than TTable (TEDBTable) purely because
1.) It works regardless of whether the indexes are there or not (albeit performing slowly when they're not there)
2.) I'm sure the database engine can do a far better job of chosing the right indexes, deciding which order to evaluate expressions etc (ie coming
up with a query plan) than most C++/Delphi/Java etc developers ever could.  
3.) TTables with calculated/lookup fields etc put all of the responsibility for calculations onto the client machine, which is typically far less grunty
than the server, and has far more "important' things to do like keeping the user happy.

So, the crux of the question is:

How do I squeeze the best possible performance out of this database engine using SQL/Stored Procedures etc?
Fri, May 16 2008 3:59 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Karl


If your application requires you to extract lots of records and create temporary tables with them its going to be slow regardless of how you do it and, from my experience, that includes writing the results to a memory table (subsequent processing will be faster though).

It might be possible to write your calculations as functions which could then be called as part of your sql, or, depending on the nature of the calculation, it might be possible to create computed/generated fields for them.

I doubt any of us will be able to give generalised advice of the nature you want, but if you can post specific questions (with lots of detail obviously) we'll all do the best we can to help you speed things up.

Finally I'd like to say that sometimes tables and using the client machine to do the work is a good alternative. Sometimes its actually or subjectively faster than a query and at least you can show the user somethings going on if its a bit slow so they know wether to get a cup of coffee or cook a 3 course meal Smiley

Roy Lambert [Team Elevate]
Fri, May 16 2008 12:59 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Karl,

<< I'm using this database engine, I will need to retrieve lots of records
(for a variety of reasons, sometimes including display purposes), I don't
want to have to write code to limit the results of the queries (Range
clause) all over the shop.  How can I get the best out of this engine? >>

Unfortunately, that's it.  If you want to limit the rows that are populated
into the result set, then you'll need to use the RANGE clause.  EDB assumes
that you want to see the entire result set unless you say otherwise via the
RANGE clause.  It avoids issues like a RecordCount property reference or a
Last method call causing a 10-20 second delay while the entire result set is
populated.  IOW, the performance is predictable and controllable which, in
my opinion, is better than the "fetch on demand" result set population due
to the aforementioned side effects.

<< The other option is Stored Procedure.  Will it be any better? >>

No.  The issues is the result set population, which will be the same no
matter how the query is executed, unless you use the RANGE clause.

--
Tim Young
Elevate Software
www.elevatesoft.com

Fri, May 16 2008 10:40 PMPermanent Link

Karl Ross
Tim

If I use the Range clause, how do I insure that the results I getting for last X number of records in the result set haven't been affected by some
external factor (eg another user on another PC writting to the database) between when I used the range clause to get the first X records and
when I finally use it to get the last X records.

Eg

Select range 1 to 5
{do something}
Select range 6 to 50
{do something}
Select range 51 to 100
{do something}

The short answer will be that I can't make that guarentee I think.

The other question is, if every Tom Dick and Harry can write their own code to make use of the range clause, why doesn;t the TEDBQuery
component do it in the first place?
Mon, May 19 2008 1:02 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Karl,

<< The short answer will be that I can't make that guarentee I think. >>

Nor can you do so with incremental fetching unless you have some pretty
restrictive locking in place.

<< The other question is, if every Tom Dick and Harry can write their own
code to make use of the range clause, why doesn;t the TEDBQuery component do
it in the first place? >>

I'm pretty sure I already answered this - ElevateDB materializes the entire
result set, as indicated by the query, *unless* the user or developer
indicates that they want to do so otherwise via the RANGE.  In other words,
it's not a matter of can or can't.

--
Tim Young
Elevate Software
www.elevatesoft.com

Page 1 of 2Next Page »
Jump to Page:  1 2
Image