Support Forums - View Thread

View Thread

The following is the text of the current message along with any replies.

Messages 1 to 10 of 13 total

Query performance that seems dependent on the number of rows returned

Wed, May 14 2008 3:03 AM	Permanent Link
Karl Ross	Hi Tim I have an application that uses a mixture of TEDBTable and TEDBQuery components. I'm finding that some of my TEDBQuery components run really slowly, even when very sure that I have the right indexes in place. In order to verify what was happening I created a database with the following structure CREATE TABLE "TCustomer" ( "ID" INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 0, INCREMENT BY 1), "Name" VARCHAR(30) COLLATE "ANSI_CI" NOT NULL, "Address1" VARCHAR(40) COLLATE "ANSI_CI", "Address2" VARCHAR(40) COLLATE "ANSI_CI", "City" VARCHAR(30) COLLATE "ANSI_CI", "State" CHAR(2) COLLATE "ANSI_CI", "Zip" CHAR(10) COLLATE "ANSI_CI", "CreatedOn" TIMESTAMP DEFAULT CURRENT_TIMESTAMP, CONSTRAINT "ID_PrimaryKey" PRIMARY KEY ("ID") ) CREATE TABLE "TCustomerSale" ( "ID" INTEGER, "SaleId" INTEGER, "SaleDate" DATE, "SalesAmount" FLOAT, CONSTRAINT "PK_Sale" PRIMARY KEY ("ID", "SaleId"), CONSTRAINT "fk_sale" FOREIGN KEY ("ID") REFERENCES "TCustomer" ("ID") ) and populated it with several thousand records in each table. Executing the following query in EDBManager was very slow (in the order or 10 seconds). Yes, I wanted all rows from the left table, hence the left join. select a.id, b.saleid from tcustomer a left join tcustomersale b on (a.id = b.id) Executing the same query using a test program was better (in the order of 3 seconds), but not good enough. I populated the tables with even more records, and found that the same query took even longer. I looked extensively through the newsgroups, and can't find a simple set of things to look out for in my code. Sensitive results aren't going to help because of the join. Can someone point me to the right place in the EDB doco that'll tell me how to optimise these queries? Bear in mind that I need a general approach, rather than just a "Fix" for the query presented... Here is the query plan as generated by EDB Manager. It seems to indicate that I have all the required indexes, and my statement is set up to make optimal use of them. So there MUST be something I can do better with the query in Manager and in my code. Thanks Karl ================================================================================ SQL Query (Executed by ElevateDB 1.07 Build 3) Note: The SQL shown here is generated by ElevateDB and may not be exactly the same as the SQL that was originally entered. However, none of the differences alter the execution results in any way. ================================================================================ SELECT ALL "a"."id" AS "id", "b"."saleid" AS "saleid" FROM "tcustomer" AS "a" LEFT OUTER JOIN "tcustomersale" AS "b" ON ("a"."id" = "b"."id") Source Tables ------------- tcustomer (a): 11815 rows tcustomersale (b): 58520 rows Result Set ---------- The result set was insensitive and read-only The result set consisted of zero or more rows Joins ----- The driver table was the tcustomer (a) table The tcustomer (a) table was joined to the tcustomersale (b) table with the left outer join expression: ("a"."id" = "b"."id") The optimizer attempted to re-order the joins to a more optimal order The joins were already in the most optimal order The following join condition was applied to the tcustomersale (b) table: ("b"."id" = "a"."id" [Index scan]) ================================================================================ 70333 row(s) returned in 11.735 secs ================================================================================
Wed, May 14 2008 12:20 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Karl, << I have an application that uses a mixture of TEDBTable and TEDBQuery components. I'm finding that some of my TEDBQuery components run really slowly, even when very sure that I have the right indexes in place. >> You're generating a 70+ thousand row result set. How fast did you expect it to be ? You should look into using the RANGE clause to incrementally populate the result set, especially if you're only interested in the first few rows to start with: http://www.elevatesoft.com/manual?action=mantopic&id=edb1sql&category=11&topic=171 See the section entitled "Incremental Result Set Population". -- Tim Young Elevate Software www.elevatesoft.com
Wed, May 14 2008 10:00 PM	Permanent Link
Karl Ross	"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote: Karl, << I have an application that uses a mixture of TEDBTable and TEDBQuery components. I'm finding that some of my TEDBQuery components run really slowly, even when very sure that I have the right indexes in place. >> You're generating a 70+ thousand row result set. How fast did you expect it to be ? You should look into using the RANGE clause to incrementally populate the result set, especially if you're only interested in the first few rows to start with: http://www.elevatesoft.com/manual?action=mantopic&id=edb1sql&category=11&topic=171 See the section entitled "Incremental Result Set Population". -- Tim Young Elevate Software www.elevatesoft.com Hang on a minute, we're measuring different things here I'm measuring the time it takes to open the query ie startTimeStamp... pQuery->Active = true endTimeStamp At that point all the database engine should have done is worked out how to join up the tables, and generated a cursor to (perhaps) the start of the result set. It may or may not know how many records will be in the result set, it most certainly should NOT have sent any (many) records from the server to the client. If I was measuring the time taken to do this AND iterate through the result set, it would make sense for the time taken to increase with the resultset size, IE if I was measuring startTimeStamp... pQuery->Active = true pQuery->First() while (!pQuery->Eof) { pQuery->Next(); } endTimeStamp then it would make sense. So the questions for Tim then are 1.) Does EDB load the entire result set into memory when opening a query? 2.) If not, why is query plan generation occupying all of that time? 3.) And finally, if the SQL layer of EDB is really just translating SQL->Native (which I read in either the doco or on one of these news groups, why is it so much quicker to use two TEDBTable components and a Master/Detail relationship? The only "real" difference at that point is that a developer has hand picked the indexes to use (which are the same ones that the query plan generator will use anyway)
Thu, May 15 2008 2:39 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Karl >I'm measuring the time it takes to open the query ie > > >startTimeStamp... >pQuery->Active = true >endTimeStamp > >At that point all the database engine should have done is worked out how to join up the tables, and generated a cursor to (perhaps) the start of the >result set. It may or may not know how many records will be in the result set, it most certainly should NOT have sent any (many) records from >the server to the client. Tim can correct me if I'm wrong but you are making invalid assumptions about how ElevateDB works. Open a query = produce the result set NOT just work out how to do it. If the query is linked to a data aware control it will have sent some information to them (enough to populate them). In your case since its an insensitive query then opening the query requires writing 70k+ records to a temporary file on the server. >If I was measuring the time taken to do this AND iterate through the result set, it would make sense for the time taken to increase with the >resultset size Its the time taken to write the temporary file to the servers disk >1.) Does EDB load the entire result set into memory when opening a query? Depends what you call memory and wether the result set is sensitive / insensitive. For sensitive result sets ElevateDB (sort of) does the equivalent of setting a filter on the table, for insensitive result sets it writes a temporary table to disk (memory?) >2.) If not, why is query plan generation occupying all of that time? If you look at the bottom of the query plan you'll see it tells you how many rows and how long it took - that says its produced the result set. >3.) And finally, if the SQL layer of EDB is really just translating SQL->Native (which I read in either the doco or on one of these news groups, why >is it so much quicker to use two TEDBTable components and a Master/Detail relationship? The only "real" difference at that point is that a >developer has hand picked the indexes to use (which are the same ones that the query plan generator will use anyway) No idea about the first bit, but for the speed differences think extracting and writing 70k+ records Vs opening two tables and setting a range on one - neither of which require a disk write. Roy Lambert [Team Elevate]
Thu, May 15 2008 2:28 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Karl, << At that point all the database engine should have done is worked out how to join up the tables, and generated a cursor to (perhaps) the start of the result set. It may or may not know how many records will be in the result set, it most certainly should NOT have sent any (many) records from the server to the client. >> Incorrect. Opening a query executes the query, which in this case means generating a 70+ thousand row result set, because that's what you're asking it to do. Joins in queries in ElevateDB are not "live", meaning that the entire result set needs to be materialized before the query execution is complete. Using the RANGE clause allows you to say "I don't want the whole result set materialized and I don't care to know the entire row count for the result set". << 1.) Does EDB load the entire result set into memory when opening a query? >> No, it writes out the result set as a temporary table. << 3.) And finally, if the SQL layer of EDB is really just translating SQL->Native (which I read in either the doco or on one of these news groups, why is it so much quicker to use two TEDBTable components and a Master/Detail relationship? >> SQL isn't just translating SQL->native. There's a lot more going on, including re-ordering joins for optimal execution order, getting I/O costs and rewriting expressions to make the execution more optimal, etc. -- Tim Young Elevate Software www.elevatesoft.com
Thu, May 15 2008 8:02 PM	Permanent Link
Karl Ross	Tim Thanks for the response. I don't want to get into a debate/discussion on how how the internals of the database engine work. From a personal point of view I'd like to know, but right now I'm writing an application which will require me to process lots and lots of records retreived from a variety of tables, and potentially including some logic/calculations which are much better done on the server side than on the client PC. I'm using this database engine, I will need to retrieve lots of records (for a variety of reasons, sometimes including display purposes), I don't want to have to write code to limit the results of the queries (Range clause) all over the shop. How can I get the best out of this engine? Client side query was too slow (without using the range clause) View was not much better (In fact I think it was comparable to a client side query) The other option is Stored Procedure. Will it be any better? SQL is a generally a far better option than TTable (TEDBTable) purely because 1.) It works regardless of whether the indexes are there or not (albeit performing slowly when they're not there) 2.) I'm sure the database engine can do a far better job of chosing the right indexes, deciding which order to evaluate expressions etc (ie coming up with a query plan) than most C++/Delphi/Java etc developers ever could. 3.) TTables with calculated/lookup fields etc put all of the responsibility for calculations onto the client machine, which is typically far less grunty than the server, and has far more "important' things to do like keeping the user happy. So, the crux of the question is: How do I squeeze the best possible performance out of this database engine using SQL/Stored Procedures etc?
Fri, May 16 2008 3:59 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Karl If your application requires you to extract lots of records and create temporary tables with them its going to be slow regardless of how you do it and, from my experience, that includes writing the results to a memory table (subsequent processing will be faster though). It might be possible to write your calculations as functions which could then be called as part of your sql, or, depending on the nature of the calculation, it might be possible to create computed/generated fields for them. I doubt any of us will be able to give generalised advice of the nature you want, but if you can post specific questions (with lots of detail obviously) we'll all do the best we can to help you speed things up. Finally I'd like to say that sometimes tables and using the client machine to do the work is a good alternative. Sometimes its actually or subjectively faster than a query and at least you can show the user somethings going on if its a bit slow so they know wether to get a cup of coffee or cook a 3 course meal Roy Lambert [Team Elevate]
Fri, May 16 2008 12:59 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Karl, << I'm using this database engine, I will need to retrieve lots of records (for a variety of reasons, sometimes including display purposes), I don't want to have to write code to limit the results of the queries (Range clause) all over the shop. How can I get the best out of this engine? >> Unfortunately, that's it. If you want to limit the rows that are populated into the result set, then you'll need to use the RANGE clause. EDB assumes that you want to see the entire result set unless you say otherwise via the RANGE clause. It avoids issues like a RecordCount property reference or a Last method call causing a 10-20 second delay while the entire result set is populated. IOW, the performance is predictable and controllable which, in my opinion, is better than the "fetch on demand" result set population due to the aforementioned side effects. << The other option is Stored Procedure. Will it be any better? >> No. The issues is the result set population, which will be the same no matter how the query is executed, unless you use the RANGE clause. -- Tim Young Elevate Software www.elevatesoft.com
Fri, May 16 2008 10:40 PM	Permanent Link
Karl Ross	Tim If I use the Range clause, how do I insure that the results I getting for last X number of records in the result set haven't been affected by some external factor (eg another user on another PC writting to the database) between when I used the range clause to get the first X records and when I finally use it to get the last X records. Eg Select range 1 to 5 {do something} Select range 6 to 50 {do something} Select range 51 to 100 {do something} The short answer will be that I can't make that guarentee I think. The other question is, if every Tom Dick and Harry can write their own code to make use of the range clause, why doesn;t the TEDBQuery component do it in the first place?
Mon, May 19 2008 1:02 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Karl, << The short answer will be that I can't make that guarentee I think. >> Nor can you do so with incremental fetching unless you have some pretty restrictive locking in place. << The other question is, if every Tom Dick and Harry can write their own code to make use of the range clause, why doesn;t the TEDBQuery component do it in the first place? >> I'm pretty sure I already answered this - ElevateDB materializes the entire result set, as indicated by the query, unless the user or developer indicates that they want to do so otherwise via the RANGE. In other words, it's not a matter of can or can't. -- Tim Young Elevate Software www.elevatesoft.com

	Page 1 of 2	Next Page »
Jump to Page: 1 2