Thread: Optimize query / understanding the impact the WHERE clause has when building the result set.
Thu, Oct 19 2017 5:50 AM

David

Hi. I am looking to optimize a query that I have. On my desktop machine it runs fine, and in the main it does on the C/S client too, but occasionally it runs really quite slowly. I am trying to understand what is going on and why the query is occasionally slow.

I have two tables involved that are joined together. The WHERE clause filters the master table first, resulting in the 8 rows, and then I also filter on the joined table with 3 separate conditions. The query plan says that the first condition on the joined table covers 71 rows, the second 27694 and the third 35878, all applied before any joins are made. Does this mean that, because of my WHERE clause, the DBISAM server has to do 3 filters on the table and then join them together to get the result set?

Is there something I am doing wrong here, or a better way to do this that does not require too much I/O from the database? I am aware I could try to do this with a memory table, but I was trying to avoid that.

Any help/advice welcome. Please see the query plan below.

SQL statement (Executed with 4.25 Build 7)
================================================================================

SELECT ID, CoyID, t.ContractID, TestItemNo, VialNo, LabelInfo, PreBookedIn,
       UnitOfMeasure, Disposed, Volume, TIGMO, TIRiskAssessmentSupplied, CurrentStatus
FROM TIContract t
JOIN TestItemReceipt a ON (a.ID = t.TestItemID)
WHERE t.ContractID = 2695 AND a.CoyID = 1169 AND a.Disposed = false AND
      RecievedDateTime <> null
GROUP BY TestItemNo, VialNo
ORDER BY TestItemNo, VialNo ASC

Tables Involved
---------------

TIContract (t) table opened shared, has 48095 rows
TestItemReceipt (a) table opened shared, has 35971 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more grouped rows

Result set will be grouped by the following column(s) using a temporary index:

TestItemNo
VialNo

Result set will be ordered by the following column(s) using a case-sensitive
temporary index:

TestItemNo ASC
VialNo ASC

WHERE Clause Execution
----------------------

The expression:

t.ContractID = 2695

is OPTIMIZED, covers 8 rows or index keys, costs 100 bytes, and will be applied
to the TIContract table (t) before any joins

The expression:

a.CoyID = 1169

is OPTIMIZED, covers 71 rows or index keys, costs 894 bytes, and will be
applied to the TestItemReceipt table (a) before any joins

The expression:

a.Disposed = false

is OPTIMIZED, covers 27694 rows or index keys, costs 271401 bytes, and will be
applied to the TestItemReceipt table (a) before any joins

The expression:

RecievedDateTime <> null

is OPTIMIZED, covers 35878 rows or index keys, costs 652979 bytes, and will be
applied to the TestItemReceipt table (a) before any joins

Join Ordering
-------------

The driver table is the TIContract table (t)


The TIContract table (t) is joined to the TestItemReceipt table (a) with the
INNER JOIN expression:


t.TestItemID = a.ID

Optimizer will attempt to re-order the joins to a more optimal order
Use the NOJOINOPTIMIZE clause at the end of the SQL statement to force the
optimizer to leave the joins in their declared order

The joins are already in optimal order and cannot be optimized any further

Join Execution
--------------

Costs ARE NOT being taken into account when executing this join
Use the JOINOPTIMIZECOSTS clause at the end of the SQL statement to force the
optimizer to consider costs when optimizing this join

The expression:

t.TestItemID = a.ID

is OPTIMIZED

================================================================================
>>>>> 8 rows affected in 0.031 seconds
================================================================================
Thu, Oct 19 2017 6:47 AM

Roy Lambert

NLH Associates

Team Elevate

David


Looking at the execution plan, and considering your comments, I don't think working on this query is going to deliver the results you want. Something else is going to be causing the slowdown.

It's a canned result, so it could be that occasionally, just because it feels like it, your AV is checking the written file and slowing things down.

If it runs slowly on both your machine and the c/s server then look for common things, e.g. both running the same AV; if not, look for the differences.

Other possibilities that cross my mind are:

1. some other query / action accessing the same tables and causing a lock, and this query having to wait until the lock is released
2. some other process on the machine putting a heavy CPU load on the PC
3. if it just happens on the c/s machine, it could be a perceived slowdown caused by network traffic (i.e. the query is as fast as ever but the results can't make it over the wire)

Roy Lambert
Thu, Oct 19 2017 9:00 AM

David

Hi Roy.

Thanks for your reply.

#1 is an option as other clients will be accessing the tables, but the slowness can last a couple of hours or more and then suddenly, for no apparent reason, it becomes fast again. I have checked the CPU and the DBSrvr is not using many cycles, but I do see the I/O response creeping up to 150ms, although not all the time.

On #3: if running the query in DBSys, would the query time be affected so it looks a lot slower because of the network, or would it state the actual time the query took even if the results take ages to arrive at the client?

I am a bit confused by the query plan. When I have two WHERE conditions covering 27k and 35k records on the same table, does this mean DBISAM has to do two queries to get the result, or does it filter on the first condition (27k records) and then filter those results down with the second condition? What is confusing me is the number of records the query plan shows: is this just information, or does DBISAM actually have to process that many?

cheers
David.
Fri, Oct 20 2017 6:33 AM

Roy Lambert

NLH Associates

Team Elevate Team Elevate

David

I just want to clarify since I may have misunderstood your original post. Does the slowdown occur both locally and on the server or just on the server?

Another question: what is the slowdown? The plan you posted shows 0.031 seconds; what's the slow speed?


>#1 is an option as other clients will be accessing the tables, but the slowness can last a couple of hours or more and then suddenly, for no apparent reason, it becomes fast again. I have checked the CPU and the DBSrvr is not using many cycles, but I do see the I/O response creeping up to 150ms, although not all the time.

If it's just on the server, this sounds like something on there being kicked off, and that could be anything from Windows Update to defrag to using it to make a cup of tea.

>On #3: if running the query in DBSys, would the query time be affected so it looks a lot slower because of the network, or would it state the actual time the query took even if the results take ages to arrive at the client?

My understanding is that the time reported is the execution time, so even if the network went off on holiday, when you got the result it would still look just as fast.

>I am a bit confused by the query plan. When I have two WHERE conditions covering 27k and 35k records on the same table, does this mean DBISAM has to do two queries to get the result, or does it filter on the first condition (27k records) and then filter those results down with the second condition? What is confusing me is the number of records the query plan shows: is this just information, or does DBISAM actually have to process that many?

It's not something I ever considered. Generally, when I'm trying to speed up a query, all I do is look to make each component part optimised. I can't influence how Tim glues them together, so I don't worry about it.
Fri, Oct 20 2017 7:15 PM

David

Hi Roy.

The query plan I posted was a local copy; sorry, I should have posted one from the server when it was going slow, but I was more concerned by the query plan reporting such a high number of rows for the WHERE conditions that I thought I would post it to get some feedback.

Having read up on it, I think the way DBISAM works is that it evaluates each component part as a bitmap and then joins them together, so we were both correct, but I am sure Tim could confirm this thinking.

Using parentheses to group conditions changes the logic, so hopefully it could significantly reduce the I/O on the server, which may be what is causing my slowdown. The issue has only come to light recently because the tables have grown much larger over the years the system has been running, so if our thinking is correct, the query now has to deal with a much larger dataset than it previously did.
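To illustrate the kind of grouping I mean (a sketch only, not my exact change; the names are from the original statement above):

SELECT ID, CoyID, t.ContractID, TestItemNo, VialNo, LabelInfo, PreBookedIn,
       UnitOfMeasure, Disposed, Volume, TIGMO, TIRiskAssessmentSupplied, CurrentStatus
FROM TIContract t
JOIN TestItemReceipt a ON (a.ID = t.TestItemID)
WHERE t.ContractID = 2695 AND a.CoyID = 1169 AND
      (a.Disposed = false AND RecievedDateTime <> null)
GROUP BY TestItemNo, VialNo
ORDER BY TestItemNo, VialNo ASC

The hope is that the two large conditions inside the parentheses no longer each get a full index scan of their own, although I am not certain that is how the optimizer treats them.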

This is my fault, as my thinking at the time was that if you filter on the driver table and get eight rows, then the join only needs to work on eight rows. That is indeed the end result, but I think DBISAM has to do the filter on the joined table first because of the WHERE conditions, and if those have not been set up carefully this can result in a large number of rows being produced that will ultimately be ignored.

I do notice on my server that the system works OK provided it is not having to do a lot of other work, as it is a file server as well and also has to deal with SCCM, a virus scanner and now a new-age virus scanner, SentinelOne. Although none of this causes high CPU loads, there is a lot of disk activity, and I wondered whether this has had an impact on disk caching. Our server is a Dell T310 with a hardware RAID that unfortunately does not have a battery-backed cache, so the hardware cache is turned off by default, which results in slow disk writes and can also have an impact.

Regards
David.
Sat, Oct 21 2017 4:24 AM

Roy Lambert

NLH Associates

Team Elevate

David

>Using parentheses to group conditions changes the logic, so hopefully it could significantly reduce the I/O on the server, which may be what is causing my slowdown. The issue has only come to light recently because the tables have grown much larger over the years the system has been running, so if our thinking is correct, the query now has to deal with a much larger dataset than it previously did.

Whilst bigger tables do mean more disk I/O, the whole point of c/s is that it's local disk I/O, and it's only when the result is transferred that network I/O comes into it, so the actual query times should be about the same on a local machine or a server.

>This is my fault, as my thinking at the time was that if you filter on the driver table and get eight rows, then the join only needs to work on eight rows. That is indeed the end result, but I think DBISAM has to do the filter on the joined table first because of the WHERE conditions, and if those have not been set up carefully this can result in a large number of rows being produced that will ultimately be ignored.

Yup

>I do notice on my server that the system works OK provided it is not having to do a lot of other work, as it is a file server as well and also has to deal with SCCM, a virus scanner and now a new-age virus scanner, SentinelOne. Although none of this causes high CPU loads, there is a lot of disk activity, and I wondered whether this has had an impact on disk caching. Our server is a Dell T310 with a hardware RAID that unfortunately does not have a battery-backed cache, so the hardware cache is turned off by default, which results in slow disk writes and can also have an impact.

OK, now you've identified the culprit! It's why it's usually recommended that servers are single-purpose machines. Also, RAID is more secure but slower.

Roy
Mon, Oct 23 2017 12:54 PM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

David,

<< I have two tables involved that are joined together. The WHERE clause filters the master table first, resulting in the 8 rows, and then I also filter on the joined table with 3 separate conditions. The query plan says that the first condition on the joined table covers 71 rows, the second 27694 and the third 35878, all applied before any joins are made. Does this mean that, because of my WHERE clause, the DBISAM server has to do 3 filters on the table and then join them together to get the result set? >>

Effectively, yes, that is what it's doing.  It's not quite as bad as it sounds, but DBISAM does need to build 3 different bitmaps using index scans and then AND them together.  There aren't any row reads involved.  However, as you found out in your later post, sometimes you *want* the database engine to only use the index for the more selective condition and then just read the rows for the rest of the conditions.  DBISAM is just "okay" at performing such optimizations, whereas ElevateDB is much, much better at this, going to great lengths to optimize for such conditions in order to reduce the amount of I/O required.
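To map that onto the plan you posted, the optimized conditions correspond roughly to these bitmaps (row counts taken from your plan):

WHERE t.ContractID = 2695        /* index scan bitmap: 8 keys (TIContract) */
  AND a.CoyID = 1169             /* index scan bitmap: 71 keys */
  AND a.Disposed = false         /* index scan bitmap: 27694 keys */
  AND RecievedDateTime <> null   /* index scan bitmap: 35878 keys */
/* the three TestItemReceipt bitmaps are ANDed together before the join */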

If you want to send me your database tables (.zip please) along with the query/queries that you're trying to execute, I can see if there's any room for improvement in DBISAM's optimizer that can be done quickly in a new build.

Tim Young
Elevate Software
www.elevatesoft.com
Mon, Oct 23 2017 1:03 PM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

David,

<< I do notice on my server that the system works OK provided it is not having to do a lot of other work, as it is a file server as well and also has to deal with SCCM, a virus scanner and now a new-age virus scanner, SentinelOne. Although none of this causes high CPU loads, there is a lot of disk activity, and I wondered whether this has had an impact on disk caching. Our server is a Dell T310 with a hardware RAID that unfortunately does not have a battery-backed cache, so the hardware cache is turned off by default, which results in slow disk writes and can also have an impact. >>

It's almost unheard of for a database engine to become CPU-bound with queries (that's the ultimate goal, but given that the database engine/server has to rely on disk access, it's almost never achieved unless the entire database is in memory), so disk performance is always the first place to look when you're having random slowdowns. This is especially true if the writes are causing other I/O to increase the disk queue length:

https://blogs.technet.microsoft.com/askcore/2012/03/16/windows-performance-monitor-disk-counters-explained/

and a must read afterward (linked in the above article):

http://archive.oreilly.com/pub/a/network/2002/01/18/diskperf.html

Tim Young
Elevate Software
www.elevatesoft.com
Tue, Oct 24 2017 1:47 PM

David

Hi Tim, thanks for getting back to me.

Is it possible that large queries, with lots of rows and a high number of bytes needed to produce the result set, get slower over the course of a day, possibly due to memory-manager fragmentation?

What I have seen is that the first couple of users seem to work as expected, but throughout the course of the day, as more users log on and run different but similar queries, the response speed seems to get slower. Reducing the size of the bitmaps by putting parentheses around like conditions seems, so far, to have helped, but it is difficult to tell as the CPU usage is not noticeably higher.

Thanks for the offer to look into making a new DBISAM build. I am not in a position to use a new build at the moment, so it would not be fair to ask you to look into one if I am not going to use it straight away. If you are looking out of your own curiosity as well, then I can happily send over the tables once any client data has been anonymised.

This situation has made me rethink my use of the WHERE clause, and in a lot of cases I am considering adding new fields to provide a compound index, which I think will help reduce the number of returned rows by using one single index rather than relying on multiple conditions.

Regards
David.
Wed, Oct 25 2017 11:29 AM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

David,

<< Is it possible that large queries, with lots of rows and a high number of bytes needed to produce the result set, get slower over the course of a day, possibly due to memory-manager fragmentation? >>

While some memory fragmentation is always possible, I don't think you're going to see fragmentation with FastMM4 to a degree that would affect performance unless you're seriously pressuring the total amount of address space for the process. For example, our web server/mail server here, as well as the database server, all use DBISAM and have uptimes of several months before they get restarted as part of normal updates, etc.

<< What I have seen is that the first couple of users seem to work as expected, but throughout the course of the day, as more users log on and run different but similar queries, the response speed seems to get slower. Reducing the size of the bitmaps by putting parentheses around like conditions seems, so far, to have helped, but it is difficult to tell as the CPU usage is not noticeably higher. >>

My guess here would be that there's a disk response issue that is separate from DBISAM. With typical workloads, the Windows file system ends up caching most of the database table files and keeps them cached, so disk response isn't that big of a deal. However, if the server machine is also being heavily used for file server purposes, then Windows may end up continually ejecting portions of the cached database table files in favor of unrelated files, thus requiring additional physical disk reads. Combine this with a slow disk write response time, and you'll start to see processing pile up behind the disk. A good solution would be to increase the total memory on the server machine in order to provide more memory for file system caching. This utility will tell you what the situation is with the memory on the server:

https://docs.microsoft.com/en-us/sysinternals/downloads/rammap

If the amount of available memory is very low at most times throughout the day, then increasing the physical memory in the server machine will definitely help.

<< This situation has made me rethink my use of the WHERE clause, and in a lot of cases I am considering adding new fields to provide a compound index, which I think will help reduce the number of returned rows by using one single index rather than relying on multiple conditions. >>

I wouldn't do that.  DBISAM *cannot* use compound indexes for query optimization:

https://www.elevatesoft.com/manual?action=viewtopic&id=dbisam4&product=rsdelphiwin32&version=10T&topic=Optimizations

(under "Index Selection", #1)

What you're seeing when DBISAM exhibits better performance when conditions are grouped is related to how DBISAM's query optimizer is able to determine that it's a better I/O option to only use one index and then just read the records directly to satisfy the remaining conditions.  In other words, it involves DBISAM *ignoring* available indexes, not using compound indexes.
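As a rough sketch (the index name here is hypothetical; the column names are from your query):

/* A compound index like this CANNOT be used by DBISAM's query optimizer: */
CREATE INDEX CoyIDDisposed ON TestItemReceipt (CoyID, Disposed);

/* Grouping the less selective conditions, on the other hand, can let the
   optimizer decide that it's better I/O to use just the CoyID index (71 keys
   in your plan) and read the matching rows directly to check the rest: */
WHERE t.ContractID = 2695 AND a.CoyID = 1169 AND
      (a.Disposed = false AND RecievedDateTime <> null)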

Tim Young
Elevate Software
www.elevatesoft.com