Icon View Thread

The following is the text of the current message along with any replies.
Messages 11 to 20 of 20 total
Thread Filter a query
Sat, Feb 6 2021 7:11 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

David


<<Result set will be ordered by the following column(s) using a case-sensitive
temporary index:

Company ASC
ContractNumber ASC>>


This will always be a problem for you. For the query to be optimised it needs a compound index not two separate indices (unless things have changed since I used DBISAM) and its impossible to achieve that since the two columns come from separate tables.

Roy Lambert
Sat, Feb 6 2021 7:16 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

David


For the WHERE clause you want separate indices. Unless things have changed my memory says only the first part of a compound index is used in the WHERE clause.

<<How would I go about having one index for this table using StudyCompDate,Department and StudyStatus, but on another query I might only use StudyCompDate only?>>

You can create as many indices as you like - DBISAM will (I think) select the ones it considers are the best for the query. I don't know of any way to tell it which ones to use.

It might be worth opening a support ticket with Tim for your specific cases.

Roy Lambert
Sat, Feb 6 2021 9:12 AMPermanent Link

David

Hi Roy.

I can sort of get this working with an optimized query if I re-work it and drop the <> for an = like Jose said.

However I am seeing in some instances the costs increasing at the expense of it being an optimized query.  Now the question comes down to, is an optimized query with higher costs better than an partially optimized query with lower costs.

I am not seeing a much of a difference in query execution time regardless, but this is being done locally with only myself logged on using an SSD where as the live system uses spinning disks on a multi user server.

For example:

Where StudyCompDate = null and Department ='Dept1' and StudyStatus = 'Active' or StudyStatus = 'OnHold'

is optimized but costs are 1922841 bytes

Where StudyCompDate = null and Department ='Dept1' and (StudyStatus = 'Active' or StudyStatus = 'OnHold')

is partially optimized but costs are 1259085  bytes

I do not understand the impact if any the costs have on the server, is it persistent inrease in memory used while the query is open or just a transient cost while generating the result set and once obtained not an issue?
Sat, Feb 6 2021 11:58 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

David


Its a long time since I spoke fluent DBISAM but from fallible memory the costs are an estimate of the data that will be read and compared to execute the query. Having said that you are comparing oranges and apples (I was going to say chalk and cheese but its more similar than that) - the two queries are different.

StudyCompDate = null and Department ='Dept1' and StudyStatus = 'Active' or StudyStatus = 'OnHold'
StudyCompDate = null and Department ='Dept1' and (StudyStatus = 'Active' or StudyStatus = 'OnHold')

Why one is optimised and the other isn't I don't know. My initial reaction would have been that the second should have a higher cost since (my guess) an intermediate result has to be produced first but thinking about it the intermediate result may mean less compares for the other two tests. I can't remember just how the two expressions are computed so could be talking absolute gibberish.

If you see "Result set will be canned" it means a temporary dataset is written to disk (or SSD) the time of doing that can be significant and pretty much hide any differences in the underlying query execution. If you can get a live dataset it should be much faster.

Roy Lambert
Mon, Feb 8 2021 6:24 AMPermanent Link

David

Hmm interesting....

I think I may have found my issue.  By doing a text search for 'Cancelled' in the filter I am increasing the number of bytes to do the search, even if I modify it to use = as opposed to <>.

However if I  add a new boolean field and index for Cancelled this allows the query to be optimized and the byte cost is significantly smaller than before Smile  I never realised that searching for text with an index would have such an impact on the query.

StudyCompDate = null and StudyStatus = 'Active' or StudyStatus = 'OnHold' and
Department = 'Dept1'

is PARTIALLY-OPTIMIZED, covers 4776 rows or index keys, costs 3068190 bytes,
and will be applied to the Scheduling table (a) before any joins

To this

StudyCompDate = null and Cancelled = false and Department = 'Dept1'

has been rewritten and is OPTIMIZED, covers 2003 rows or index keys, costs
448476 bytes, and will be applied to the Scheduling table (a) before any joins
Mon, Feb 8 2021 6:24 AMPermanent Link

David

Hmm interesting....

I think I may have found my issue.  By doing a text search for 'Cancelled' in the filter I am increasing the number of bytes to do the search, even if I modify it to use = as opposed to <>.

However if I  add a new boolean field and index for Cancelled this allows the query to be optimized and the byte cost is significantly smaller than before Smile  I never realised that searching for text with an index would have such an impact on the query.

StudyCompDate = null and StudyStatus = 'Active' or StudyStatus = 'OnHold' and
Department = 'Dept1'

is PARTIALLY-OPTIMIZED, covers 4776 rows or index keys, costs 3068190 bytes,
and will be applied to the Scheduling table (a) before any joins

To this

StudyCompDate = null and Cancelled = false and Department = 'Dept1'

has been rewritten and is OPTIMIZED, covers 2003 rows or index keys, costs
448476 bytes, and will be applied to the Scheduling table (a) before any joins
Mon, Feb 8 2021 7:59 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

David


Firstly you're replacing

StudyStatus = 'Active' or StudyStatus = 'OnHold'

with

Cancelled = false

so you are reducing the number of bytes (or possibly even bits)  that have to be examined.  Its even less than you might expect because you're getting rid of the "or" which guarantees evaluation of both sides of the operator.

Since I don't know just how DBISAM evaluates "and" and "or" I can't say for definite but I suspect that you're actually changing the meaning of the test. I think that DBISAM may well test as

(StudyCompDate = null and StudyStatus = 'Active') or (StudyStatus = 'OnHold' and Department = 'Dept1')

whereas (depending on how you set Cancelled) you whould get the equivalent of

StudyCompDate = null and (StudyStatus = 'Active' or StudyStatus = 'OnHold') and Department = 'Dept1'

I'm guessing but its worth eyeballing the result set to make sure

Roy Lambert
Mon, Feb 8 2021 4:08 PMPermanent Link

David

Hi Roy.

I played around with this some more and basically I wanted to use StudyStatus to work as a filter, it can have Active, Cancelled or OnHold and as I had done previously I was using text to describe this status.  However, depending on how many other criteria there is, it doesn't always quite work the way I wanted, with higher costs or partial optimized.

So as I said in my previous message I added a new column that was boolean just to set if the record had been cancelled which seems to work.  I later though maybe I should have a status in this fields, so I am using a small int now. 1 = Active, 2 = onHold and 3= Cancelled.

So if I want all but cancelled I can just do <3, if I want Active only this is  =1 or if I want  OnHold then =2.  This seems to work at the moment, but as you said, better to eyeball it so looking into some of the queries I currently have and seeing if this will work.  Does this make sense?

I had thought that the text in the StudyStatus would do the same thing, and it does, just it costs a lot more than just using an integer to filter on, but not so easy to read though.

What impact does the higher cost have on the database does this make it slower, if you have lots of people doing the same query so lower costs = more responsive DB.  I am not seeing much difference in execution time, but I wonder if in the live system it might have a more positive impact.

Regards
David.
Tue, Feb 9 2021 6:47 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

David

You're somewhat out of my paygrade now, but I'll give my opinion and others with more knowledge can contribute and correct me


>I played around with this some more and basically I wanted to use StudyStatus to work as a filter, it can have Active, Cancelled or OnHold and as I had done previously I was using text to describe this status. However, depending on how many other criteria there is, it doesn't always quite work the way I wanted, with higher costs or partial optimized.

Its the and / or thing. If you want to determine what it does then use brackets.

>So as I said in my previous message I added a new column that was boolean just to set if the record had been cancelled which seems to work. I later though maybe I should have a status in this fields, so I am using a small int now. 1 = Active, 2 = onHold and 3= Cancelled.

That's probably what I'd do, or I'd experiment with using A, H, C

>So if I want all but cancelled I can just do <3, if I want Active only this is =1 or if I want OnHold then =2. This seems to work at the moment, but as you said, better to eyeball it so looking into some of the queries I currently have and seeing if this will work. Does this make sense?

Yup, but use brackets where necessary to enforce how the query is evaluated

>I had thought that the text in the StudyStatus would do the same thing, and it does, just it costs a lot more than just using an integer to filter on, but not so easy to read though.

There will be that, there will also be grouping in the index. Imagine if for all rows l StudyStatus was Active you'd end up reading a lot of index data for nothing. A good spread is much more effective.

>What impact does the higher cost have on the database does this make it slower, if you have lots of people doing the same query so lower costs = more responsive DB. I am not seeing much difference in execution time, but I wonder if in the live system it might have a more positive impact.

If you're using f/s I think you'll find the single biggest impact on performance will be the move from single user to multi user. If its c/s then the performance degradation is more linear.

Roy
Tue, Feb 9 2021 4:54 PMPermanent Link

David

Thanks Roy you have been very helpful. Appreciate it.

David.

Roy Lambert wrote:

David

You're somewhat out of my paygrade now, but I'll give my opinion and others with more knowledge can contribute and correct me
« Previous PagePage 2 of 2
Jump to Page:  1 2
Image