Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 26 total
Thread Optimising "not" clauses in SQL
Thu, Nov 12 2009 6:25 PMPermanent Link

"Adam H."
Hi all,

I've noticed in DBISam, that if I have a query such as:

Select something
from mytable
where myfield = 'C'

The query runs quite fast. (All these examples assume that myfield is
indexed).

However, if I run something like:

Select something
from mytable
where myfield <> 'C'

The query runs substantially slower. (Can be 10 times slower).

But, if I run something like

Select something
from mytable
where (myfield = 'A') or myfield = ('B') or (myfield  = 'D') or (myfield
= 'E').... and so on - the query runs relatively fast again.

The question I have is in order to optimise my query with a 'not'
statement - is there some trick that I'm missing with the indexing of
the field, or am I better to have select statements for the field for
all the possible matches except for the field I don't want (such as my
3rd example) as far as speed and optimization is concerned?

I'm running DBISam 4.29 on Delphi2007 in Windows 7.

Cheers

Adam.

Fri, Nov 13 2009 1:56 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Adam

How about

Select something
from mytable
where NOT myfield = 'C'

Roy Lambert
Fri, Nov 13 2009 12:42 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Adam,

<< However, if I run something like:

Select something
from mytable
where myfield <> 'C'

The query runs substantially slower. (Can be 10 times slower). >>

I depends upon the number of records returned, but such queries can involve
a lot of index scanning, and in some cases, can get converted to record
scans because the number of records returned is very high.

How many records are in the source table that don't match 'C' ?

--
Tim Young
Elevate Software
www.elevatesoft.com

Sun, Nov 15 2009 7:36 PMPermanent Link

"Adam H."
Good Morning Tim,

Thanks for your reply mate,

> I depends upon the number of records returned, but such queries can involve
> a lot of index scanning, and in some cases, can get converted to record
> scans because the number of records returned is very high.
>
> How many records are in the source table that don't match 'C' ?

Can I give 'LOTS' as an answer?  Wink

At present there are around 56000 records that don't match 'C'.

The table has a record size of 1856 bytes.


The field is indexed, but I'm guessing that this may not make a
difference the way NOT seems to work?

I'm just trying to find a more optimized solution to run my query.
(Hopefully not something too difficult or complex considering the
simplicity of the original SQL). Smile

Cheers mate

Adam.
Sun, Nov 15 2009 7:36 PMPermanent Link

"Adam H."
Hi Roy,

> Select something
> from mytable
> where NOT myfield = 'C'

Thanks for the idea, but unfortunately that seems to take the same
amount of time as using the <>  expression.

Cheers mate

Adam.
Mon, Nov 16 2009 3:35 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Adam,

<< Can I give 'LOTS' as an answer?  Wink>>

Yes, that is valid answer. Smiley

Can you do me a favor and post the execution plan for the query ?  That may
shed some light on what the optimizer is doing.

<<The field is indexed, but I'm guessing that this may not make a difference
the way NOT seems to work? >>

No, it won't make any difference.

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Nov 16 2009 10:27 PMPermanent Link

"Adam H."
Hi Tim,

Thanks again for getting back to me.


> Can you do me a favor and post the execution plan for the query ?  That may
> shed some light on what the optimizer is doing.

Absolutely. The plan is:

================================================================================
SQL statement (Executed with 4.29 Build 2)
================================================================================

select T.Ticket, T.ProductID
From Tickets T
where (T.Type <> 'C')

Tables Involved
---------------

Tickets (T) table opened shared, has 65638 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more rows

WHERE Clause Execution
----------------------

The expression:

T.Type <> 'C'

is OPTIMIZED, covers 716 rows or index keys, costs 9021 bytes, and will be
applied to the Tickets table (T) before any joins

================================================================================
>>>>> 56284 rows affected in 12.547 seconds
================================================================================




as opposed to:




================================================================================
SQL statement (Executed with 4.29 Build 2)
================================================================================

select T.Ticket, ProductID
From Tickets T
where (T.Type = 'A') or
(T.Type = 'B') or
(T.Type = 'D') or
(T.Type = 'E') or
(T.Type = 'F') or
(T.Type = 'G') or
(T.Type = 'H') or
(T.Type = 'I') or
(T.Type = 'J') or
(T.Type = 'K') or
(T.Type = 'L') or
(T.Type = 'M') or
(T.Type = 'N') or
(T.Type = 'O') or
(T.Type = 'P') or
(T.Type = 'Q') or
(T.Type = 'R') or
(T.Type = 'S') or
(T.Type = 'T') or
(T.Type = 'U') or
(T.Type = 'V') or
(T.Type = 'W') or
(T.Type = 'X') or
(T.Type = 'Y') or
(T.Type = 'Z') OR
(T.Type is null)

Tables Involved
---------------

Tickets (T) table opened shared, has 65638 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more rows

WHERE Clause Execution
----------------------

The expression:

T.Type = 'A' or T.Type = 'B' or T.Type = 'D' or T.Type = 'E' or T.Type =
'F' or
T.Type = 'G' or T.Type = 'H' or T.Type = 'I' or T.Type = 'J' or T.Type =
'K' or
T.Type = 'L' or T.Type = 'M' or T.Type = 'N' or T.Type = 'O' or T.Type =
'P' or
T.Type = 'Q' or T.Type = 'R' or T.Type = 'S' or T.Type = 'T' or T.Type =
'U' or
T.Type = 'V' or T.Type = 'W' or T.Type = 'X' or T.Type = 'Y' or T.Type =
'Z' OR
T.Type is null

is OPTIMIZED, covers 56284 rows or index keys, costs 709177 bytes, and
will be
applied to the Tickets table (T) before any joins

================================================================================
>>>>> 56284 rows affected in 2.312 seconds
================================================================================


Cheers mate,

Adam.
Tue, Nov 17 2009 5:59 PMPermanent Link

"Adam H."
Hi Tim,

With further investigation, I've found something else...

When I run the Query with  (T.Type <> 'C') is for the first time, I get
the results as reported, however if I run it again it works much faster
the second time. (Roughly the same amount of time as the query with the
expression (T.Type = 'A') or (T.Type = 'B').... etc

I don't know if this has anything to do with the issue at hand or not.

Cheers

Adam.
Tue, Nov 17 2009 6:18 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Adam,

<< Absolutely. The plan is: >>

Hmm, the execution plan for this needs some further work.  The stats
reported for the WHERE clause don't match the number of resulting records,
which is okay since it is actually reporting the number of matching records
for 'C'.  However, what it isn't reporting is a further record scan that has
to occur.

What the DBISAM optimizer does in these cases is this:

1) Scan all index keys where Type='C', and populate a bitmap with the
corresponding records.
2) Flip the bitmap.
3) Scan all records that match the set bits in the bitmap and make sure that
they aren't deleted.

It is 3) that is most likely taking up the majority of the time, and this is
what the execution plan is leaving out.

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, Nov 18 2009 7:51 PMPermanent Link

"Adam H."
Hi Tim,

Thanks for your reply...

> Hmm, the execution plan for this needs some further work.  The stats
> reported for the WHERE clause don't match the number of resulting records,
> which is okay since it is actually reporting the number of matching records
> for 'C'.  However, what it isn't reporting is a further record scan that has
> to occur.
>
> What the DBISAM optimizer does in these cases is this:
>
> 1) Scan all index keys where Type='C', and populate a bitmap with the
> corresponding records.
> 2) Flip the bitmap.
> 3) Scan all records that match the set bits in the bitmap and make sure that
> they aren't deleted.
>
> It is 3) that is most likely taking up the majority of the time, and this is
> what the execution plan is leaving out.


Thanks for the explanation. However, (just out of curiosity) if we were
to do a Query where Type='C' as opposed to "NOT C", wouldn't step 3
still need to occur in this instance too? (As it's my impression that
deleted records still hold all the data, they're just marked as
'deleted' until that record is overwritten with another one).

Cheers mate

Adam.
Page 1 of 3Next Page »
Jump to Page:  1 2 3
Image