Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 17 total
Thread Worse SQL speed for LIKE
Sat, Mar 17 2012 12:29 PMPermanent Link

Rolf Frei

eicom GmbH

I have now moved some of my larger tables to EDB2 from DBISAM3 and did some
speed tests agianst it. I'm right shocked at the moment, as EDB2 is extremly
slower than DBISAM3. The following simple SQL takes 2.8 seconds in DBSIAM3,
but in EDB it needs 10.8 (!!!) seconds.

Select * from parts
where
LOWER(info) like '%akku%' or
Lower(Description) like '%akku%'
order by description

What is going on here? I tought EDB should be faster?

In EDB2 I have created an unicode DB, may that make that huge difference?
All indexes are the same as in DBISAM3. If this is normal in EDB2, this will
be a show stopper for me, as this like query is right much used in the Web
for database product searching.

Another question: Is there a simple way to convert a EDB Unicode DB to an
ANSI DB?

Regards
Rolf



Sat, Mar 17 2012 1:25 PMPermanent Link

Rolf Frei

eicom GmbH

Some other strange timings:

Select * from parts
where
 Lower(Description) like '%akku%' or
 Lower(Description) like '%asus%' or
 Lower(Description) like '%ati%'
order by description

DBSIAM3: 0.5s
EDB2: 1.2s  (2.5x slower)

I did now create a Textindex on the Infofield and get this results:

Select * from parts
where
 Info contains 'akku' or
 Info contains 'asus' or
 Info contains 'ati'
order by description

DBSIAM3  not tested
EDB2 0.15s  (these seems to be acceptable)

But now it gets very strange on combining this two queries:

Select * from parts
where
 Info contains 'akku' or
 Info contains 'asus' or
 Info contains 'ati' or
 Lower(Description) like '%akku%' or
 Lower(Description) like '%asus%' or
 Lower(Description) like '%ati%'
order by description

Here I expect to get something about 1.0-1.5 sec but this SQL needs 14.2
s(!!!). What the hell is going on here?

Regards
Rolf

Sun, Mar 18 2012 5:09 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Rolf


First you don't need the Lower() any more to make things case insensitive. This is now achieved via collations.

Second is that collations can really mess you up if you don't "appreciate" them.

Is the index you think you're using the same collation as the column you're testing? eg the column may be ANSI_CI and the index ANSI. If that's the case it won't be used.

There have been many posts which seem to indicate that DBISAM V3 was the fastest. I know Tim has explained a number of times what was changed (memory says something about buffers).

Roy Lambert
Mon, Mar 19 2012 4:01 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Rolf


A post over in the DBISAM section jogged my memory for another issue - are your tables encrypted. If so that will be part of the answer. I don't know how big a part though. DBISAM V3 used an XOR encryption method simple, fast and easy to break. ElevateDB uses a much more secure method but that will come at a bit of a performance penalty. Nowhere as big as you're quoting which is much more likely to be an index / collation problem.

Roy Lambert [Team Elevate]
Mon, Mar 19 2012 11:48 AMPermanent Link

Rolf Frei

eicom GmbH

Roy

In DBSIAM 3 both fields (Descripton VARCHAR(100) and Info CBLOB) doesn't
have an index at all. It's a simple sequential rowscan what is done. The
first test in EDB was with no indexes as well. So it did a rowscan also, but
much, much slower.

As of the Lower(): I need it as the collation isn't and should not be CI.

This query returns worse result:

Select * from parts
where
 LOWER(info) like '%akku%' or
 LOWER(info) like '%asus%' or
 LOWER(info) like '%ati%'
 LOWER(description) like '%akku%' or
 LOWER(description) like '%asus%' or
 LOWER(description) like '%ati%'
order by description

EDB and DBISAM have no indexes on this fields. The Collation is DES (Swiss
german with no CI). There are a total of 11'000 records and this query
returns 3600 records of them.
DBISAM: 4.5 Second
EDB2: 23.5 Seconds

I was able to reduce the time by adding a textindex and using "contains",
but so I will not anymore get the same results back. In german there are
many concatinated words like "Buchtitel" which are two words on english
"book title". As such, any searches for "Buch" will not anymore find what I
expect. As of this problem I don't want to use Textindexes. In my case the
search with "like" returns 3600 records, where "conatins" only returns 600.

I realy wonder what is that different to DBISAM that it takes about 5x more
time to get the same results in EDB?! At the moment I'm absolutly
frustrated, as this will make EDB2 ununsable for me. I can live with about
10% speed degration but by far not with 520% as in my sample. There must be
something wrong in the engine für LIKE conditions.

It looks to me as in EDB every single LIKE makes a complete table rowscan.
So in this case it scans the complete table 6 times. This isn't needed at
all and if it is realy the case, there is a big probem in EDB than, which
Tim should looking at. But I cant realy believe that it is realy done this
way.

Regards
Rolf


"Roy Lambert"  schrieb im Newsbeitrag
news:2B842CBC-8BAE-4B23-AA21-D3061C28C68E@news.elevatesoft.com...

Rolf


First you don't need the Lower() any more to make things case insensitive.
This is now achieved via collations.

Second is that collations can really mess you up if you don't "appreciate"
them.

Is the index you think you're using the same collation as the column you're
testing? eg the column may be ANSI_CI and the index ANSI. If that's the case
it won't be used.

There have been many posts which seem to indicate that DBISAM V3 was the
fastest. I know Tim has explained a number of times what was changed (memory
says something about buffers).

Roy Lambert
Mon, Mar 19 2012 11:52 AMPermanent Link

Rolf Frei

eicom GmbH

Roy

The table isn't encrypted, not in DBISAM and not in EDB2. So this can not be
the reason. I have no also tried to run the queries against an ANSI EDB2
table, but the result is near the same. The ANSI EDB" Version is a little
bit faster then den Unicode Version of EDB.

Regards
Rolf


"Roy Lambert"  schrieb im Newsbeitrag
news:8F489494-30E3-4FB8-A7A8-52956854158E@news.elevatesoft.com...

Rolf


A post over in the DBISAM section jogged my memory for another issue - are
your tables encrypted. If so that will be part of the answer. I don't know
how big a part though. DBISAM V3 used an XOR encryption method simple, fast
and easy to break. ElevateDB uses a much more secure method but that will
come at a bit of a performance penalty. Nowhere as big as you're quoting
which is much more likely to be an index / collation problem.

Roy Lambert [Team Elevate]
Mon, Mar 19 2012 12:23 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Rolf

>As of the Lower(): I need it as the collation isn't and should not be CI.

You still don't need it Smiley The approach is to use collate as part of the query eg

description COLLATE DES_CI LIKE '%asus%

I now know you're using ORDER BY against an unindexed column. This will also slow things down, and it will (I think) force the production of an insensitive (ie canned) result set. If DBISAM V3 was producing a Sensitive (live) result set that could go a long way towards explaining the timing differences.

LOWER() will work as well but using COLLATE will give you added flexibility.

>I was able to reduce the time by adding a textindex and using "contains",
>but so I will not anymore get the same results back. In german there are
>many concatinated words like "Buchtitel" which are two words on english
>"book title". As such, any searches for "Buch" will not anymore find what I
>expect. As of this problem I don't want to use Textindexes. In my case the
>search with "like" returns 3600 records, where "conatins" only returns 600.

what happens if you try CONTAINS 'buch*'

>I realy wonder what is that different to DBISAM that it takes about 5x more
>time to get the same results in EDB?! At the moment I'm absolutly
>frustrated, as this will make EDB2 ununsable for me. I can live with about
>10% speed degration but by far not with 520% as in my sample. There must be
>something wrong in the engine für LIKE conditions.
>
>It looks to me as in EDB every single LIKE makes a complete table rowscan.
>So in this case it scans the complete table 6 times. This isn't needed at
>all and if it is realy the case, there is a big probem in EDB than, which
>Tim should looking at. But I cant realy believe that it is realy done this
>way.

This I can't comment on. My suggestion at this point is to email Tim direct - he'll probably want a copy of the table(s) and associated config file.

Roy Lambert [Team Elevate]
Mon, Mar 19 2012 12:58 PMPermanent Link

Rolf Frei

eicom GmbH

Roy

>> description COLLATE DES_CI LIKE '%asus%

Changed to this type of SQL, but still the same timings. Frown


>>I now know you're using ORDER BY against an unindexed column. This will
>>also slow things down, and it will (I think) force the production of an
>>insensitive (ie canned) result set. If DBISAM V3 was producing a Sensitive
>>(live) result set that could go a long way towards explaining the timing
>>differences.

Removing the Order by reduces the time used from 23.5 to 22.8 seconds. As
there is no index on description, in DBSIAM3 this query retruns a canned
result set as in EDB2. DBISAM needs 4.0 instead 4.5 with the order by.


>> what happens if you try CONTAINS 'buch*'

This will only return words beginning with "buch", but the search word
should also find words ending with it or inside another longer word.


>> This I can't comment on. My suggestion at this point is to email Tim
>> direct - he'll probably want a copy of the table(s) and associated config
>> file.

I have just sent an email. Thanks anyway for your efforts!


Regards
Rolf


Wed, Mar 21 2012 2:54 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Rolf,

<< In DBSIAM 3 both fields (Descripton VARCHAR(100) and Info CBLOB) doesn't
have an index at all. It's a simple sequential rowscan what is done. The
first test in EDB was with no indexes as well. So it did a rowscan also, but
much, much slower. >>

As indicated via email, the issue is with the number of string comparisons
in the LIKE functionality.  EDB has to make a Windows API call to compare
characters in a collation-sensitive way, and that is accounting for the
majority of the execution time.  The Windows API call is made around 83
million times in the process of executing that query, as many of the rows
have several KB of text that must be compared character-by-character.  Also,
as I said in my email, there really is no way around this at this time, but
I'll keep digging to see if the number of calls can be reduced by modifying
the logic of the LIKE functionality.

In general, though, there are some distinct differences in the way that
EDB's LIKE functionality works when compared to DBISAM.  Most notably, EDB
does some extra comparisons to account for certain collations like German
that may match on more than one character.  This can cause additional
overhead, and in the case of your query, it accounts for 15 additional
seconds of overhead (out of 35 seconds total execution time).

<< As of the Lower(): I need it as the collation isn't and should not be CI.
>>

What you're doing is essentially a case-insensitive operation so, no, you do
*not* need the LOWER() call and can use a simple COLLATE specification to
achieve the same thing.

<< It looks to me as in EDB every single LIKE makes a complete table
rowscan.  So in this case it scans the complete table 6 times. >>

This is not correct.  It makes one scan through the whole table, and
evaluates each row for all of the conditions specified.  The query execution
plan will tell you this information.

--
Tim Young
Elevate Software
www.elevatesoft.com
Wed, Mar 21 2012 4:01 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Rolf


Just thinking about this again having read Tim's answer. Have you tried


Select * from parts
where
 Info contains '*akku*' or
 Info contains '*asus*' or
 Info contains '*ati*'
order by description

I think ElevateDB will do that these days.

Roy Lambert [Team Elevate]
Page 1 of 2Next Page »
Jump to Page:  1 2
Image