Thread: tedbquery and "select * from table"
Tue, Mar 5 2019 8:11 AM

Greg Hallam

Microcalm Solutions Inc

Roy Lambert wrote:

My bet would be on not having indices for the filters you were setting. Insensitive result set has c.75k records to scan, sensitive - how many? It might be interesting to just apply the filters to the table as a query in EDBManager and see what it tells you.

All the result sets were 75,000 records until the filter was applied.  The results were identical aside from performance.  For sure there were no indices, as the filter was looking for strings anywhere in the text, i.e. pos('brick', prodno+description) > 0.  As far as I know that is not something I can index.  I'm OK with the performance, and the example is extreme, as most companies using our system don't have 75,000 items.  My surprise was the difference in filter performance between the sensitive and insensitive results, as I thought they would have been similar given that both essentially have to read everything.

We did try requerying with a modified WHERE clause to see if that was better than the filter, but it was not.  The only real help in my testing was replacing my lazy "SELECT * FROM Product" with a SELECT of only the fields necessary.  In particular, there was a CLOB field that wasn't required.  Not surprising, but that change doubled the performance.
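The arithmetic behind that doubling is straightforward row-width math; a quick Python sketch with purely hypothetical sizes (the CLOB average below is an assumption for illustration, not a figure from Greg's database):

```python
# Why dropping the CLOB helped: SELECT * moves every column of every
# row, so row width multiplies directly into I/O. Illustrative numbers
# only - not measurements from the database in this thread.

rows = 75_000
narrow_row = 120          # bytes: ID, Description, Units, Price, Cost
clob_avg = 2_000          # hypothetical average CLOB size in bytes

select_star_bytes = rows * (narrow_row + clob_avg)
pruned_bytes = rows * narrow_row

print(f"SELECT *      : {select_star_bytes / 1e6:.0f} MB")
print(f"pruned columns: {pruned_bytes / 1e6:.0f} MB")
```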
Tue, Mar 5 2019 10:35 AM

Roy Lambert

NLH Associates

Team Elevate

Greg

>My bet would be on not having indices for the filters you were setting. Insensitive result set has c.75k records to scan, sensitive - how many? It might be interesting to just apply the filters to the table as a query in EDBManager and see what it tells you.
>
>All the result sets were 75,000 records until the filter was applied. The results were identical aside from performance.

Surely the source table has more than 75k records - without seeing an execution plan I don't know how many are scanned.

>For sure there were no indices, as the filter was looking for strings anywhere in the text, i.e. pos('brick', prodno+description) > 0. As far as I know that is not something I can index.

Oh yes you can :-) You just have to be very slightly sneaky. This will only work on a sensitive result set, but what you do is set up a full text index on prodno and another on description. You'll be surprised just how fast CONTAINS is. I know because I store a lot of stuff in CLOB columns, basically as stringlists (e.g. product types), rather than have additional tables and have to join everything. It has the side benefit of making deletion much simpler - one table only, no worries about referential integrity etc.

You might (don't know because I haven't tried it) be able to set up a single full text index using a computed column joining prodno and description.

Naturally, this will not work on an insensitive result set since that's pretty much a temporary table without indices.
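The reason CONTAINS can be fast is that a full text index is essentially an inverted word index. A minimal Python sketch of the idea, using made-up product rows (this illustrates the data structure, not ElevateDB's actual implementation):

```python
from collections import defaultdict

# Hypothetical product rows: (id, prodno, description)
rows = [
    (1, "BR-100", "red clay brick"),
    (2, "PVC-20", "PVC pipe red 20mm"),
    (3, "TM-300", "timber stud"),
]

# Build an inverted index: word -> set of row ids. This is the kind of
# structure a full text index maintains, which is why a CONTAINS-style
# lookup can answer "which rows mention 'brick'" without scanning
# every description.
index = defaultdict(set)
for rid, prodno, desc in rows:
    for word in (prodno + " " + desc).lower().replace("-", " ").split():
        index[word].add(rid)

def contains(word):
    """Look the word up directly instead of scanning all rows."""
    return sorted(index[word.lower()])

print(contains("brick"))  # [1]
print(contains("red"))    # [1, 2]
```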

Roy
Thu, Mar 14 2019 12:00 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Greg,

<< I had recently done some testing on one of our applications relating to this.  We had a query that was selecting a subset of our product file that would take about 3 seconds as it was insensitive and the result was about 75,000 records.  If I changed the query to sensitive the result was about 0.5 seconds.  The problem we ran into was that there were subsequent filters set on the result.  The insensitive result would filter quickly (< 1 second) but the sensitive result was slow, at about 2 seconds for the same result.  Ultimately we had better overall usability with the insensitive result set. >>

You should be able to get an idea of what's going on by:

1) Putting the original, sensitive query into an EDB SQL window, but making it a derived table:

SELECT * FROM customer WHERE ...

becomes

SELECT * FROM (SELECT * FROM customer WHERE ...)

2) Adding the filter condition as a WHERE condition on the outer query:

SELECT * FROM (SELECT * FROM customer WHERE ...) WHERE ...

3) Specifying that you want an execution plan, and then executing the query.

Tim Young
Elevate Software
www.elevatesoft.com
Sun, Mar 17 2019 2:24 PM

Greg Hallam

Microcalm Solutions Inc

The first query would be something like:
SELECT X.* FROM (SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1) X
ORDER BY Description

Then it would be followed up by queries like this as the user enters text to filter on:
SELECT X.* FROM (SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1) X
WHERE description LIKE '%PVC%red%'
ORDER BY Description

Here is the plan executed on a store with 2915 products:
================================================================================
SQL Query (Executed by ElevateDB 2.30 Build 4)

Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
same as the SQL that was originally entered.  However, none of the differences
alter the execution results in any way.
================================================================================

SELECT ALL
"X"."ID" AS "ID",
"X"."Description" AS "Description",
"X"."Units" AS "Units",
"X"."Price" AS "Price",
"X"."Cost" AS "Cost"
FROM (SELECT ALL "ID" AS "ID", "Description" AS "Description", "Units" AS
"Units", "Price" AS "Price", "Cost" AS "Cost" FROM "product" WHERE "storeID" =
1) AS "X"
WHERE "description" LIKE '%PVC%red%'
ORDER BY "X"."Description"

Source Tables
-------------

X1 (X): 2915 rows

Result Set
----------

The result set was insensitive and read-only
The result set consisted of zero or more rows

Filtering
---------

The following filter condition was applied to the X1 (X) table:

"description" LIKE '%PVC%red%'

Index scan (Product.DESCRIPTION): 79189 keys, 12.13MB estimated cost


Result set I/O statistics
-------------------------

Total rows visited: 1

Row buffer manager

Max buffer size: 1023.91KB Buffer size: 168B

Hits: 1   Misses: 0   Hit ratio: 1

Reads: 0   read: 0B

Writes: 0   written: 0B

Index Page buffer manager

Max buffer size: 2MB Buffer size: 8KB

Hits: 1   Misses: 0   Hit ratio: 1

Reads: 0   read: 0B

Writes: 0   written: 0B

================================================================================
1 row(s) returned in 0.64 secs
================================================================================

I tried it both sensitive and insensitive, choosing stores with a large (75K) and a small (3K) number of products.  The part that concerns me is "Index scan (Product.DESCRIPTION): 79189 keys, 12.13MB estimated cost" when the original search only returned 2,915 records.  It is the same for the 75,000-record store.  Clearly it is doing the subsequent filter on the Description key for the whole table.  This is a big problem in the future, as the product file will grow with the products of 100 or more stores using the system.

Currently I do an insensitive query and follow it up with filters on the query result, which isolates the filter to just one store's product list.  Still my best solution for now, I think . . .
Mon, Mar 18 2019 3:27 AM

Roy Lambert

NLH Associates

Team Elevate

Greg


I'm not 100% certain of just what you're saying or asking here. However, I do have a comment to make - you're using LIKE with multiple wildcards.

LIKE is a very useful operator BUT it works by applying a sliding window, and this can be slow. I first encountered the issue when searching CLOBs. My solution was to use POS instead. Try replacing WHERE description LIKE '%PVC%red%' with

WHERE POS('PVC', description) > 0 AND POS('red', description) > 0
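The difference between the two forms can be sketched in Python, treating LIKE '%PVC%red%' as an ordered pattern and POS as a plain substring search (an illustration of the semantics, not ElevateDB internals):

```python
import re

desc = "PVC pipe red 20mm"

# LIKE '%PVC%red%' translated to a regex: the pattern engine must find
# 'PVC' and then 'red' *in that order*, sliding through the string.
like_match = re.search("PVC.*red", desc) is not None

# The POS() rewrite: two independent substring searches, i.e.
# POS('PVC', desc) > 0 AND POS('red', desc) > 0 in ElevateDB terms.
pos_match = ("PVC" in desc) and ("red" in desc)

print(like_match, pos_match)  # True True
```

One behavioural difference worth noting: the POS version also matches descriptions where the words appear in the other order (e.g. "red PVC elbow"), which LIKE '%PVC%red%' does not - usually what a keyword filter wants anyway.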


This bit

Index scan (Product.DESCRIPTION): 79189 keys, 12.13MB estimated cost

I have no idea about, but I would suspect it's trying to work out the impact of LIKE with three wildcards.


Roy Lambert
Mon, Mar 18 2019 11:36 AM

Greg Hallam

Microcalm Solutions Inc

I have used POS and, coincidentally, am currently using it in the production version as we speak.  I don't really see a difference: in both cases every field in my subset of the file has to be evaluated, and the evaluation of multiple wildcards in a LIKE versus single or multiple POS calls is not appreciably different in performance, based on my tests and experience.

My issue is that the first query's "WHERE storeID=?" is not used for the evaluation of the Description field.  In one case it looked through 79,000 records when there were fewer than 3,000 in the store.  This problem only gets worse for me as the file gets bigger with more stores.  I have to assume that the query evaluation believes it is more efficient to use the Description index even though there are about 76,000 extra records in it.  Filtering on an insensitive query result allows me to force the opposite decision and do an inefficient exhaustive search through fewer than 3,000 records.
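Using the figures from the execution plan above, the gap between the two strategies is roughly as follows (a back-of-envelope comparison counting pattern evaluations, not a measurement):

```python
# Rough cost comparison of the two plans described in this thread,
# using the row counts from the posted execution plan.

total_index_keys = 79189   # Description index covers every store's rows
store_rows = 2915          # rows actually matching storeID = 1

# Plan A (what the optimizer chose): scan the whole Description index,
# testing the pattern against every key, then intersect with storeID = 1.
plan_a_tests = total_index_keys

# Plan B (what filtering the insensitive result forces): materialize the
# storeID = 1 subset first, then run the string test on just those rows.
plan_b_tests = store_rows

print(f"Plan A evaluates the pattern {plan_a_tests} times")
print(f"Plan B evaluates it {plan_b_tests} times "
      f"(about {plan_a_tests / plan_b_tests:.0f}x fewer)")
```

And the ratio only grows as more stores are added, since Plan A's cost tracks the whole table while Plan B's tracks one store.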
Mon, Mar 18 2019 11:52 AM

Roy Lambert

NLH Associates

Team Elevate

Greg

>I have used POS and, coincidentally, am currently using it in the production version as we speak. I don't really see a difference: in both cases every field in my subset of the file has to be evaluated, and the evaluation of multiple wildcards in a LIKE versus single or multiple POS calls is not appreciably different in performance, based on my tests and experience.

Depends on the size of the field being tested. For small fields (say 60 characters or less) it's reasonably efficient.

>My issue is that the first query's "WHERE storeID=?" is not used for the evaluation of the Description field. In one case it looked through 79,000 records when there were fewer than 3,000 in the store. This problem only gets worse for me as the file gets bigger with more stores. I have to assume that the query evaluation believes it is more efficient to use the Description index even though there are about 76,000 extra records in it. Filtering on an insensitive query result allows me to force the opposite decision and do an inefficient exhaustive search through fewer than 3,000 records.

If you can make a test dataset and the queries you're using available, I'm happy to take a look and see if I can speed things up. I can't guarantee much because I have a similar one. Again, it's where I create a table on the fly, as you're doing. I need a temporary table for further processing, so I don't have an option, but you may.

I've just been looking at the query you posted

SELECT X.* FROM (SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1) X
WHERE description LIKE '%PVC%red%'
ORDER BY Description

and I don't understand what you're doing. It may be a special case but

SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1 and description LIKE '%PVC%red%'
ORDER BY Description

should work as well and be a lot faster. Is there a reason for the intermediate table?

Roy

Mon, Mar 18 2019 1:56 PM

Greg Hallam

Microcalm Solutions Inc

Roy Lambert wrote:

<If you can make a test dataset and the queries you're using available, I'm happy to take a look and see if I can speed things up. I can't guarantee much because I have a similar one. Again, it's where I create a table on the fly, as you're doing. I need a temporary table for further processing, so I don't have an option, but you may.

I am OK with where I am at in my application right now.  This is not a big enough problem to warrant the time to make it faster.  It is more about understanding the internals and how things operate, and somewhat to aid in understanding the original problem in this thread (not mine).

<I've just been looking at the query you posted

SELECT X.* FROM (SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1) X
WHERE description LIKE '%PVC%red%'
ORDER BY Description

and I don't understand what you're doing. It may be a special case but

SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1 and description LIKE '%PVC%red%'
ORDER BY Description

should work as well and be a lot faster. Is there a reason for the intermediate table?

This was a response to Tim's post earlier, in an attempt to understand what is going on and possible efficiencies.  He suggested the derived query, assuming I understood correctly.  Currently I do an insensitive query:

SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1
ORDER BY Description

Subsequent to that I filter using the POS function in the filter that is built.  Performance is decent.  The worst case for the original query was a store with 75,000 products, which took about 1.5s.  My desire to understand is more about how this will work when we have 100 stores operating in the database, as there is no issue with the 4 currently.  I am trying to get my head around how this will all scale.  I believe the way I am doing it right now scales fine.  If I change to the sensitive query on its own, I believe it may be better now but much worse later.

Greg

Tue, Mar 19 2019 3:24 AM

Roy Lambert

NLH Associates

Team Elevate

Greg


>This was a response to Tim's post earlier in an attempt to understand what is going on and possible efficiencies. He suggested the derived query assuming I understood correctly. Currently I do an insensitive query:

My reading of that post is that it was purely a diagnostic tool, in the same way that I split queries apart so I can get a meaningful execution plan and tune each bit. Sometimes it just doesn't help though :-(

>SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1
>ORDER BY Description
>
>Subsequent to that I filter using the POS function in the filter that is built. Performance is decent. The worst case for the original query was a store with 75,000 products, which took about 1.5s. My desire to understand is more about how this will work when we have 100 stores operating in the database, as there is no issue with the 4 currently. I am trying to get my head around how this will all scale. I believe the way I am doing it right now scales fine. If I change to the sensitive query on its own, I believe it may be better now but much worse later.

Without testing, my gut reaction is that it would be the other way; when I start thinking about it my head hurts and I can't make my mind up. From what you've posted so far, my brain says: insensitive result set = temporary table on disk, so operations are carried out on a physical subset of the table; but it also says a sensitive result set just applies filters to the index, which should be faster.

If you have machines with enough RAM in them, you may want to consider using temporary in-memory tables. It's easy enough.
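For illustration, the pattern Roy describes looks like this in Python's stdlib sqlite3 (ElevateDB's syntax for temporary and in-memory tables differs - check its manual; the table and column names here are just examples):

```python
import sqlite3

# An in-memory database: everything below lives in RAM.
mem = sqlite3.connect(":memory:")
mem.execute(
    "CREATE TABLE product (id INTEGER, storeID INTEGER, description TEXT)"
)
mem.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, 1, "PVC pipe red 20mm"), (2, 1, "timber stud"), (3, 2, "red brick")],
)

# Materialize the one store's rows once...
mem.execute(
    "CREATE TEMPORARY TABLE store_products AS "
    "SELECT id, description FROM product WHERE storeID = 1"
)

# ...then every subsequent keyword filter touches only that subset,
# never the full multi-store table (instr() is SQLite's POS-like function).
rows = mem.execute(
    "SELECT id FROM store_products "
    "WHERE instr(description, 'PVC') > 0 AND instr(description, 'red') > 0"
).fetchall()
print(rows)  # [(1,)]
```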


Roy
Wed, Mar 20 2019 3:51 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Greg,

<< The first query would be something like:
SELECT X.* FROM (SELECT ID, Description, Units, Price, Cost FROM product WHERE storeID=1) X
ORDER BY Description >>

Is there a particular reason that you're using a derived table instead of just referencing the table directly?  That will resolve this issue for you.

The main issue is that EDB cannot "add" filter conditions to a derived table.  Derived tables act like temporary views, so they are effectively "sealed" and cannot take advantage of conditions that are *outside* of their own context.

Tim Young
Elevate Software
www.elevatesoft.com