Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 23 total
Thread Query slow
Fri, Aug 8 2008 3:35 PMPermanent Link

Danny Humphress
Hi,

I have a table that contains two memo fields. One has a full text index and the other does not. The following query should be fast but it is quite
slow:

SELECT *
FROM InvMarc
WHERE
(
(TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%'))
OR
TEXTSEARCH('DISNEYLAND' IN Keywords)
)

I know that the LIKE operator won't be optimized but that part of the query should be partially optimized and, in fact, it does run quickly when I
just use the following WHERE clause by itself:

(TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%'))

The query is slow only when the two are OR'd together. I could break this into two queries and UNION them, but this SQL is built on-the-fly based
on a user query so there could be all kinds of combinations.

In case you're wondering, the reason for doing a TEXTSEARCH and a LIKE search for the same term in two different fields is that I am looking for
an exact phrase "DISNEY WORLD" and using the TEXTSEARCH to narrow down the list of candidates. Keywords and Contents are two different
fields because Contents has a industry-specific record formatting and Keywords contains only specific words from the Contents field.

Thanks,
Danny
Sat, Aug 9 2008 9:36 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Danny


I just tried running an OR'd TEXTSEARCH here (no LIKE) and the timing was not quite double. Can you run the query in DBSys and post the execution plan?

I don't know wether it would be usable in your case but I had a similar requirement to look for phrases rather than words and what I did was to replace #32 with #160 (space with "hard space"). They are visually identical which for me was important but since #160 isn't a separator character the phrase (in my case product names) are treated as one word. It works a treat unless you want to search for DISNEY and WORLD separately. The other approach I've seen suggested is to have DISNEY, WORLD and DISNEYWORLD all in the index.

Roy Lambert [Team Elevate]
Sat, Aug 9 2008 2:08 PMPermanent Link

Danny Humphress
Roy,

Here is the plan for the problem search:

================================================================================
SQL statement (Executed with 4.26 Build 3)
================================================================================

SELECT *
FROM InvMarc
WHERE
(
(TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE
'%DISNEY WORLD%'))
OR
TEXTSEARCH('DISNEYLAND' IN Keywords)
)

Tables Involved
---------------

InvMarc (InvMarc) table opened shared, has 9333 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more rows

WHERE Clause Execution
----------------------

NOTE: Optimized TEXTSEARCH() conditions do not use statistics and will show
zero (0) for any I/O costs

The expression:

TEXTSEARCH('DISNEY WORLD',Keywords) = TRUE AND UPPER(Contents) LIKE
'%DISNEY WORLD%' OR TEXTSEARCH('DISNEYLAND',Keywords) = TRUE

is PARTIALLY-OPTIMIZED, covers 0 rows or index keys, costs 0 bytes, and will be
applied to the InvMarc table (InvMarc) before any joins

================================================================================
>>>>> 1 rows affected in 17.488 seconds
================================================================================

...and here is the plan for the search without an OR clause:

================================================================================
SQL statement (Executed with 4.26 Build 3)
================================================================================

SELECT *
FROM InvMarc
WHERE
(
(TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE
'%DISNEY WORLD%'))
)

Tables Involved
---------------

InvMarc (InvMarc) table opened shared, has 9333 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more rows

WHERE Clause Execution
----------------------

NOTE: Optimized TEXTSEARCH() conditions do not use statistics and will show
zero (0) for any I/O costs

The expression:

TEXTSEARCH('DISNEY WORLD',Keywords) = TRUE AND UPPER(Contents) LIKE '%DISNEY WORLD%'

is PARTIALLY-OPTIMIZED, covers 0 rows or index keys, costs 0 bytes, and will be
applied to the InvMarc table (InvMarc) before any joins

================================================================================
>>>>> 0 rows affected in 0.031 seconds
================================================================================

...and the other one without OR:

================================================================================
SQL statement (Executed with 4.26 Build 3)
================================================================================

SELECT *
FROM InvMarc
WHERE
TEXTSEARCH('DISNEYLAND' IN Keywords)

Tables Involved
---------------

InvMarc (InvMarc) table opened shared, has 9333 rows

Result Set Generation
---------------------

Result set will be canned

Result set will consist of one or more rows

WHERE Clause Execution
----------------------

NOTE: Optimized TEXTSEARCH() conditions do not use statistics and will show
zero (0) for any I/O costs

The expression:

TEXTSEARCH('DISNEYLAND',Keywords) = TRUE

is OPTIMIZED, covers 0 rows or index keys, costs 0 bytes, and will be applied
to the InvMarc table (InvMarc) before any joins

================================================================================
>>>>> 1 rows affected in 0 seconds
================================================================================

As you can see, the OR'd query and the first indiviual query without the OR are both partially optimized. You can also see that it's extremely fast if
run as two queries (though not an option for my needs) but takes a whopping 17.488 seconds to run the combined query. I'm not cretain,but it
looks like this is tripping up DBISAM's query optimizer and it's trying to do a brute-force search for the LIKE statement before evaluating the
optimized TEXTSEARCHes.

I'll keep your #160 non-breaking space tip in mind but it won't be helpful in this particular case for several reasons. The Contents column contains
an industry-specific record format that we have to maintain (see www.loc.gov/marc if you're curious) and the Keywords column contains only
specific words which do not repeat within a record and are reduced to stems and sometimes have coded prefixes for specific types of searches
(that column isn't useful for phrase searching).

I have uploaded the table (it's 12.9MB compressed) to http://www.SurpassSupport.com/files/InvMarc.zip.

Thanks,
Danny

Thanks,
Danny
Sat, Aug 9 2008 3:14 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Danny


I'll try to have a look at your tables tomorrow (no promises). I know you said its generated from user input so I don't know if you could do something like:

1. select table id into memory table (t1) for first subclause
2. index t1
3. select table id into t1 where second subclause and id not already in t1
4. select from main table where id in t1
5 drop t1

Roy Lambert [Team Elevate]
Sun, Aug 10 2008 9:47 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Danny


I'm not sure what's going on. A brute force scan with just UPPER(Contents) LIKE '%DISNEY WORLD%' in the WHERE clause is faster (c4 secs) than the clause you have (c20 secs)

If you can manage to generate code of the type below its sub second (<.2 secs)



SELECT Number INTO "Memory\t1" FROM InvMarc
WHERE
(TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%'));

CREATE INDEX Speedup ON "Memory\t1" (Number);

INSERT INTO "Memory\t1" SELECT Number FROM InvMarc
WHERE TEXTSEARCH('DISNEYLAND' IN Keywords)
AND NOT Number IN (SELECT Number FROM "Memory\t1");

SELECT * FROM InvMarc WHERE Number IN (SELECT Number FROM "Memory\t1");


Roy Lambert [Team Elevate]
Sun, Aug 10 2008 10:41 AMPermanent Link

"Ralf Bieber"
Danny Humphress wrote:

Hi Denny

> Hi,
>
> I have a table that contains two memo fields. One has a full text
> index and the other does not. The following query should be fast but
> it is quite slow:
>
> SELECT *
> FROM InvMarc
> WHERE
> (
>  (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE
> '%DISNEY WORLD%')) OR
> TEXTSEARCH('DISNEYLAND' IN Keywords)
>  )
>


You can use this:

SELECT *
FROM InvMarc
WHERE
(TEXTSEARCH('DISNEYLAND' IN Keywords)
)
except all
SELECT *
FROM InvMarc
WHERE
(
(TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE
'%DISNEY WORLD%'))
)

Ralf

--
Sun, Aug 10 2008 1:08 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Ralf


Excellent - slightly faster than my solution and no memory table.

Roy Lambert [Team Elevate]
Sun, Aug 10 2008 3:44 PMPermanent Link

Danny Humphress
Roy & Ralf,

Actually, that's not a solution -- most importantly because it does not give the same results as OR'ing the subqueries. A UNION would be very fast
but I wouldn't be able to use it because of the MEMO fields and because of the complexity of parsing a complex user-entered search into a series
of UNION'd SQL queries.

These queries are built on-the-fly based on simple Google-like queries that users will enter. In this case, the user's query might have been to
search for:

"DISNEY WORLD" OR DISNEYLAND

The logic in my application turns that into a SQL query based on knowledge of our database structure. The queries can be much more complex
such as

title:"HARRY POTTER" OR subject:wizards

The application produces SQL for these that in all cases should be at least partially optimized for DBISAM and return results quite quickly. My
problem is that there is a problem with DBISAM's query optimizer when it encoounters an OR on two or more subqueries -- one fully optimized
and the other partially optimized.

Thanks,
Danny
Mon, Aug 11 2008 3:49 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Danny


In that case what about the memory tables idea. A bit more complex to program the sql but definitely doable, and should produce the same results.

Roy Lambert [Team Elevate]
Mon, Aug 11 2008 8:12 AMPermanent Link

Danny Humphress
Roy,

But doesn't it make more sense to find out what's wrong with DBISAM's query optimizer and fix it rather than design an elaborate workaround?

These queries could get pretty complicated with many levels of ORs and ANDs.

Thanks,
Danny
Page 1 of 3Next Page »
Jump to Page:  1 2 3
Image