Login ProductsSalesSupportDownloadsAbout |
Home » Technical Support » DBISAM Technical Support » Support Forums » DBISAM SQL » View Thread |
Messages 1 to 10 of 23 total |
Query slow |
Fri, Aug 8 2008 3:35 PM | Permanent Link |
Danny Humphress | Hi,
I have a table that contains two memo fields. One has a full text index and the other does not. The following query should be fast but it is quite slow: SELECT * FROM InvMarc WHERE ( (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%')) OR TEXTSEARCH('DISNEYLAND' IN Keywords) ) I know that the LIKE operator won't be optimized but that part of the query should be partially optimized and, in fact, it does run quickly when I just use the following WHERE clause by itself: (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%')) The query is slow only when the two are OR'd together. I could break this into two queries and UNION them, but this SQL is built on-the-fly based on a user query so there could be all kinds of combinations. In case you're wondering, the reason for doing a TEXTSEARCH and a LIKE search for the same term in two different fields is that I am looking for an exact phrase "DISNEY WORLD" and using the TEXTSEARCH to narrow down the list of candidates. Keywords and Contents are two different fields because Contents has a industry-specific record formatting and Keywords contains only specific words from the Contents field. Thanks, Danny |
Sat, Aug 9 2008 9:36 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Danny
I just tried running an OR'd TEXTSEARCH here (no LIKE) and the timing was not quite double. Can you run the query in DBSys and post the execution plan? I don't know wether it would be usable in your case but I had a similar requirement to look for phrases rather than words and what I did was to replace #32 with #160 (space with "hard space"). They are visually identical which for me was important but since #160 isn't a separator character the phrase (in my case product names) are treated as one word. It works a treat unless you want to search for DISNEY and WORLD separately. The other approach I've seen suggested is to have DISNEY, WORLD and DISNEYWORLD all in the index. Roy Lambert [Team Elevate] |
Sat, Aug 9 2008 2:08 PM | Permanent Link |
Danny Humphress | Roy,
Here is the plan for the problem search: ================================================================================ SQL statement (Executed with 4.26 Build 3) ================================================================================ SELECT * FROM InvMarc WHERE ( (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%')) OR TEXTSEARCH('DISNEYLAND' IN Keywords) ) Tables Involved --------------- InvMarc (InvMarc) table opened shared, has 9333 rows Result Set Generation --------------------- Result set will be canned Result set will consist of one or more rows WHERE Clause Execution ---------------------- NOTE: Optimized TEXTSEARCH() conditions do not use statistics and will show zero (0) for any I/O costs The expression: TEXTSEARCH('DISNEY WORLD',Keywords) = TRUE AND UPPER(Contents) LIKE '%DISNEY WORLD%' OR TEXTSEARCH('DISNEYLAND',Keywords) = TRUE is PARTIALLY-OPTIMIZED, covers 0 rows or index keys, costs 0 bytes, and will be applied to the InvMarc table (InvMarc) before any joins ================================================================================ >>>>> 1 rows affected in 17.488 seconds ================================================================================ ...and here is the plan for the search without an OR clause: ================================================================================ SQL statement (Executed with 4.26 Build 3) ================================================================================ SELECT * FROM InvMarc WHERE ( (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%')) ) Tables Involved --------------- InvMarc (InvMarc) table opened shared, has 9333 rows Result Set Generation --------------------- Result set will be canned Result set will consist of one or more rows WHERE Clause Execution ---------------------- NOTE: Optimized TEXTSEARCH() conditions do not use statistics and will show zero (0) for any I/O costs The expression: TEXTSEARCH('DISNEY WORLD',Keywords) = TRUE AND UPPER(Contents) LIKE '%DISNEY WORLD%' is PARTIALLY-OPTIMIZED, covers 0 rows or index keys, costs 0 bytes, and will be applied to the InvMarc table (InvMarc) before any joins ================================================================================ >>>>> 0 rows affected in 0.031 seconds ================================================================================ ...and the other one without OR: ================================================================================ SQL statement (Executed with 4.26 Build 3) ================================================================================ SELECT * FROM InvMarc WHERE TEXTSEARCH('DISNEYLAND' IN Keywords) Tables Involved --------------- InvMarc (InvMarc) table opened shared, has 9333 rows Result Set Generation --------------------- Result set will be canned Result set will consist of one or more rows WHERE Clause Execution ---------------------- NOTE: Optimized TEXTSEARCH() conditions do not use statistics and will show zero (0) for any I/O costs The expression: TEXTSEARCH('DISNEYLAND',Keywords) = TRUE is OPTIMIZED, covers 0 rows or index keys, costs 0 bytes, and will be applied to the InvMarc table (InvMarc) before any joins ================================================================================ >>>>> 1 rows affected in 0 seconds ================================================================================ As you can see, the OR'd query and the first indiviual query without the OR are both partially optimized. You can also see that it's extremely fast if run as two queries (though not an option for my needs) but takes a whopping 17.488 seconds to run the combined query. I'm not cretain,but it looks like this is tripping up DBISAM's query optimizer and it's trying to do a brute-force search for the LIKE statement before evaluating the optimized TEXTSEARCHes. I'll keep your #160 non-breaking space tip in mind but it won't be helpful in this particular case for several reasons. The Contents column contains an industry-specific record format that we have to maintain (see www.loc.gov/marc if you're curious) and the Keywords column contains only specific words which do not repeat within a record and are reduced to stems and sometimes have coded prefixes for specific types of searches (that column isn't useful for phrase searching). I have uploaded the table (it's 12.9MB compressed) to http://www.SurpassSupport.com/files/InvMarc.zip. Thanks, Danny Thanks, Danny |
Sat, Aug 9 2008 3:14 PM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Danny
I'll try to have a look at your tables tomorrow (no promises). I know you said its generated from user input so I don't know if you could do something like: 1. select table id into memory table (t1) for first subclause 2. index t1 3. select table id into t1 where second subclause and id not already in t1 4. select from main table where id in t1 5 drop t1 Roy Lambert [Team Elevate] |
Sun, Aug 10 2008 9:47 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Danny
I'm not sure what's going on. A brute force scan with just UPPER(Contents) LIKE '%DISNEY WORLD%' in the WHERE clause is faster (c4 secs) than the clause you have (c20 secs) If you can manage to generate code of the type below its sub second (<.2 secs) SELECT Number INTO "Memory\t1" FROM InvMarc WHERE (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%')); CREATE INDEX Speedup ON "Memory\t1" (Number); INSERT INTO "Memory\t1" SELECT Number FROM InvMarc WHERE TEXTSEARCH('DISNEYLAND' IN Keywords) AND NOT Number IN (SELECT Number FROM "Memory\t1"); SELECT * FROM InvMarc WHERE Number IN (SELECT Number FROM "Memory\t1"); Roy Lambert [Team Elevate] |
Sun, Aug 10 2008 10:41 AM | Permanent Link |
"Ralf Bieber" | Danny Humphress wrote:
Hi Denny > Hi, > > I have a table that contains two memo fields. One has a full text > index and the other does not. The following query should be fast but > it is quite slow: > > SELECT * > FROM InvMarc > WHERE > ( > (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE > '%DISNEY WORLD%')) OR > TEXTSEARCH('DISNEYLAND' IN Keywords) > ) > You can use this: SELECT * FROM InvMarc WHERE (TEXTSEARCH('DISNEYLAND' IN Keywords) ) except all SELECT * FROM InvMarc WHERE ( (TEXTSEARCH('DISNEY WORLD' IN Keywords) AND (UPPER(Contents) LIKE '%DISNEY WORLD%')) ) Ralf -- |
Sun, Aug 10 2008 1:08 PM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Ralf
Excellent - slightly faster than my solution and no memory table. Roy Lambert [Team Elevate] |
Sun, Aug 10 2008 3:44 PM | Permanent Link |
Danny Humphress | Roy & Ralf,
Actually, that's not a solution -- most importantly because it does not give the same results as OR'ing the subqueries. A UNION would be very fast but I wouldn't be able to use it because of the MEMO fields and because of the complexity of parsing a complex user-entered search into a series of UNION'd SQL queries. These queries are built on-the-fly based on simple Google-like queries that users will enter. In this case, the user's query might have been to search for: "DISNEY WORLD" OR DISNEYLAND The logic in my application turns that into a SQL query based on knowledge of our database structure. The queries can be much more complex such as title:"HARRY POTTER" OR subject:wizards The application produces SQL for these that in all cases should be at least partially optimized for DBISAM and return results quite quickly. My problem is that there is a problem with DBISAM's query optimizer when it encoounters an OR on two or more subqueries -- one fully optimized and the other partially optimized. Thanks, Danny |
Mon, Aug 11 2008 3:49 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Danny
In that case what about the memory tables idea. A bit more complex to program the sql but definitely doable, and should produce the same results. Roy Lambert [Team Elevate] |
Mon, Aug 11 2008 8:12 AM | Permanent Link |
Danny Humphress | Roy,
But doesn't it make more sense to find out what's wrong with DBISAM's query optimizer and fix it rather than design an elaborate workaround? These queries could get pretty complicated with many levels of ORs and ANDs. Thanks, Danny |
Page 1 of 3 | Next Page » | |
Jump to Page: 1 2 3 |
This web page was last updated on Saturday, April 27, 2024 at 08:52 PM | Privacy PolicySite Map © 2024 Elevate Software, Inc. All Rights Reserved Questions or comments ? E-mail us at info@elevatesoft.com |