Thread: A small slew of questions about full text indexing & CONTAINS
Wed, Nov 28 2007 3:41 AM

Roy Lambert

NLH Associates

Team Elevate

I want to make sure I have my head round the various issues before I start coding so ....

1. When creating a text index I can leave the Filter Type Column blank. Are there any nasty consequences to this?
2. What gets passed into a user written text filter when the Filter Type Column is left blank? I know I could write some code and test but I'd like the "definitive" answer.
3. This is the biggie - if I create a custom textfilter (let's say it removes all words starting with "e") and then run a query along the lines of

Field CONTAINS 'elevatesoft produces brilliant software'

Does it
a) just leave it in and hence the query is GUARANTEED to return false since elevatesoft isn't in the index?
b) pass it through my textfilter to preprocess and remove unwanted words? If so, is this done row by row or once per query, and if the latter, what about Filter Type Column influences?
c) something else entirely - if so what?

4. Is there any point in me converting the output string from my custom text filter to lower case - are there any consequences for doing so?

There was another one but I've forgotten it.

Roy Lambert
Wed, Nov 28 2007 7:38 AM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Roy,

<< 1. When creating a text index I can leave the Filter Type Column blank.
Are there any nasty consequences to this? >>

No.

<< 2. What gets passed into a user written text filter when the Filter Type
Column is left blank. I know I could write some code and test but I'd like
the "definitive" answer. >>

Nothing.  Not specifying a text filter column for a text index means that
you don't wish to perform any filtering based upon its type.

<< 3. This is the biggie - if I create a custom textfilter (lets say it
removes all words starting with "e") and then run a query on the lines of

Field CONTAINS 'elevatesoft produces brilliant software'

Does it
a) just leave it in and hence the query is GUARANTEED to return false since
elevatesoft isn't in the index? >>

Correct.

<< b) pass it through my textfilter to preprocess and remove unwanted words.
If so is this on a row by row basis or a once for the query basis and if the
latter what about Filter Type Column influences? >>

No, text filters are entirely a function of the text stored in a
CHAR/VARCHAR/CLOB column only, and are simply used to remove any formatting
information so as to leave the actual text contents that need to be indexed.

<< 4. Is there any point in me converting the output string from my custom
text filter to lower case - are there any consequences for doing so? >>

I wouldn't recommend it.  Just leave the case of the text and the default
text indexing behavior alone and you will get case-insensitive behavior
anyways.

--
Tim Young
Elevate Software
www.elevatesoft.com
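
To make the answers to question 3 concrete, here is a small Python model of the behavior confirmed above: the text filter is applied only to the stored column text before indexing, and the CONTAINS search string is not passed through it. This is not ElevateDB code. The function names, the toy word splitter, and the rule that every search word must be present in the index are assumptions made for illustration.

```python
import re


def words(text):
    """Toy word generator: lowercase runs of letters and digits (assumed)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def drop_e_words(text):
    """Roy's hypothetical filter: remove every word starting with 'e'."""
    return " ".join(w for w in text.split() if not w.lower().startswith("e"))


def build_index(column_text, text_filter=None):
    """Filter the stored text (if a filter applies), then generate words."""
    filtered = text_filter(column_text) if text_filter else column_text
    return words(filtered)


def contains(index_words, search):
    """The search string goes through word generation only, never the filter.

    Assumption: all search words must be present for a match.
    """
    return words(search) <= index_words


stored = "Elevatesoft produces brilliant software"
index = build_index(stored, drop_e_words)

print(contains(index, "elevatesoft produces brilliant software"))  # False
print(contains(index, "produces brilliant software"))              # True
```

In this model the first search can never match, because "elevatesoft" was removed before the index was built but is still present in the search string. That is why a text filter is better kept to stripping formatting rather than dropping words.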

Wed, Nov 28 2007 8:14 AM

Roy Lambert

NLH Associates

Team Elevate

Tim


My, aren't you up early :)

><< 1. When creating a text index I can leave the Filter Type Column blank.
>Are there any nasty consequences to this? >>
>
>No.
>
><< 2. What gets passed into a user written text filter when the Filter Type
>Column is left blank. I know I could write some code and test but I'd like
>the "definitive" answer. >>
>
>Nothing. Not specifying a text filter column for a text index means that
>you don't wish to perform any filtering based upon its type.

This confuses me. I think I understand, but just to double check:

procedure TEDBTextFilterModule1.EDBTextFilterModuleFilterText(const FilterType: String; const TextToFilter: String; var FilteredText: String);

FilterType would be set to a blank string (i.e. not a null :) ) and TextToFilter would be the text from the field?

The horrible alternative interpretation was that no filtering is performed, but I'm assuming the "No" to question 1 eliminates that.

> Does it
>a) just leave it in and hence the query is GUARANTEED to return false since
>elevatesoft isn't in the index? >>

OK I now know what I need to do.

Thu, Nov 29 2007 4:21 AM

Roy Lambert

NLH Associates

Team Elevate

Tim


Just done a bit of playing and some RTFM and I'm getting a bit worried.

There are several tables in the app which will get full text indexing applied

ELN & EMails may have plain or formatted text in one CLOB that will be indexed, so for that column a column specifying filter type is a good idea. Other columns will always be plain text.

Companies, Contacts, Calls will have CLOB columns that are indexed, but all of them will always be plain text.

In the case of the plain text columns I still want to apply my own filtering, but it's looking like I need a filter type column for them.

From the OLH in EDBMan:

Whenever the Notes column is updated, the appropriate text filter will be called with the new contents of the Notes column, and the filtered text that is returned will be passed on to the word generation process.

So my question is how do I create a textfilter that is called without having to include an unwanted and essentially useless column in each table?

What would be brilliant would be something that passed the field (and preferably table) name rather than filter type.

Roy Lambert
Thu, Nov 29 2007 8:51 AM

Roy Lambert

NLH Associates

Team Elevate

Tim


My testing shows that if the Filter Type column contents do not match the textfilter module type, the module isn't called. This implies I'll need either a separate module for each filtering type (to use your examples, rtf and html) or a generic Filter Type column which just has something like 'filter' as its contents, and then check inside the textfilter as to what I want to do. So what's the point of passing the FilterType into the module?


Roy Lambert
Thu, Nov 29 2007 4:33 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Roy,

<< So my question is how do I create a textfilter that is called without
having to include an unwanted and essentially useless column in each table?
>>

You can't - the filter type column determines which text filter is called,
if any.

<< What would be brilliant would be something that passed the field (and
preferably table) name rather than filter type. >>

What's the difference ?  It sounds to me like just another way of
introducing a filter type column.

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, Nov 29 2007 4:39 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Roy,

<< My testing shows that if the Filter Type column contents do not match the
textfilter module type the module isn't called. >>

So does the manual :)

"If a matching text filter is found, then the text to be indexed is first
passed to the text filter before being passed on to the word generator (see
below). If a matching text filter is not found, then the text is passed on
directly to the word generator without being filtered."

<< This implies I'll need a separate module for each filtering type (to use
your examples rtf and html) or just have a generic Filter Type column which
just has something like 'filter' as its contents and then check inside the
textfilter as to what I want to do. So what's the point of passing the
FilterType into the module? >>

It's just basically a check to make sure that you've got the hook-ups done
correctly in the CREATE TEXT FILTER DDL statement.  However, you could also
use the same DLL library for more than one filter type by simply copying
each DLL to a different name, but keeping the exact same code, and have the
DLL do the decision making.

--
Tim Young
Elevate Software
www.elevatesoft.com
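
The matching rule quoted from the manual, together with the idea of one shared routine serving several filter types, can be modeled in a few lines of Python. This is a sketch of the described behavior, not ElevateDB's implementation: the `FILTERS` registry, the function names, the exact (case-sensitive) comparison, and the crude regular expressions are all assumptions for illustration.

```python
import re


def strip_markup(filter_type, text):
    """One shared routine that branches on the filter type it is given."""
    if filter_type == "html":
        # Crude tag removal, good enough for a sketch.
        return re.sub(r"<[^>]+>", " ", text)
    if filter_type == "rtf":
        # Crude removal of RTF control words and braces.
        return re.sub(r"\\[a-z]+-?\d* ?|[{}]", " ", text)
    return text


# Hypothetical registry: filter type -> filter routine. The same routine is
# registered under two types, mirroring the "same code, different name" idea.
FILTERS = {"html": strip_markup, "rtf": strip_markup}


def text_to_index(filter_type_value, column_text):
    """Return the text that would be handed to the word generator."""
    text_filter = FILTERS.get(filter_type_value)
    if text_filter is None:
        # No matching filter (including a blank or missing filter type):
        # the text goes on unfiltered and no filter code runs at all.
        return column_text
    return text_filter(filter_type_value, column_text)


print(text_to_index("html", "<b>hello</b> world").split())  # ['hello', 'world']
print(text_to_index("", "<b>hello</b> world"))              # <b>hello</b> world
```

Seen this way, the filter type passed into the routine is what lets a single body of code serve several registrations, and a row whose filter type matches nothing never reaches the filter.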

Fri, Nov 30 2007 5:36 AM

Roy Lambert

NLH Associates

Team Elevate

Tim


Merged two posts together cos they're both relevant. I'm also "thinking aloud" and trying to get my head round this as well so please bear with me.

><< So my question is how do I create a textfilter that is called without
>having to include an unwanted and essentially useless column in each table?
> >>
>
>You can't - the filter type column determines which text filter is called,
>if any.
>
><< What would be brilliant would be something that passed the field (and
>preferably table) name rather than filter type. >>
>
>What's the difference ? It sounds to me like just another way of
>introducing a filter type column.

Yes and no. Your current schema operates at a row level; this would operate at a column level. I can see the benefits of a row level determination of filter type when the data to be operated on has a changeable format. This is the case with ELN, which will contain Emails (not changeable), Letters (not changeable) and Notes (changeable), but not with eMails. (I have two tables: one for emails from people/institutions that are not part of the contact base, and the other for those that are, which may contain letters and notes as well as emails.) For ELN it makes sense to work at row level since the notes may be plain or rich text. For eMails, since I'm only going to need to perform the filtering once (the field will never be changed), I could easily determine if it's plain, html or (it happens) rtf within the text filter itself.

Each table also has two other columns which will have a text index - Comments & Flags. Neither of these will need filtering, probably Comments will pass through a word generator as will Flags but the criteria will be different. Both are changeable.

I'll need two word generators because Comments will simply enforce the user's views on word lengths and stop lists. Flags will have to pass through one because I use #160 (hard space) to make words into a phrase that can then be searched for, e.g. "interesting piece of code".

Thinks "this will probably be a better scenario than me doing all the work in a text filter AND then it passing through Tim's word generator".

I have another table where the data is guaranteed to be in wpt format (WPTools 5 proprietary format) and another one where it's guaranteed to be html. In those two cases it would be better to be able to call a text filter based on the column name rather than a row in a filter type column.

Finally, I've tables (Companies, Contacts, Calls, Career) all of which have columns that will potentially be indexed but will be plain text and so will only pass through a word generator. (PS: the fact that they all start with C is coincidence :)

So to summarise my views, now I know what they are, I think we need:

1) a text filter controlled by the contents of a text filter type column
2) a text filter that is not controlled but passes the column name in
3) a word generator, where I can live without the column name by creating a few of them at the cost of a bit of code duplication (or adding units in)

><< My testing shows that if the Filter Type column contents do not match the
>textfilter module type the module isn't called. >>
>
>So does the manual :)
>
>"If a matching text filter is found, then the text to be indexed is first
>passed to the text filter before being passed on to the word generator (see
>below). If a matching text filter is not found, then the text is passed on
>directly to the word generator without being filtered."

I wasn't sure but now I am. Before, I assumed the matching was on the text filter name; now I know it's the type (I guess as well, but I'm not sure).

><< This implies I'll need a separate module for each filtering type (to use
>your examples rtf and html) or just have a generic Filter Type column which
>just has something like 'filter' as its contents and then check inside the
>textfilter as to what I want to do. So what's the point of passing the
>FilterType into the module? >>
>
>It's just basically a check to make sure that you've got the hook-ups done
>correctly in the CREATE TEXT FILTER DDL statement. However, you could also
>use the same DLL library for more than one filter type by simply copying
>each DLL to a different name, but keeping the exact same code, and have the
>DLL do the decision making.
>

I have twigged to what you mean and I have only one thing to say - YEUCH

Roy Lambert
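
The Flags idea in the post above, using character 160 (the hard or non-breaking space) to keep a phrase together as one searchable "word", can be illustrated with a short Python sketch. It is only a model of what such a word generator might do, not ElevateDB's word generator interface; the function name and the lowercasing are assumptions.

```python
import re

NBSP = "\u00a0"  # character 160, the hard (non-breaking) space


def flag_words(text):
    """Split on ordinary whitespace only, so NBSP-joined phrases stay whole.

    Note: Python's str.split() and the regex class \\s both treat NBSP as
    whitespace, so the separator set is spelled out explicitly here.
    """
    tokens = re.split(r"[ \t\r\n]+", text.strip(" \t\r\n"))
    return [t.lower() for t in tokens if t]


flags = "Urgent interesting" + NBSP + "piece" + NBSP + "of" + NBSP + "code todo"
print(flag_words(flags))
# ['urgent', 'interesting\xa0piece\xa0of\xa0code', 'todo']
```

A search for the phrase would then need to supply the same hard spaces so that it produces the identical token.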

Fri, Nov 30 2007 3:10 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Roy,

<< Yes and no. Your current schema operates at a row level, this would
operate at a column level. I can see the benefits of a row level
determination of filter type when the data to be operated on has a
changeable format. This is the case with ELN which will contain Emails (not
changeable), Letters (not changeable) and Notes (changeable) but not with
eMails (I have two tables one for emails from people/institutions that are
not part of the contact base and the other for those that are which may
contains letters and notes as well as emails). For ELN it makes sense to
work at row level since the notes may be plain or rich text. eMails, since
I'm only going to need to perform the filtering once (the field will never
be changed) I could easily determine if its plain, html or (it happens) rtf
within the text filter itself. >>

In that case, just define a text filter for 'Email' and set up a text filter
column that contains the value 'Email' for all rows.


<< I wasn't sure but now I am. Before I assumed the matching was the text
filter name now I know its the type (I guess as well but I'm not sure) >>

Yep, it's the filter type column's value matched up with the text filter
type.

<< I have twigged to what you mean and I have only one thing to say - YEUCH
>>

Then don't use it. :)

--
Tim Young
Elevate Software
www.elevatesoft.com
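
The suggestion above (one 'Email' filter type on every row, with the filter itself working out whether the body is plain, html or rtf) might look roughly like the following Python sketch. The detection heuristics, the function names, and the crude markup stripping are all assumptions for illustration; a real filter would be written against the text filter module's event shown earlier in the thread.

```python
import re
from html import unescape


def sniff_format(text):
    """Guess the body format from its first characters (heuristic only)."""
    head = text.lstrip()[:200].lower()
    if head.startswith("{\\rtf"):
        return "rtf"
    if head.startswith("<!doctype html") or re.search(
        r"<(html|body|p|div|br)\b", head
    ):
        return "html"
    return "plain"


def email_filter(filter_type, text):
    """Hypothetical filter registered for the single filter type 'Email'."""
    if filter_type != "Email":
        return text  # hook-up check: not ours, leave the text alone
    kind = sniff_format(text)
    if kind == "html":
        return unescape(re.sub(r"<[^>]+>", " ", text))
    if kind == "rtf":
        return re.sub(r"\\[a-z]+-?\d* ?|[{}]", " ", text)
    return text


print(sniff_format("{\\rtf1 hello}"))                         # rtf
print(email_filter("Email", "<p>Tom &amp; Jerry</p>").split())  # ['Tom', '&', 'Jerry']
```

The cost Roy objects to remains: every row still needs the filter type column set to 'Email' for the filter to be called at all.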

Sat, Dec 1 2007 4:44 AM

Roy Lambert

NLH Associates

Team Elevate

Tim


>In that case, just define a text filter for 'Email' and set up a text filter
>column that contains the value 'Email' for all rows.

That's what I was hoping to avoid.

>Then don't use it. :)

Guess what.....

Roy Lambert