Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 22 total
Thread CONTAINS (Full Text Index) vs LIKE ...
Wed, Aug 12 2009 7:39 PMPermanent Link

Charalampos Michael
Hello,
  What's your opinion for VARCHAR/CHAR fields ?

Thank you

--
Charalampos Michael - [Creation Power] - http://www.creationpower.gr
Thu, Aug 13 2009 2:57 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Charalampos



My take is even with the fact that ElevateDB no longer right trims varchars on storing since it does ignore them on comparison I'd go for them UNLESS I have a fixed length fixed format field that required all characters filling. With VARCHARS you could even find that an = works if the LIKE is just to accommodate blank spaces.

CONTAINS vs LIKE. Well CONTAINS is a lot faster, but requires more disk space, more insert, update & delete time and can't handle phrases. Also a pet peeve unlike TEXTSEARCH it ONLY works on indexed fields. I'm using full text indexing a lot. Often for things that could go into a x-ref table but its easier to store everything in the one table and it allows filters to work. For columns with a full text index finding a phrase I generally use a mix of CONTAINS and LIKE.

If you do go down the CONTAINS route then unless you write your own text filter and word generator you could find the default one has filtered out some of the words in your field. The same words will be filtered out of any search text but it can seem to give wrong results. But the big difference is word vs phrase searching.

Roy Lambert
Thu, Aug 13 2009 6:49 AMPermanent Link

Charalampos Michael
Dear Roy,

> My take is even with the fact that ElevateDB no longer right trims varchars on storing
> since it does ignore them on comparison I'd go for them UNLESS I have
a fixed length fixed
> format field that required all characters filling. With VARCHARS you
could even find that
> an = works if the LIKE is just to accommodate blank spaces.
>
> CONTAINS vs LIKE. Well CONTAINS is a lot faster, but requires more disk space, more insert,
> update & delete time and can't handle phrases. Also a pet peeve unlike TEXTSEARCH it
> ONLY works on indexed fields. I'm using full text indexing a lot. Often for things
> that could go into a x-ref table but its easier to store everything in the one table and
> it allows filters to work. For columns with a full text index finding a phrase I
> generally use a mix of CONTAINS and LIKE.
>
> If you do go down the CONTAINS route then unless you write your own text filter and word generator
> you could find the default one has filtered out some of the words in
your field. The same words
> will be filtered out of any search text but it can seem to give wrong
results. But the big
> difference is word vs phrase searching.

Very interesting stuff! Thank you Roy!

Actually i want to do a "Keyword" searching and maybe use it for
filtering. I don't care about disk space, i care about speed and
as you said CONTAINS is a lot of faster than LIKE.

Since I'm intent to use them on fixed length VARCHAR fields what
bothers me is that I'm afraid that CONTAINS may miss keywords
where's LIKE will not ... or I'm wrong ?

What about the mix of CONTAINS & LIKE ? Is it faster and safer ?

Thank you very much

--
Charalampos Michael - [Creation Power] - http://www.creationpower.gr
Thu, Aug 13 2009 10:09 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Charalampos

>Actually i want to do a "Keyword" searching and maybe use it for
>filtering. I don't care about disk space, i care about speed and
>as you said CONTAINS is a lot of faster than LIKE.
>
>Since I'm intent to use them on fixed length VARCHAR fields

What is a fixed length VARCHAR field when its at home?

>what
>bothers me is that I'm afraid that CONTAINS may miss keywords
>where's LIKE will not ... or I'm wrong ?

Not unless they have been filtered out (eg its a stop word or is longer than the max length allowed) and that's down to you setting up the index and the text filter and word generator (hint have a look in the extensions ng for mine)

>What about the mix of CONTAINS & LIKE ? Is it faster and safer ?

Lets give you some examples. My money making work was recruitment (hopefully it will be again in the near future when the economy picks up) so my examples are slanted towards that.

I'm going to pick 2 fields from the Contacts table:

Notes: memo free form text
Skillsets: memo managed text


NOTES is just what it says a note field into which the user can type anything they like eg Been on holiday, likes skiing wife's name is Vicky has 3 children, and he's very good with Microsoft Word

SKILLSETS is a list of the skills the contact has eg Delphi, Microsoft Word, C++. These are managed because 1) the field is populated via lookup from a table and 2) the spaces are replaced with a hard space (#160). The reason for the substitution is that Microsoft#160Word is treated as one word whilst Microsoft#32Word would be two words BUT both look the same when printed!

So a key difference is the way I have the two set up. Skillsets CONTAINS 'Microsoft#160Word' would only return hits with that "entire word" whereas Notes CONTAINS 'Microsoft Word' would return hits for any row with the words Microsoft and Word anywhere in the text so to get just the MSWord users you'd have to do

Notes CONTAINS 'Microsoft Word' AND Notes LIKE '%Microsoft Word%'

This will be faster than just Notes LIKE '%Microsoft Word%', but the speed difference will depend on the number of hits that CONTAINS returns to be further processed by LIKE

If you want any more can you give me a few examples of what you're looking to do.

Roy Lambert [Team Elevate]
Thu, Aug 13 2009 11:21 AMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Michael,

<< What about the mix of CONTAINS & LIKE ? Is it faster and safer ? >>

Using a mix is the best option, and will allow you to have the proper
combination of performance and accuracy.

--
Tim Young
Elevate Software
www.elevatesoft.com

Fri, Aug 14 2009 5:42 AMPermanent Link

Charalampos Michael
Dear Roy,

>> Since I'm intent to use them on fixed length VARCHAR fields
>
> What is a fixed length VARCHAR field when its at home?

"Name" VARCHAR(50) COLLATE "UNI",

>> what
>> bothers me is that I'm afraid that CONTAINS may miss keywords
>> where's LIKE will not ... or I'm wrong ?
>
> Not unless they have been filtered out (eg its a stop word or is longer than the max length allowed) and that's down to you setting up the index and the text filter and word generator (hint have a look in the extensions ng for mine)

I don't have any filters ....

>> What about the mix of CONTAINS & LIKE ? Is it faster and safer ?
>
> Lets give you some examples. My money making work was recruitment (hopefully
> it will be again in the near future when the economy picks up) so my examples
> are slanted towards that.
>
> I'm going to pick 2 fields from the Contacts table:
>
> Notes: memo free form text
> Skillsets: memo managed text

You're talking for CLOB fields while i'm talking for plain VARCHAR
fields Smile

Although thanks for the information about the filtering because i'll
need it!

--
Charalampos Michael - [Creation Power] - http://www.creationpower.gr
Fri, Aug 14 2009 5:43 AMPermanent Link

Charalampos Michael
Dear Tim,

> << What about the mix of CONTAINS & LIKE ? Is it faster and safer ? >>
>
> Using a mix is the best option, and will allow you to have the proper
> combination of performance and accuracy.

Thanks, so i must have 2 index fields for doing this fast right ?
(One Index and One Text Index for the same field)

--
Charalampos Michael - [Creation Power] - http://www.creationpower.gr
Fri, Aug 14 2009 7:07 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Charalampos

>> What is a fixed length VARCHAR field when its at home?
>
>"Name" VARCHAR(50) COLLATE "UNI",

Ahhh you were talking about normal ElevateDB/DBISAM VARCHARs vs PICK like variable length fields?

>I don't have any filters ....

The text filter is the little dll that you write, or the standard one that Tim provides (which, if it exists, I think just passes the text unchanged) that takes the raw text string and strips out stuff that you don't want like numbers or html or rtf formatting


>You're talking for CLOB fields while i'm talking for plain VARCHAR
>fields Smile

Not a lot of difference really apart from being able to set a normal index on a VARCHAR which you can't do on a CLOB

Roy Lambert
Fri, Aug 14 2009 8:16 PMPermanent Link

Charalampos Michael
Dear Roy,

>>> What is a fixed length VARCHAR field when its at home?
>> "Name" VARCHAR(50) COLLATE "UNI",
>
> Ahhh you were talking about normal ElevateDB/DBISAM VARCHARs vs PICK like variable length fields?

?

Simple and clean "String" fields ...

Rec1: Name = 'Charalampos'
Rec2: Name = 'Nick'
Rec3: Name = 'Peter'
etc

>> I don't have any filters ....
>
> The text filter is the little dll that you write, or the standard one that Tim
> provides (which, if it exists, I think just passes the text unchanged)
> that takes the raw text string and strips out stuff that you don't
want like numbers
> or html or rtf formatting

That will gonna help me on CLOB fields which i put the RTF/HTML

>> You're talking for CLOB fields while i'm talking for plain VARCHAR
>> fields Smile
>
> Not a lot of difference really apart from being able to set a normal index
> on a VARCHAR which you can't do on a CLOB

You're forgetting that VARCHAR has different file format (limited/fixed)
than CLOB.

--
Charalampos Michael - [Creation Power] - http://www.creationpower.gr
Sat, Aug 15 2009 2:52 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Charalampos

>>
>> Not a lot of difference really apart from being able to set a normal index
> > on a VARCHAR which you can't do on a CLOB
>
>You're forgetting that VARCHAR has different file format (limited/fixed)
>than CLOB.

But that makes no difference to full text indexing (apart from the number of words in the field)

Roy Lambert
Page 1 of 3Next Page »
Jump to Page:  1 2 3
Image