Thread: The character "_" should not be treated as a word breaker in the default word generator
Tue, May 24 2011 11:48 AM

Xiannong Chen

The character "_" should not be treated as a word breaker in the default word generator. It is part of a word.
Tue, May 24 2011 12:51 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Xiannong,

<< The character "_" should not be treated as a word breaker in the default
word generator. It is part of a word. >>

While I agree with you, I'm not in a position to change the default text
indexing without breaking the indexes of existing installations.  Your other
option is to use your own word generator.   You can easily create one using
the ElevateDB Word Generator Module template in the Object Repository in the
Delphi IDE.  The template comes pre-configured to exactly reproduce the same
results as the default word generator, so you only need to tweak it by
following the comments in the code.

--
Tim Young
Elevate Software
www.elevatesoft.com

Tue, May 24 2011 1:13 PM

Roy Lambert

NLH Associates

Team Elevate

Xiannong / Tim


I disagree, at least as far as the English language is concerned (not sure about American).

Roy Lambert
Tue, May 24 2011 1:16 PM

Xiannong Chen

Tim, I migrated my application from DBISAM. In DBISAM, "_" is not treated as a word breaker. I think you should change the default setting. It may not cause a problem for existing installations because users do not expect "_" to be a word breaker. If you do not change it, more installations will be distributed with the wrong word breaker. I bought ElevateDB CS with source. Could you tell me where I can change this myself?

Thanks,
Xiannong
Tue, May 31 2011 3:39 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Xiannong,

<< Tim, I migrated my application from DBISAM. In DBISAM, "_" is not treated
as a word breaker. I think you should change the default setting. It may not
cause a problem for existing installations because users do not expect "_" to
be a word breaker. If you do not change it, more installations will be
distributed with the wrong word breaker. >>

You're not understanding me:  it will *break their text indexes* if I change
it.  In other words, it will require a REPAIR TABLE run on every table with
text indexes if I change it, otherwise searching/updating will not work
correctly.

<< I bought ElevateDB CS with source. Could you tell me where I can change
this myself? >>

The relevant code is in edbstring.pas:

function GetNextTextIndexWord(Collation: Integer;
                             const Value: TEDBString;
                             var Position: Integer;
                             AllowWildCard: Boolean=False): TEDBString;
var
  TempWord: TEDBString;
  TempWildcards: Boolean;
begin
  Result:='';
  TempWord:='';
  TempWildcards:=False;
  while (Result='') and (Position <= Length(Value)) do
     begin
     while (Result='') and (Position <= Length(Value)) do
        begin
        if SpaceChars.CharInSet(Value[Position]) and
           (not (AllowWildcard and (Value[Position]=WILDCARD))) then
           begin
           Inc(Position);
           Break;
           end
        else if IncludeChars.CharInSet(Value[Position]) then
           TempWord:=TempWord+Value[Position]
        else if (AllowWildcard and (Value[Position]=WILDCARD)) then
           begin
           TempWord:=TempWord+Value[Position];
           TempWildcards:=True;
           end;
        Inc(Position);
        end;
     if AllowWildcard and TempWildcards then
        Result:=TempWord
     else if (not AllowWildcard) or
             (AllowWildcard and (not TempWildcards)) then
        begin
        if (Length(TempWord) >= MIN_WORD_SIZE) and
           (Length(TempWord) <= MAX_WORD_SIZE) and
           (not IsStopWord(Collation,TempWord)) then
           Result:=TempWord
        else
           TempWord:='';
        end
     else
        TempWord:='';
     end;
end;

However, all you need to change is the initialization of the SpaceChars
object in the initialization section:

  SpaceChars:=TEDBCharSet.Create;
  with SpaceChars do
     begin
     AddCharRange(TEDBChar(0),TEDBChar(47));
     AddCharRange(TEDBChar(58),TEDBChar(64));
     AddCharRange(TEDBChar(91),TEDBChar(96));
     AddCharRange(TEDBChar(123),TEDBChar(130));
     AddCharRange(TEDBChar(132),TEDBChar(137));
     AddChar(TEDBChar(139));
     AddChar(TEDBChar(141));
     AddCharRange(TEDBChar(143),TEDBChar(153));
     AddChar(TEDBChar(155));
     AddChar(TEDBChar(157));
     AddCharRange(TEDBChar(160),TEDBChar(191));
     AddChar(TEDBChar(215));
     AddChar(TEDBChar(247));
     end;

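Note that "_" is character 95, so it is already covered by the
AddCharRange(TEDBChar(91),TEDBChar(96)) call above, and adding it to
SpaceChars again would change nothing.  To stop it from breaking words, split
that range so that it skips 95:

     AddCharRange(TEDBChar(91),TEDBChar(94));
     AddChar(TEDBChar(96));

and make sure that "_" is also in IncludeChars (its initialization is not
shown here), since GetNextTextIndexWord only keeps characters that are in
that set.  Any existing text indexes will need a REPAIR TABLE afterwards.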
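
For anyone who wants to see the effect before touching edbstring.pas, here is
a minimal stand-alone sketch.  It does not use ElevateDB at all: a plain
Pascal "set of AnsiChar" stands in for TEDBCharSet, only the ASCII part of the
ranges above is copied, and SplitWords is a hypothetical helper that ignores
IncludeChars, the word-size limits, stop words and wildcards.  It prints the
code of "_", whether that code is in the breaker set, and how a sample string
splits with and without "_" as a breaker.

```pascal
program WordBreakSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TCharSet = set of AnsiChar;

const
  { Assumption: ASCII-only copy of the SpaceChars ranges quoted above. }
  DefaultBreakers: TCharSet = [#0..#47, #58..#64, #91..#96, #123..#127];

{ Hypothetical helper, not part of ElevateDB: returns the words of Value
  joined by '|'.  A character in Breakers ends the current word; every other
  character is kept (the real code also consults IncludeChars). }
function SplitWords(const Value: AnsiString;
                    const Breakers: TCharSet): AnsiString;
var
  I: Integer;
  Current: AnsiString;

  { Appends the pending word, if any, to Result and clears it. }
  procedure Flush;
  begin
    if Current <> '' then
    begin
      if Result <> '' then
        Result := Result + '|';
      Result := Result + Current;
      Current := '';
    end;
  end;

begin
  Result := '';
  Current := '';
  for I := 1 to Length(Value) do
    if Value[I] in Breakers then
      Flush
    else
      Current := Current + Value[I];
  Flush;
end;

begin
  WriteLn(Ord('_'));                                      { 95 }
  WriteLn('_' in DefaultBreakers);                        { TRUE }
  WriteLn(SplitWords('order_id = 42', DefaultBreakers));  { order|id|42 }
  { Same input with '_' taken out of the breaker set. }
  WriteLn(SplitWords('order_id = 42', DefaultBreakers - ['_']));
                                                          { order_id|42 }
end.
```

The second and fourth lines of output are the ones that matter: "_" sits
inside the 91-96 range, and taking it out of the breaker set is what keeps
"order_id" together as one word.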

--
Tim Young
Elevate Software
www.elevatesoft.com

