Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 3 of 3 total
Thread Full Text Indexing performance
Mon, Dec 31 2012 6:54 PMPermanent Link

Jim Gallagher

We have a Delphi7 program in production that processes a text file containing an ID and a Name field.  It contains about 10 million records, and the process ran about 20-30 minutes.  It builds a full text index on the name field.  

We lost the source code to this program (bad disk, bad backups), which I originally wrote, and the replacement  code, which I did not write, was running 3-4 days(!).  I tried to reverse engineer the old program (postponing the index builds until after the text file was processed).  This reduced the time to about 13 hours.  Still too long.

The old version was written in 2007, with whatever version of DBISAM was current then.  My re-write (also in Delphi 7) is with 4.34 Build 5.

My code is quite simple.  I create the table without the indexes.  The program processes the 10 million records in 2+ minutes.

The build of the full text index is taking over 10 hours.  This is the (throwaway) create table code:

 with dbisamtable1 do
 try
   if Active then Close;
   TableName:= 'FV00020621';
   Exclusive:= True;
   try
     DeleteTable;
   except
   end;
  try
     with dbisamtable1 do
        begin
        DatabaseName:= eDB.Text;
        TableName:='fv00020621';
        Exclusive:=True;
        if (not Exists) then
           begin
           with FieldDefs do
              begin
              Clear;
              Add('KEY',ftString,30);
              Add('TITLE',ftString,40);
              end;
           with IndexDefs do
              begin
              Clear;
              Add('','KEY',[ixPrimary,ixUnique],'',icFull);
              end;
           CreateTable(0,1,0,False,'','',4096,512,0)
           end;
        end;
  finally
  end;

 except
 end;

After the text file is processed and the table contains 10 million rows, I build the indexes.  I added the single letter stopwords to avoid indexing initials in names (John Q Public), which helped reduce the time.  This is the process that is taking 10+ hours:


 dbisamtable1.Close;
 dbisamtable1.Exclusive := True;

  with dbisamtable1 do
  begin
           StopWords:=TStringList.Create;
           try
              with StopWords do  // eliminate initals from the full text search
                 begin
                 Add('A');
                 Add('B');
                 Add('C');
                 Add('D');
                 Add('E');
                 Add('F');
                 Add('G');
                 Add('H');
                 Add('I');
                 Add('J');
                 Add('K');
                 Add('L');
                 Add('M');
                 Add('N');
                 Add('O');
                 Add('P');
                 Add('Q');
                 Add('R');
                 Add('S');
                 Add('T');
                 Add('U');
                 Add('V');
                 Add('W');
                 Add('X');
                 Add('Y');
                 Add('Z');
                 end;
              AlterTable(0,1,0,False,'','',4096,512,0,'TITLE',StopWords,
                    #1+#2+#3+#4+#5+#6+#7+#8+#9+#10+#11+#12+#13
                    +#14+#15+#16+#17+#18+#19+#20+#21+#22+#23
                    +#24+#25+#26+#27+#28+#29+#30+#31+#32+'*+'+
                    ',-./:;<=>\`',
                    '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_ab'+
                    'cdefghijklmnopqrstuvwxyz'+
                    ''+
                    ''+
                    '',True);
           finally
              StopWords.Free;
           end;
  end;


Any ideas as to what I might be doing wrong?  I was guessing somewhat on the run time build of a full text index, since I could not find any sample code.  I also tried sql:

//  dbisamquery1.SQL.Clear;
//  dbisamquery1.SQL.Add('alter table FV00020621 TEXT INDEX (TITLE);');
//  dbisamquery1.ExecSQL;


But that was no faster.  Is it possible that the unicode stuff that might have been added since that version of DBISAM is causing me this grief?

Thank you for any ideas, suggestions.

-Jim
Tue, Jan 1 2013 4:58 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Jim


>But that was no faster. Is it possible that the unicode stuff that might have been added since that version of DBISAM is causing me this grief?

I switched to ElevateDB some while back so I'm not up to date with DBISAM but I thought it was still non-unicode. D7 certainly is so I doubt that it "unicode stuff" that is causing the problem.

Looking at your code it couldn't get much simpler, and the only suggestions I could make are to reduce the NewTextIndexIncludeChars to a-z, A-Z, and the NewTextIndexSpaceChars to #32,'.' and ',' - unless you're likely to encounter others in a name.

I don't think that will give you the speed up you want though. You could also try playing with the buffer sizes, that may gain you some speed.

Also, if you remember it, and are on a support contract, you could try downloading and installing the earlier version of DBISAM, compile your utility and then reinstall the latest version.

Finally is there a possibility that you'd written your own engine function for full text indexing?

Roy Lambert [Team Elevate]
Tue, Jan 8 2013 12:04 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Jim,

<< The build of the full text index is taking over 10 hours.  This is the
(throwaway) create table code: >>

Did you try creating the table with the text index(es) first, and then
populating the table (opened exclusively) ?

Tim Young
Elevate Software
www.elevatesoft.com
Image