Thread: Speed of Full Text Indexing
Wed, Mar 14 2007 10:27 AM

Igor Colovic
I have created a test application using the trial version of the DBISAM database.
My main interest is in the full text index. Using the Engine.BuildWordList function I have
noticed that it is slow. I do not know why (I do not have the source).
I have a function that splits a 15 MB text file into words in less than 1 second (excluding
loading). I can post it here.
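For illustration, a single-pass splitter along those lines might look like the sketch
below. It is a minimal sketch, not the actual function; the choice of "word characters"
and all names are assumptions.

program WordSplitDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

{ Single linear scan: append each run of letters/digits to Words.
  Illustrative only - this is not DBISAM's BuildWordList, and the
  "word character" set here is an assumption. }
procedure SplitIntoWords(const Text: string; Words: TStrings);
var
  I, Start, Len: Integer;
begin
  Len := Length(Text);
  Words.BeginUpdate;  { suspend change notifications while adding }
  try
    I := 1;
    while I <= Len do
    begin
      { skip separators }
      while (I <= Len) and not (Text[I] in ['A'..'Z', 'a'..'z', '0'..'9']) do
        Inc(I);
      Start := I;
      { consume one word }
      while (I <= Len) and (Text[I] in ['A'..'Z', 'a'..'z', '0'..'9']) do
        Inc(I);
      if I > Start then
        Words.Add(Copy(Text, Start, I - Start));
    end;
  finally
    Words.EndUpdate;
  end;
end;

var
  Words: TStringList;
begin
  Words := TStringList.Create;
  try
    SplitIntoWords('Full text indexing, 15MB in under a second.', Words);
    WriteLn(Words.Count, ' words');  { prints: 8 words }
  finally
    Words.Free;
  end;
end.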

Another thing is encryption. Why is it so slow? If I do not encrypt the table, inserting the
data (~14,000 documents, ~1 GB of data) is fast, ~1-2 minutes. With encryption it takes ~15 minutes.

Searching the newsgroups I found that I can change the encryption (use my own). I have tried
this, but there is no speed gain. I am not an expert in encryption, but why does the engine
serve me only 8 bytes of data at a time to encrypt? Would it not be more efficient to
encrypt larger blocks of data? The engine sends whole BLOBs for compression/decompression,
so why not serve the whole BLOB for encryption? Using Blowfish, a file of ~15 MB is
encrypted in 1-2 seconds.
Wed, Mar 14 2007 6:38 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Igor,

<< I have created a test application using the trial version of the DBISAM
database. My main interest is in the full text index. Using the
Engine.BuildWordList function I have noticed that it is slow. I do not know
why (I do not have the source). I have a function that splits a 15 MB text
file into words in less than 1 second (excluding loading). I can post it here. >>

DBISAM has to perform a lot of callbacks for text index filtering during the
BuildWordList function and it also has to remove duplicate words.

<< Another thing is encryption. Why is it so slow? If I do not encrypt the
table, inserting the data (~14,000 documents, ~1 GB of data) is fast, ~1-2
minutes. With encryption it takes ~15 minutes. >>

The encryption doesn't occur on the entire file - each record, index page,
or BLOB block needs to be encrypted separately so that they can be read
separately.  This can be expensive when a lot of I/O is occurring.
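To put rough numbers on that, here is a back-of-the-envelope sketch. The ~1 GB payload is
Igor's figure, the 512-byte default BLOB block size is the one discussed later in the
thread, and 8 bytes is Blowfish's cipher block size.

program BlockArithmetic;

{$APPTYPE CONSOLE}

const
  DataBytes   = 1024 * 1024 * 1024;  { ~1 GB of document data (Igor's test) }
  BlobBlock   = 512;                 { DBISAM's default BLOB block size }
  CipherBlock = 8;                   { Blowfish's cipher block size }
begin
  { Each BLOB block is encrypted separately so it can be read separately. }
  WriteLn('BLOB blocks encrypted separately: ', DataBytes div BlobBlock);   { 2097152 }
  WriteLn('Blowfish ops per BLOB block:      ', BlobBlock div CipherBlock); { 64 }
  { Total cipher work is the same whether done per block or per file: }
  WriteLn('Blowfish ops in total:            ', DataBytes div CipherBlock); { 134217728 }
end.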

<< Searching the newsgroups I found that I can change the encryption (use my
own). I have tried this, but there is no speed gain. I am not an expert in
encryption, but why does the engine serve me only 8 bytes of data at a time
to encrypt? Would it not be more efficient to encrypt larger blocks of data? >>

DBISAM is designed around Blowfish, which encrypts data in 8-byte blocks.

<< The engine sends whole BLOBs for compression/decompression, so why not
serve the whole BLOB for encryption? Using Blowfish, a file of ~15 MB is
encrypted in 1-2 seconds. >>

Again, you're thinking in terms of "files", which is not how a database
stores records, indexes, or BLOBs.  They are divided up into blocks, and
these blocks are what is encrypted/decrypted.

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, Mar 15 2007 5:28 AM

Igor Colovic
"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote:

<<DBISAM has to perform a lot of callbacks for text index filtering during the
BuildWordList function and it also has to remove duplicate words.>>

Yes, I know. The resulting word list is sorted and there are no duplicate words.
In the new test I used a TDBISAMStringList with LocaleID, Sorted, and dupIgnore set.
My function is still faster. Not by much, but any gain in speed in a database is a plus.
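For example, something like the sketch below. The dbisamtb unit name and the exact
property names are assumptions (LocaleID, sorting, and dupIgnore are as described above),
and SplitIntoWords is the illustrative splitter sketched earlier in the thread.

uses
  Classes, dbisamtb;  { dbisamtb is assumed to declare TDBISAMStringList }

{ Build a sorted, duplicate-free, locale-aware word list. }
procedure BuildMyWordList(const DocumentText: string);
var
  Words: TDBISAMStringList;
begin
  Words := TDBISAMStringList.Create;
  try
    Words.LocaleID := 1033;         { match the table's locale; 1033 is illustrative }
    Words.Sorted := True;           { keep the list ordered, as BuildWordList does }
    Words.Duplicates := dupIgnore;  { drop duplicate words on Add }
    SplitIntoWords(DocumentText, Words);  { the single-pass splitter sketched earlier }
  finally
    Words.Free;
  end;
end;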

I have repeated the test using all (~1 GB) of the files. These files are ultimately stored
in the database.
BuildWordList: 14:09
My test: 10:24

<<The encryption doesn't occur on the entire file - each record, index page,
or BLOB block needs to be encrypted separately so that they can be read
separately.  This can be expensive when a lot of I/O is occurring.>>

<<DBISAM is designed around Blowfish, which encrypts data in 8-byte blocks.>>

I know that every field has to be encrypted. The thing is that I am moving a project
from Apollo (xBase), and I am testing DBISAM as a replacement.
Apollo has encryption, and it encrypts whole fields, BLOBs, etc. Why can't DBISAM do this?

<<Again, you're thinking in terms of "files", which is not how a database
stores records, indexes, or BLOBs.  They are divided up into blocks, and
these blocks are what is encrypted/decrypted.>>

No. These files are stored in the database (memo fields). Every time I want data from a BLOB
field I want the whole thing, right? So why not encrypt the whole BLOB field?

Maybe use 8-byte blocks for every other field, but not MEMO/BLOB fields. MEMOs/BLOBs could be
encrypted as a whole. This would speed things up.

What is the size of these blocks? Is DBISAM writing/reading data in 8-byte blocks?
Fri, Mar 16 2007 3:42 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Igor,

<< Yes, I know. The resulting word list is sorted and there are no duplicate
words. In the new test I used a TDBISAMStringList with LocaleID, Sorted, and
dupIgnore set. My function is still faster. Not by much, but any gain in
speed in a database is a plus.

I have repeated the test using all (~1 GB) of the files. These files are
ultimately stored in the database.
BuildWordList: 14:09
My test: 10:24 >>

Again, BuildWordList has to perform callbacks to check for any user-defined
text filtering functionality.  The function call overhead is probably
contributing some time to the process.

<< I know that every field has to be encrypted. >>

Not every field - every record.

<< The thing is that I am moving a project from Apollo (xBase), and I am
testing DBISAM as a replacement. Apollo has encryption, and it encrypts
whole fields, BLOBs, etc. Why can't DBISAM do this? >>

DBISAM does do that. Also, what kind of encryption is Apollo using?

<< No. These files are stored in the database (memo fields). Every time I
want data from a BLOB field I want the whole thing, right? So why not
encrypt the whole BLOB field? >>

Because the encryption in DBISAM is handled at a lower level than that.   It
is handled at the buffer manager level for all records, index pages, and
BLOB blocks.   Besides, it doesn't really matter - the same amount of data
will need to be encrypted/decrypted.  It doesn't matter when it takes place.

How big are the files that you're storing? If they are normally very
large, then you should consider increasing the BLOB block size for the table
from 512 bytes to something larger, like 2048 bytes.
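If the table already exists, the change can be made through DBISAM's SQL extensions,
roughly as sketched below; "documents" and "MyDB" are placeholder names, and the exact
BLOB BLOCK SIZE clause should be verified against the ALTER TABLE topic in the SQL
reference.

uses
  dbisamtb;

var
  Query: TDBISAMQuery;
begin
  Query := TDBISAMQuery.Create(nil);
  try
    Query.DatabaseName := 'MyDB';  { placeholder database name }
    { Raise the BLOB block size from the 512-byte default to 2048 bytes.
      The clause follows DBISAM's SQL extensions - verify the exact syntax. }
    Query.SQL.Text := 'ALTER TABLE documents BLOB BLOCK SIZE 2048';
    Query.ExecSQL;
  finally
    Query.Free;
  end;
end;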

<< Maybe use 8-byte blocks for every other field, but not MEMO/BLOB fields.
MEMOs/BLOBs could be encrypted as a whole. This would speed things up. >>

No, it wouldn't - see my comments above.

--
Tim Young
Elevate Software
www.elevatesoft.com
