Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 23 total
Thread How to speedup random locate or RecNo positioning ?
Thu, Sep 15 2011 11:59 AMPermanent Link

Maxim V. Terentiev

Hi,

I have simple table:

ID (AutoInc, unique Primary Index)
SomeData (String[250])

Table contains about 2 millions records.

I must read ALL records from table but in random order. Each record must be readed only one time.

But I notice very poor performance if Locate called for random records !

For example simple test like this:

for(int i=0;i<5000;i++)
    MyTable->Locate("ID",Random(2000000),LocateOpts);

Take about 30 seconds on my system (Core Duo 3.3 ghz, 4 gb ram, 1.5 tb HDD) . So, it's about 160 locates/sec.

I also try FindKey and positioning by using MyTable->RecNo but results is almost same.

non-random locate test like this:

for(int i=0;i<5000;i++)
    MyTable->Locate("ID",i,LocateOpts);

Take only 1 second on my system.

It's possible to improve random-locate speed ? I need at least 800 locates per second. Maybe I need to change some buffers settings ?

Thanks for help and excuse me for bad english.
Fri, Sep 16 2011 6:41 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Maxim


If its fast when its sequential then I'd suspect its down to Windows buffering. You can try altering the buffers for the table. From the manual

Memory Buffer Customizations
The TDBISAMEngine MaxTableDataBufferCount, MaxTableDataBufferSize, MaxTableIndexBufferCount,
MaxTableIndexBufferSize, MaxTableBlobBufferCount, and MaxTableBlobBufferSize properties allow you to control how
much memory is used for buffering the data records, index pages, and BLOB blocks for each physical table opened in
a given session in the engine. The *Size properties dictate how much memory, in bytes, to allocate. The *Count
properties dictate the maximum number of data records, index pages, and BLOB blocks that can be allocated
regardless of the amount of memory available. This is to ensure that the buffering architecture in DBISAM does not
get overwhelmed by buffering too many small records, etc.


But I suspect it won't do much good unless its set very high because of skipping around the table will mean that the buffers will rarely hold useful data.

What about trying SQL. You could define a list of random IDs and do a SELECT * FROM table WHERE ID IN (randomlist). May not help but its worth a try.

Roy Lambert [Team Elevate]
Fri, Sep 16 2011 12:05 PMPermanent Link

Maxim V. Terentiev

Hi Roy,

I was try to increase MaxTableIndexBufferCount, MaxTableIndexBufferSize, etc but it's not helps.

I also try to use FindKey, SetRange, Filter (ID=SomeRandomNumber), SELECT * FROM MyTable WHERE ID=SomeRandomNumber but speed is almost same - 100-150 locates/sec.

Only one thing speed up random locates for a little: full index compression...
Fri, Sep 16 2011 1:16 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Maxim


With random access such as you're trying about the only thing that will really help is to get the whole thing into memory. RAM shouldn't be a problem with the structure you posted but loading the whole table into memory will take some time. You may be able to do something with a thread loading the table into memory in parallel with the random access. Something along the lines of:

start the thread when the program starts
create an in memory table
start loading records into memory

when you need a record first try and load from memory, if not there load from disk


Roy Lambert [Team Elevate]
Mon, Sep 19 2011 11:18 AMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Maxim,

<< It's possible to improve random-locate speed ? I need at least 800
locates per second. Maybe I need to change some buffers settings ? >>

Your only two options are those already discussed: increase the memory
buffering settings and/or enable index compression to try and reduce the
size of the index(es) being used.

If you want to email me a sample project that replicates what you're trying
to do (with data, please), I can take a look and see if there's anything
else I can recommend.

--
Tim Young
Elevate Software
www.elevatesoft.com
Mon, Sep 19 2011 1:17 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Maxim


Now that Tim's joined in and confirmed my suspicions I have a further question. Can you describe what you're trying to do and why you need this? Someone may be able to suggest an alternative.

Roy Lambert [Team Elevate]
Tue, Sep 20 2011 9:31 AMPermanent Link

Maxim V. Terentiev

Roy Lambert wrote:

Now that Tim's joined in and confirmed my suspicions I have a further question. Can you describe what you're trying to do and why you need this? Someone may be able to suggest an alternative.
----------------------------------------------------------------------------------------------------------------------------------------------

Ok, my app is email sender, I use DBISAM for mailing lists. Each list have at least 3 fields (ID, Email, Name) but user can add any number of custom fields and import data to them.

User can load any number of records to mailing lists (up to few millions records).

But some spam filters not like if mail comes to recipients in alphabetical order, so while sending mail I must get contacts from mailing lists in random order. Each contact must be taken only one time of course.

So, I implement simple algorithm like this:

TStringList *ShuffledList=new TStringList;
// Load ID's from mailing list
for(MailList->First();!MailList->Eof;MailList->Next())
         ShuffledList->Add(MailList->FieldByName("ID")->AsString;
// Shuffle ID's list
int p;
for(int i=ShuffledList->Count-1;i>1;i--)
   {
   p=Random(i);
   ShuffledList->Exchange(p,i-1);
   }
// Now i can take contacts in random order
for(int i=0;i<ShuffledList->Count;i++)
      {
      MailList->Locate("ID",ShuffledList->Strings[i],Opts);
      SendMail();
      }

And I have problem with last Locate due to very hard HDD usage. This is a bottleneck.

Loading all records to RAM is not good solution because as I explain before Mail lists may contain any number of fields (including MEMO and BLOBs). I need to access all fields because user can insert data from any field to message body or attachments during mailing (MailMerge). So, it's will take to many RAM and load process can be to long.

Looks like only solution in my case is partial randomization:

TStringList *tmpList=new TStringList;
for(int i=0;i<1000;i++)
      tmpList->Add(ShuffledList->Strings[i];
tmpList->Sorted=true;
for(int i=0;i<tmpList->Count;i++)
       {
       MailList->Locate("ID",tmpList->Strings[i],Opts);
       SendMail();
       }

In this case Locates will travel table only in forward direction, it's up to 6-7 times faster than fully random mode.
But results only partial randomized Frown

Another solution is adding additional indexed field to MailList for storing some integers in random order. At mailing time I must only sort list by this field. But here is two problems:

1. How to add unique random digits to this field ? Duplicates is not allowed.
2. MailList->Next() will be also slow in this case. Maybe fasten then Locate but slow enough.

Any suggestions ?
Tue, Sep 20 2011 9:36 AMPermanent Link

Maxim V. Terentiev

I forgot to say about my goal:

I need to take at least 500 contacts per second on typical user PC or laptop (1.8 ghz CPU, 1 gig ram, 500 gb HDD). 1000 contacts per second will be very nice....
Tue, Sep 20 2011 11:13 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Maxim


I don't know if you've tried benchmarking the various parts of the operation but with the spec of machine you quote, I have a feeling that mailmerge and sendmail will be bottlenecks as well.

Roy Lambert
Tue, Sep 20 2011 11:17 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Maxim


What is the data loaded from? How is the data loaded, and how long does that take?

Roy Lambert
Page 1 of 3Next Page »
Jump to Page:  1 2 3
Image