Thread: In-Memory Tables vs Sorted String Lists
Wed, Feb 28 2007 5:08 PM

Dave
Hi

I'm currently creating an app and would appreciate some advice on handling in-memory data, to save me some time in testing all the different scenarios.

The data I have is a sorted list of "words", each with a count.

Each list may have 5,000-30,000 "words", and a user might have 30-300 of these lists.

During my app's usage, I will frequently call a process in which each list is checked, and for each check up to 300 words will be searched for in the list.

So obviously speed here is paramount.

Each list is loaded as it is required (though 80% of the lists will probably be loaded initially), and all are saved at the same time on shut-down.


The two ways I see to do it are:

1.  Each list is held in memory as a sorted string list

Data is loaded/saved by being assigned from a memo field of a database (in the database it is stored as word#9count; in memory the word is stored as an item of the string list and the count as an "object").

In my tests this takes around 0.4 seconds per list to load (30,000 records, very fast computer), which would be too slow if there are a lot of lists.  Searching is quite fast (50 lists per second).  Saving is almost fast enough.
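
Roughly, the load looks like this (a sketch only - the routine name and field layout are just for illustration; it needs Classes and SysUtils in the uses clause):

// Loads one memo's text (word#9count per line) into a sorted TStringList,
// with the count held in Objects[].
procedure LoadListFromMemo(const MemoText: string; List: TStringList);
var
  Lines: TStringList;
  i, TabPos, Cnt: Integer;
  AWord: string;
begin
  List.Sorted := False;            // add unsorted, sort once at the end
  List.Clear;
  Lines := TStringList.Create;
  try
    Lines.Text := MemoText;
    for i := 0 to Lines.Count - 1 do
    begin
      TabPos := Pos(#9, Lines[i]);
      if TabPos > 0 then
      begin
        AWord := Copy(Lines[i], 1, TabPos - 1);
        Cnt := StrToIntDef(Copy(Lines[i], TabPos + 1, MaxInt), 0);
        List.AddObject(AWord, TObject(Cnt));   // the count stored "as an object"
      end;
    end;
  finally
    Lines.Free;
  end;
  List.Sorted := True;             // IndexOf now does a binary search
end;

Saving is just the reverse - building the word#9count lines and assigning them back to the memo field.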


2.  All lists are stored in one in-memory table

At start-up the table is loaded from disk and then copied to a memory table.  To obtain each "list", the memory table is filtered.  At shutdown the memory table is copied back to the disk table.
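
What I have in mind is roughly this (just a sketch - MemTable stands for the in-memory dataset, and the ListID/Word field names are made up; it needs DB, Classes and SysUtils in the uses clause):

// Selects one "list" by filtering the memory table, then checks each word.
function CountWordHits(MemTable: TDataSet; ListID: Integer;
  Words: TStrings): Integer;
var
  i: Integer;
begin
  Result := 0;
  MemTable.Filter := 'ListID = ' + IntToStr(ListID);   // pick out one "list"
  MemTable.Filtered := True;
  try
    for i := 0 to Words.Count - 1 do
      if MemTable.Locate('Word', Words[i], []) then     // look each word up
        Inc(Result);
  finally
    MemTable.Filtered := False;
  end;
end;

I assume how fast Locate is will depend on the table being able to use an index on ListID + Word; without one it presumably has to scan the filtered rows.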


I haven't yet tested the second option, but I expect it would be faster at loading, as fast at searching, and possibly at saving.  Has anyone tried something similar?  Is there another way to handle this that I have missed? (A THashedStringList doesn't seem appropriate, and using a separate in-memory table for each list sounds like a headache.)

Thanks for any advice,

d
Thu, Mar 1 2007 3:29 AM

Roy Lambert

NLH Associates

Team Elevate

Dave


Your biggest case is c.9,000,000 words. There have been a number of posts about the speed of adding records to DBISAM tables as they grow large (something to do with maintaining indices and stats), so my first point is that if there is much in the way of inserts it might be better to look at ElevateDB.

My second point is that I'd probably just stick with the disk table and use SQL to check for the word; I don't think you'd get that much speedup over it if you have the right indices.
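
Something along these lines is what I mean (typed from memory, so treat the table, field and component names as examples only, and make sure there's an index on ListID + Word):

// Looks one word up in one list straight from the disk table.
function FindWordCount(Qry: TDBISAMQuery; ListID: Integer;
  const AWord: string): Integer;
begin
  Result := -1;   // -1 = word not in the list
  Qry.Close;
  Qry.SQL.Text :=
    'SELECT Cnt FROM WordLists WHERE ListID = :ListID AND Word = :Wrd';
  Qry.ParamByName('ListID').AsInteger := ListID;
  Qry.ParamByName('Wrd').AsString := AWord;
  Qry.Open;
  try
    if not Qry.Eof then
      Result := Qry.FieldByName('Cnt').AsInteger;
  finally
    Qry.Close;
  end;
end;

In practice you'd set the SQL once and just change the params each time round rather than reassigning it on every call.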

There was a major thread on the CodeGear NGs (borland.public.language.delphi.win32, I think) about searching lists which might help you.

Roy Lambert
Fri, Mar 2 2007 7:18 AM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

Dave,

<< Each list is loaded as it is required (though 80% of the lists will
probably be loaded initially) and all are saved at the same time on
shut-down. >>

Are these lists being modified once they are loaded in the app?

--
Tim Young
Elevate Software
www.elevatesoft.com

Sat, Mar 3 2007 12:21 PM

Sam Davis
Dave wrote:

> [snip]
>
> 2.  All lists are stored in one in-memory table
>
> At start-up the table is loaded from disk and then copied to a memory table.  To obtain each "list", the memory table is filtered.  At shutdown the memory table is copied back to the disk table.
>
> I haven't yet tested the second option, but I expect it would be faster at loading, as fast at searching, and possibly at saving.  Has anyone tried something similar?  Is there another way to handle this that I have missed?

Dave,
    Here is something else to consider. When using a memory table, there is also the overhead of converting the record field to a local variable. When a table field is referenced using Field1.AsString or Field2.AsInteger etc., it takes a few ms, and this adds up over time. We were doing billions of fetches of fields using a well-known memory table, and it wasn't as fast as a TList of objects. You can reference the fields of an object in a TList instantly, without the overhead of converting the field data from the TDataset to a variable. We were going through the TList sequentially, so we didn't need any lookup or sorting capability, and it was quite fast compared to a memory table.
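
As a rough illustration of what I mean (the class and names are just examples):

type
  // a plain object per entry - fields are read directly, with no
  // AsString/AsInteger conversion on each fetch
  TWordItem = class
    WordText: string;
    Count: Integer;
  end;

// walking the TList (from the Classes unit) sequentially
function TotalCount(Items: TList): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to Items.Count - 1 do
    Inc(Result, TWordItem(Items[i]).Count);
end;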

Sam
Mon, Mar 5 2007 6:05 PM

Dave
Thanks Roy, Tim and Sam,

Roy,

Hmmm, perhaps you are right about sticking with a disk-based table (it would be easier to manage, too).


Tim,

Yes, after each process (analyzing all the lists), the "words" in one list will be updated.  That is, new words may be added, and the count of existing words may be changed.


Sam,

Thanks, I will consider that.  The problem with using a TList is that I believe loading the data from a file into the list would be too slow.


Regards

d
