Thread: How to speed up random Locate or RecNo positioning?
Tue, Sep 20 2011 11:23 AM

Maxim V. Terentiev

Roy Lambert wrote:

What is the data loaded from? How is the data loaded, and how long does that take?
--------------------------------------

Users can import data into mailing lists from many sources: txt and csv files, Excel xls, or any database via an ODBC connection.

Loading speed depends on the input format and the number of fields, but it's fast enough: about 5 minutes for 1 million records.
Tue, Sep 20 2011 11:31 AM

Raul

Team Elevate


That's what I would do: add an additional integer field, fill it with random numbers as the data is loaded, and index it, then just do a table scan from start to end based on that field. Generating the random values and inserting them at load time should be straightforward.
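
Roughly, the load side would be something like this (untested sketch in BCB terms; MailList, RandomID and ImportedEmail are just placeholder names, and Randomize() is assumed to be called once at startup):

// needs #include <Math.hpp> for RandomRange
MailList->Append();
MailList->FieldByName("Email")->AsString = ImportedEmail;                   // value parsed from txt/csv/xls/ODBC
MailList->FieldByName("RandomID")->AsInteger = RandomRange(1, 0x7FFFFFFF);  // random sort key
MailList->Post();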

I don't see why you cannot have duplicates as long as there are reasonably few. All a duplicate means is that those records sort and get sent together; the end goal is for the spam filter not to see an alphabetical sent-to list, so this should still accomplish that.

You could use a GUID, but then performance would likely suffer somewhat - test it out.
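
For the GUID variant, something along these lines (again untested; "RandomKey" here would be a string field, and CreateGUID/GUIDToString come from SysUtils):

TGUID G;
CreateGUID(G);                                                    // new random GUID
MailList->FieldByName("RandomKey")->AsString = GUIDToString(G);   // store it as the sort key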

The other suggestion I have is to run this in multiple threads, with each one processing a subset of the records. You need to come up with an algorithm to divide the work, but it might help with processing.

Raul

<<
Maxim V. Terentiev wrote:

Another solution is adding an additional indexed field to MailList for storing some integers in random order. At mailing time I only have to sort the list by this field. But there are two problems here:

1. How do I add unique random numbers to this field? Duplicates are not allowed.
2. MailList->Next() will also be slow in this case. Maybe faster than Locate, but still slow enough.

Any suggestions ?
>>
Tue, Sep 20 2011 11:41 AM

Maxim V. Terentiev

Roy Lambert wrote:

I don't know if you've tried benchmarking the various parts of the operation but with the spec of machine you quote, I have a feeling that mailmerge and sendmail will be bottlenecks as well.
------------------------------------------------
No, mailmerge is very fast; sendmail can be a bottleneck only in the case of large messages (large images or attachments). Also, these routines have been well optimized and profiled over years of development...

I'm thinking not only about my app. With random Locate reads there is an overall system performance hit due to heavy HDD usage: all programs run slower, system response is slower. It's not very good if the user can't do anything except send mail using my app :-)

Also, my app allows running many independent mailings at the same time (different messages to different mailing lists). In that case performance will be absolutely terrible...
Tue, Sep 20 2011 12:24 PM

Maxim V. Terentiev

Raul wrote:

I don't see why you cannot have duplicates as long as there are reasonably few. All a duplicate means is that those records sort and get sent together; the end goal is for the spam filter not to see an alphabetical sent-to list, so this should still accomplish that.
---------------------------------------------

The user can stop a mailing and continue it later. Also, the user can modify mailing lists (add/edit/remove any records) while the mailing is stopped.

So, in the case of duplicates I can't determine the last stop position, because the record where the mailing was stopped may have been removed from the mailing list!

Let me explain. For example, we have this mailing list:

ID (AutoInc)   Email        RandomID (indexed and sorted by this field)
 1             z@mail.com   15
 4             a@mail.com   24
 6             f@mail.com   24   <--- Mailing was stopped here.
 9             d@mail.com   24
22             x@mail.com   28
12             g@mail.com   75

What if the record with ID=6 is deleted while the mailing is stopped? From which record must I continue the mailing?

Right now, with the AutoInc ID field, I can write:

if (!MailList->Locate("ID", LastStopID, Opts))
         MailList->FindNearest(ARRAYOFCONST((LastStopID)));   // fall back to the next higher ID

And it works well, because if some records are deleted, the remaining (or new) records will always have an ID larger than LastStopID. So the first record where ID > LastStopID is the correct start position.

But in the case of duplicates the result will be incorrect! I can't call FindNearest on the ID field because the list is sorted by RandomID, and I can't call FindNearest on RandomID because with dupes the result can be wrong.
Tue, Sep 20 2011 1:29 PM

Roy Lambert

NLH Associates

Team Elevate

Maxim


That's a bit surprising but good. I'm thinking somewhat along the same lines as Raul, but different.

1. Dump the autoinc column (and its unique index) - it's doing you no good at all.
2. Replace it with an integer column (RandomNo)
3. When loading use Random (at least that's what it is in Delphi) to stuff a random value into RandomNo
4. When the table is loaded
    a) Create an index on RandomNo
    b) OPTIMISE the table using that index
5. When you do your eShot you can just start at the top and traverse the table using Next

It doesn't matter if RandomNo contains duplicates, since all you want is to randomise the names; it doesn't matter if Smith is 47 and sits next to Lambert, who is also 47.
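
In code, steps 4 and 5 would be roughly this (from memory, so check the exact DBISAM method signatures in the manual; "ByRandomNo" is just a made-up index name):

MailList->Close();
MailList->Exclusive = true;                                      // index creation/optimization wants exclusive access
MailList->Open();
MailList->AddIndex("ByRandomNo", "RandomNo", TIndexOptions());   // 4a
MailList->OptimizeTable("ByRandomNo");                           // 4b - rewrites the table in RandomNo order
MailList->IndexName = "ByRandomNo";
MailList->First();                                               // 5 - now just Next() through the table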

You can also, as Raul suggested, use threads, but with the spec of machine you quote I'm not sure it will give you much. If you do, I'd suggest no more than 2: start one at the top using Next and the other at the bottom using Prior. Pass in a count (half the number of records) to each thread, decrement it on each read, and stop when it reaches 0.
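
A very rough thread skeleton for that (each thread must open the table through its own DBISAM session - don't share one TDBISAMTable between threads; OpenPrivateMailList and SendOneMail stand in for your own routines):

class TSendThread : public TThread
{
protected:
    TDBISAMTable *FList;   // opened in Execute against this thread's own TDBISAMSession
    bool FFromTop;         // true: First()/Next(), false: Last()/Prior()
    int  FCount;           // half the record count
    virtual void __fastcall Execute()
    {
        FList = OpenPrivateMailList();      // your helper: creates this thread's own session + database + table
        if (FFromTop) FList->First(); else FList->Last();
        while (FCount-- > 0 && !Terminated)
        {
            SendOneMail(FList);             // your existing merge/send routine
            if (FFromTop) FList->Next(); else FList->Prior();
        }
        // ... close and free the table/session here ...
    }
public:
    __fastcall TSendThread(bool FromTop, int Count)
        : TThread(true), FList(NULL), FFromTop(FromTop), FCount(Count) {}
};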


No matter what you do, if the hard drive is being hammered then it's going to affect the performance of other programs on the PC.


Roy Lambert [Team Elevate]
Tue, Sep 20 2011 1:42 PM

Roy Lambert

NLH Associates

Team Elevate

Maxim


What were you using before DBISAM? This bit "Also, these routines have been well optimized and profiled over years of development..." of your other post indicates this has been going on for a while.

Each bit of new information you post makes me wonder if DBISAM is the right tool for this job. Much as I hate to send someone away from ElevateDB, have a look at http://blog.synopse.info/post/2010/03/16/Synopse-Big-Table

Roy Lambert [Team Elevate]
Tue, Sep 20 2011 2:44 PM

Maxim V. Terentiev

Roy Lambert wrote:

What were you using before DBISAM? This bit "Also, these routines have been well optimized and profiled over years of development..." of your other post indicates this has been going on for a while.
-------------------------------------------------------------

I've been using DBISAM for 5 years in all my projects :-)
And it worked fine until users of my app asked me about list randomization...

Roy Lambert wrote:
Each bit of new information you post makes me wonder if DBISAM is the right tool for this job. Much as I hate to send someone away from ElevateDB, have a look at http://blog.synopse.info/post/2010/03/16/Synopse-Big-Table
-------------------------------------------------------------

I found it already, but I can't use it because my app is written in BCB. Synopse BigTable cannot be compiled in BCB because it uses the Pascal 'object' construct, which C++ doesn't support. It requires many changes in the BigTable sources... I'm already working on a conversion.
Tue, Sep 20 2011 2:54 PM

Maxim V. Terentiev

Roy Lambert wrote:

1. Dump the autoinc column (and its unique index) - it's doing you no good at all.
2. Replace it with an integer column (RandomNo)
3. When loading use Random (at least that's what it is in Delphi) to stuff a random value into RandomNo
4. When the table is loaded
    a) Create an index on RandomNo
    b) OPTIMISE the table using that index
5. When you do your eShot you can just start at the top and traverse the table using Next
----------------------------------------------

No way. How long must the user wait for the index creation and the Optimize when the table contains 2 million records? :-)

I think I just found the best solution: I'll let the user decide which mode (random or non-random) to use :-)
I will write some explanation about the disk usage... Maybe I will use BigTable as temporary storage later...

Anyway, thank you very much for all the suggestions you've given me!

DBISAM is really cool anyway. One of my apps is a multi-user client-server DB with 40 users and millions of records in its tables, and it has worked great and stable for the last 3 years in 24/7/365 mode :-)

Only one thing disappoints me: the table and index sizes (they're toooo large) and some performance problems with large datasets. But it's almost always possible to find a way to optimize it!
Tue, Sep 20 2011 3:59 PM

Raul

Team Elevate


If the performance of the table scan using an int value is acceptable, then how about just ensuring you have a unique random value for all rows?

Use something like largeint as the data type and unixtime as the value. Assuming your import is faster than the resolution of unixtime, you can use 32 bits for the unixtime and the other 32 bits for some additional value (for example your autoinc value, or even a random one).
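
As a sketch of that, at import time while the row is still in insert mode ("RandomKey" is a placeholder largeint field; DateTimeToUnix is in DateUtils, RandomRange in Math):

__int64 Key = ((__int64)DateTimeToUnix(Now()) << 32)       // high 32 bits: unixtime at load
              | (unsigned int)RandomRange(0, 0x7FFFFFFF);   // low 32 bits: extra value to break ties
((TLargeintField *)MailList->FieldByName("RandomKey"))->Value = Key;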

Raul


<<
Maxim V. Terentiev wrote:


The user can stop a mailing and continue it later. Also, the user can modify mailing lists (add/edit/remove any records) while the mailing is stopped.

So, in the case of duplicates I can't determine the last stop position, because the record where the mailing was stopped may have been removed from the mailing list!

>>
Fri, Sep 23 2011 3:29 PM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

Max,

<< But some spam filters don't like it if mail comes to recipients in
alphabetical order, so while sending mail I must get the contacts from the mailing
lists in random order. Each contact must be taken only one time, of course.
>>

What I would do is add a permanent Integer field, index it, and then
populate it with a RandomRange (Math unit) call that uses the entire range
of Integer values.

You can use your existing autoinc field/index to keep track of the current
mailing position, and to return to it if the mailing is stopped for any
reason.  Just do a FindKey with the active index set to the autoinc field,
and then switch the active index to the random field index.  From that point
on, you just have a normal while (not EOF) do...next loop.
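
In rough BCB terms (the index and field names here are just placeholders):

MailList->IndexName = "";                                  // primary (autoinc) index
if (!MailList->FindKey(ARRAYOFCONST((LastStopID))))        // return to the saved position
    MailList->FindNearest(ARRAYOFCONST((LastStopID)));     // or the next higher ID if that record is gone
MailList->IndexName = "ByRandomNo";                        // switch to the random-order index
while (!MailList->Eof)
{
    SendOneMail(MailList);                                 // your merge + send for the current record
    LastStopID = MailList->FieldByName("ID")->AsInteger;   // remember the position as you go
    MailList->Next();
}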

You should get pretty good performance from this, and it should be much
quicker than the randomized Locates.

--
Tim Young
Elevate Software
www.elevatesoft.com