Thread: Ability to segment Update files by a maximum size.
Fri, Sep 18 2015 12:20 PM

Adam Brett

Orixa Systems

The call to

SAVE UPDATES FOR DATABASE ****

produces an update file of an unbounded size.

Replication processes regularly involve transferring these files over wide area networks, via the internet, and / or using EDBServer to run jobs and procedures.

A major update to a large database can result in Update files which are very large (100s of megabytes), as every row-change is saved as a full SQL Statement, which is quite "bulky" compared to EDB's usually small size.

Transferring these files is hard as connections can easily drop or time-out.

A solution would be to generate several smaller files, perhaps sized to a sensible "safe" maximum size that the network can carry. The SAVE UPDATES statement would be changed to incorporate ways of picking the "safe size" and of naming and dating these smaller files so their updates could be applied in the correct order.
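
To make the intent concrete, here is a minimal sketch in Python of the kind of segmentation being asked for, done outside of EDB: split one large update file into sequence-numbered parts and rebuild it in order at the far end. The 4 MB chunk size and the ".partNNNN" naming are purely assumptions for illustration, not anything EDB does today.

import os

def split_update_file(path, chunk_bytes=4 * 1024 * 1024):
    # Write size-bounded, sequence-numbered parts next to the original file.
    part_names = []
    with open(path, "rb") as source:
        index = 0
        while True:
            data = source.read(chunk_bytes)
            if not data:
                break
            part_name = "%s.part%04d" % (path, index)
            with open(part_name, "wb") as part:
                part.write(data)
            part_names.append(part_name)
            index += 1
    return part_names

def join_update_parts(part_names, target_path):
    # Rebuild the original update file by concatenating the parts in order;
    # the zero-padded sequence number makes a plain sort give the right order.
    with open(target_path, "wb") as target:
        for part_name in sorted(part_names):
            with open(part_name, "rb") as part:
                target.write(part.read())

The only essential point is that each part carries its position in the sequence, so the receiving side knows how to put the parts back together before applying the update.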
Fri, Sep 18 2015 12:37 PM

Raul

Team Elevate

On 9/18/2015 12:20 PM, Adam Brett wrote:
> Transferring these files is hard as connections can easily drop or time-out.
> A solution would be to generate several smaller files, perhaps sized to a sensible "safe" maximum size that the network can carry. The SAVE UPDATES statement would be changed to incorporate ways of picking the "safe size" and of naming and dating these smaller files so their updates could be applied in the correct order.

I believe the update size control suggestion is in the wish list already -
Tim can confirm.

However, you do have some control over the update size in a few ways:

1. call SAVE UPDATES for a single table or a few tables instead of the whole database - a bit more file management, but it can make a big difference (there is a rough sketch after this list)

2. try compression level 9 (it might not help enough, since the default is already 6, but still)

3. Depending on how your upgrade process works, you could try to call "save updates" as part of that upgrade process (after every major change, after each table, or even after every X records or so).
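
As a rough illustration of suggestion 1 only (the statement text and the run_sql helper below are placeholders rather than real EDB syntax - check the SAVE UPDATES section of the manual for the exact per-table form), the idea is simply one update file per table instead of one file for the whole database:

# run_sql is a stand-in for however you already execute SQL against EDB.
TABLES = ["Customers", "Invoices", "InvoiceItems"]  # example table names only

def save_updates_per_table(run_sql):
    for table_name in TABLES:
        # Placeholder statement - substitute the real per-table
        # SAVE UPDATES syntax from the ElevateDB manual.
        run_sql('SAVE UPDATES FOR TABLE "%s"' % table_name)

You end up managing more (smaller) files, but each one is bounded by what changed in a single table.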

Raul
Sat, Sep 19 2015 7:21 AM

Matthew Jones

Adam Brett wrote:

> A solution would be to generate several smaller files,

Is it? The key here is surely the purpose of the process. I presume
the idea is to take all changes to one database and then be able to
replicate them to another copy, exactly. If you have a partial snapshot
based on only a sub-set, then when it gets to the other end you have
not got a coherent copy without the other sections. If the "invoice"
table is updated in the first block, but not the "customer" table, then
you could have referential problems. Even more so if the last table is
the "next invoice number" table.

So you basically want the whole update chopped into chunks, and that is
something there are better solutions for. Why not zip the file into
chunks? They can be verified on the other end before being applied. The
key is that the choice is down to you, and not forced on anyone who
needs something different or doesn't have the problem.

FWIW, one of my clients uses a transfer service I wrote, which can
transfer gigabytes of data. It does so by chunking the data itself, and
rebuilding it at the far end.
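
The "verifiable on the other end" part can be as simple as shipping a checksum with every chunk. A minimal Python sketch of that idea (the chunk size is an arbitrary assumption, and this is only the general shape of a chunking transfer, not the actual service mentioned above):

import hashlib

def chunk_with_checksums(path, chunk_bytes=4 * 1024 * 1024):
    # Yield (sequence_number, data, sha256_hex) for each chunk of the file.
    with open(path, "rb") as source:
        index = 0
        while True:
            data = source.read(chunk_bytes)
            if not data:
                break
            yield index, data, hashlib.sha256(data).hexdigest()
            index += 1

def rebuild_if_valid(received_chunks, target_path):
    # received_chunks: iterable of (sequence_number, data, expected_sha256)
    # as they arrived at the far end; refuse to rebuild if anything is corrupt.
    ordered = sorted(received_chunks, key=lambda item: item[0])
    for index, data, expected in ordered:
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError("chunk %d failed its checksum" % index)
    with open(target_path, "wb") as target:
        for _, data, _ in ordered:
            target.write(data)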

Just my two cents. 8-)

--

Matthew Jones
Sat, Sep 19 2015 10:49 AM

Adam Brett

Orixa Systems

Thanks for this Matthew

>>If the "invoice"
>>table is updated in the first block, but not the "customer" table, then
>>you could have referential problems.

The EDB update process saves a set of update statements to disk in the update files. These are always segmented anyway, as you can call "SAVE UPDATES" on the database at any time, even while users are active. You are right that an "Invoice" might be added without (say) its Items (or Customer), but these would follow in the next UPDATE.

I'm not sure exactly how Tim gets it all to work, but I guess perhaps sets of SQL are added to the UPDATES, so that you don't end up with hanging records that are missing Foreign Keys.

In any case, the UPDATE process involves working with lots of files of different sizes already.

>>Why
>>not zip the file into chunks?

That is a great idea, but I'd love the feature to be built into the product, rather than being something I have to engineer myself. (That might sound lazy of me ... but with something as important as this I would really like it to be used and developed in a commercial product; my customers might get nervous about it otherwise.)

The ideal would be for EDB to take a big update and pass over XXX small EDBTMP files, and for the engine to incorporate these back into an EDBUPD once they had all arrived.

If the connection was lost, the next COPY process could pick up where the failed one had left off.
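
That resume-on-failure behaviour is also easy to sketch outside the engine (the ".part" naming below is an assumption carried over from the splitting sketch earlier in the thread, not an EDB temp-file format): only copy the parts the destination does not already hold in full.

import glob
import os
import shutil

def copy_missing_parts(source_dir, dest_dir, pattern="*.part*"):
    # Copy only parts that are absent or incomplete at the destination, so a
    # rerun after a dropped connection picks up where the last run stopped.
    os.makedirs(dest_dir, exist_ok=True)
    for source_path in sorted(glob.glob(os.path.join(source_dir, pattern))):
        dest_path = os.path.join(dest_dir, os.path.basename(source_path))
        complete = (
            os.path.exists(dest_path)
            and os.path.getsize(dest_path) == os.path.getsize(source_path)
        )
        if not complete:
            shutil.copy2(source_path, dest_path)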

>>FWIW, one of my clients uses a transfer
>>service I wrote, which can
>>transfer gigabytes of data.

... that sounds really interesting & like it might offer a real fix ... any chance of revealing any details?
Sat, Sep 19 2015 10:55 AM

Adam Brett

Orixa Systems

Thanks Raul

>>I believe update size control suggestion is in the wish list already -
>>Tim can confirm.

Ashamed to say I might even have put it there ... apologies if I did and had forgotten

>>1. call SAVE UPDATES for a single/few table instead of whole database -
>>bit more file management but can make a big difference

>>2. try compression level 9 (might not help enough since default is 6
>>already but still)

>>3. Depending on how your upgrade process works you could try to call
>>"save updates" as part of that upgrade process (after every major change
>>or table or even X records or such ).

Good suggestions: I will think about them. I do like having the update process as a relatively simple process, separate from the rest of the DB operations; following suggestion 3 would couple the DB and update processes, which feels a bit hacky.

An issue is that a single statement can generate a really massive UPDATE file.

The reason for my post is that a customer updated a DB with about 300,000 Contact records in it, changing the "Name" field for every person ... this resulted in one hell of an update file.

You are right: if I broke every transaction on the DB into chunks and did a "save" after every XXX chunks, I could get small UPDATE files ... though I'm not sure how to do this (there is a rough sketch of the idea below).
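
One way to do it by hand would be something like the following (the Contacts table, the ID and Name columns, the batch size and the run_sql helper are all made up for illustration; SAVE UPDATES FOR DATABASE is just the statement form quoted at the start of the thread, with a placeholder database name):

BATCH_SIZE = 10000  # rows per batch; tune for the update-file size you want

def update_names_in_batches(run_sql, max_id):
    # Hypothetical example: touch the "Name" column in key ranges and log a
    # separate, smaller update file after each batch instead of one huge one.
    low = 0
    while low <= max_id:
        high = low + BATCH_SIZE - 1
        run_sql("UPDATE Contacts SET Name = TRIM(Name) "
                "WHERE ID BETWEEN %d AND %d" % (low, high))
        run_sql("SAVE UPDATES FOR DATABASE MyDatabase")  # placeholder name
        low = high + 1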

I actually think it might be best to have EDB do this automatically, i.e. an SQL syntax like:

"SAVE UPDATES EVERY 1 MEGABYTE"

would be great ...
Mon, Sep 21 2015 4:29 AM

Matthew Jones

Adam Brett wrote:

> > > FWIW, one of my clients uses a transfer
> > > service I wrote, which can
> > > transfer gigabytes of data.
>
> ... that sounds really interesting & like it might offer a real fix
> ... any chance of revealing any details?

At one time it was quite generic, but over the years it has become very
custom, so releasing it wouldn't be sensible. It is quite simple though:
it pops the files it finds into a DBISAM database (that's the age of
it!), then uses the RemObjects SDK to send segments until it gets to the
other side.

I'm looking at releasing my framework sometime, so maybe a file
transporter might be a good example...

--

Matthew Jones
Mon, Sep 21 2015 4:30 AM

Matthew Jones

Matthew Jones wrote:

> I'm looking at releasing my framework sometime, so maybe a file
> transporter might be a good example...

That said, I just realised I use SyncBack SE for this purpose myself,
run on an hourly schedule. It can handle retries etc. over FTP (and
others).

--

Matthew Jones
Mon, Sep 21 2015 9:33 AM

Tim Young [Elevate Software]

Elevate Software, Inc.


Email timyoung@elevatesoft.com

Adam,

<< A major update to a large database can result in Update files which are very large (100s of megabytes), as every row-change is saved as a full SQL Statement, which is quite "bulky" compared to EDB's usually small size.

Transferring these files is hard as connections can easily drop or time-out.

A solution would be to generate several smaller files, perhaps sized to a sensible "safe" maximum size that the network can carry. The SAVE UPDATES statement would be changed to incorporate ways of picking the "safe size" and of naming and dating these smaller files so their updates could be applied in the correct order. >>

The only issue preventing this is that EDB doesn't currently mark the transaction boundaries in the logged updates, so that will need to be added first.  Otherwise, I could add this rather quickly.  However, if I can figure out a way to add the transaction boundaries without causing issues with older versions, then things will be good to go.
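
Purely as a conceptual sketch of what "segment, but only at transaction boundaries" means (this reflects nothing about EDB's actual update-file format): statements accumulate into the current part, and a new part only starts once the size limit has been passed and the current transaction has ended, so no part ever splits a transaction.

MAX_PART_BYTES = 1024 * 1024  # e.g. the "1 megabyte" suggested earlier

def segment_at_transaction_boundaries(logged_statements):
    # logged_statements: iterable of (sql_text, ends_transaction) pairs.
    parts, current, current_bytes = [], [], 0
    for sql_text, ends_transaction in logged_statements:
        current.append(sql_text)
        current_bytes += len(sql_text.encode("utf-8"))
        if ends_transaction and current_bytes >= MAX_PART_BYTES:
            parts.append(current)
            current, current_bytes = [], 0
    if current:
        parts.append(current)
    return parts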

Tim Young
Elevate Software
www.elevatesoft.com
Sun, Sep 27 2015 5:24 AM

Uli Becker

Adam,

another way to handle updates better is to use a module and zip/unzip
the update files.

I use that in an application with a large number of small update files
(updates are received and sent every 5 minutes). The reason, though, was
to speed up the transfer (sending one big zip file is much faster than
sending many small files).

Maybe a solution like that would help you.
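
A minimal sketch of that approach with Python's standard zipfile module (the folder layout and the .EDBUpd extension are assumptions about how the update files are stored on disk):

import glob
import os
import zipfile

def zip_update_files(update_dir, archive_path):
    # Bundle all update files in a folder into one archive for transfer.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(glob.glob(os.path.join(update_dir, "*.EDBUpd"))):
            archive.write(path, arcname=os.path.basename(path))

def unzip_update_files(archive_path, target_dir):
    # Unpack the archive at the receiving side before applying the updates.
    with zipfile.ZipFile(archive_path, "r") as archive:
        archive.extractall(target_dir)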

Uli