Thread: A replication strategy that will work!

Tue, May 31 2011 1:07 AM

David Cornelius

Cornelius Concepts

Ever since replication was added in EDB 2, I've been trying to get it to
work well for an application where every user is in a different physical
location.  It's not the fault of the database engine, but of my learning
curve.  Plus, I've had a couple of false starts and haven't been able to
devote full time to it, so this has stretched out for a long time, and I have
been interrupted multiple times by other application needs, such as report
changes.

The typical example of replication has a core office group working in a
client-server environment on a LAN with a handful of remote sales people
adding quotes.  As long as remote users are either adding records or not
editing records that other users are editing, the potential problems are
few.

In my customer's situation, their expectation is that everyone should have
nearly instant access to anyone else's changes or additions.  I gave that to
them by forcing them to always work connected remotely over the internet to
the central database.  But they also want the speed of browsing and
reporting that you get with a local database in addition to having all their
information automatically synchronized so that if they are disconnected from
the internet, they can still run any report.

Like I said, I've had some false starts, including duplicate records, lost
changes when two users edited the same records, lock-outs, and even missing
updates.  But the mechanism I have finally come up with and have
successfully tested consists of 3 main parts: 1) remote locking, 2) local
editing, and 3) a very frequent replication schedule.  (I'm not worried
about adding records because I'm using GUIDs for primary keys.)
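The GUID approach sidesteps insert collisions because each site can mint keys independently.  A quick illustration using Python's uuid module (the application itself isn't Python; this just shows the property being relied on):

```python
import uuid

# Version-4 GUIDs contain 122 random bits, so disconnected sites can
# mint primary keys independently with no realistic chance of collision.
key_site_a = str(uuid.uuid4()).upper()
key_site_b = str(uuid.uuid4()).upper()

print(key_site_a != key_site_b)  # True
```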

Every synchronized table has a boolean "Locked" field that defaults to
False.  When a user requests a record to edit, the application first tries to
set that "Locked" field in the corresponding record on the remote database.
If the remote database is unavailable, or if that field is already True, the
edit fails with an appropriate message.  Otherwise, the remote Locked field
is set to True and local editing commences.  When local editing is finished,
the local Locked field is set to False and the record is posted.
Replication takes care of the rest.
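A minimal sketch of that check-out/check-in flow, in Python for illustration only (the Store class and these function names are hypothetical stand-ins; the real application talks to ElevateDB, not dictionaries):

```python
# Hedged sketch of the remote-lock / local-edit protocol described above.

class Store:
    """Stand-in for a database session -- purely illustrative."""
    def __init__(self):
        self.rows = {}  # rec_id -> dict of field values

    def get(self, rec_id):
        return self.rows.get(rec_id)

    def set_field(self, rec_id, field, value):
        self.rows[rec_id][field] = value


def try_begin_edit(remote, rec_id):
    """Claim the record on the central server before editing locally.
    Fails (returns False) if the server row is missing or already locked."""
    row = remote.get(rec_id)
    if row is None or row["Locked"]:
        return False
    remote.set_field(rec_id, "Locked", True)
    return True


def finish_edit(local, rec_id, changes):
    """Post the edit locally with Locked cleared; the next replication
    cycle carries the cleared flag (and the changes) back to the server."""
    local.rows[rec_id] = {**local.rows[rec_id], **changes, "Locked": False}


remote, local = Store(), Store()
remote.rows["r1"] = {"Name": "old", "Locked": False}
local.rows["r1"] = {"Name": "old", "Locked": False}

print(try_begin_edit(remote, "r1"))  # True  -- first editor gets the lock
print(try_begin_edit(remote, "r1"))  # False -- a second editor is refused
finish_edit(local, "r1", {"Name": "new"})
```

Note that the lock is acquired on the *server* copy but cleared on the *local* copy, so the server record stays locked until replication delivers the finished edit.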

Replication (LOAD UPDATES/SAVE UPDATES) happens frequently on the
client--I've set it to every 30 seconds.  So in less than a minute, the
saved changes have been sent to the server.  The user continues working in
the application while the server record is still locked for editing.  But
the server's replication schedule is every minute, so it soon picks up the
changed record from the client--with the cleared Locked field--and the
changes become available to everyone else, and the record is once again
available for editing by anyone.
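Putting numbers on that schedule (a back-of-envelope model only, not code from the application):

```python
def worst_case_propagation(client_interval_s, server_interval_s):
    """A change posted just after a client replication cycle waits up to
    one full client interval before SAVE UPDATES ships it, then up to one
    full server interval before LOAD UPDATES picks it up centrally."""
    return client_interval_s + server_interval_s

print(worst_case_propagation(30, 60))  # 90 seconds worst case
```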

I had not at first implemented such frequent updates because the pile-up of
update files would get so large as to create a bottleneck when a user
reconnects and has to get several hours' or even days' worth of updates.
But with EDB 2.05 and the new SAVE UPDATES clause "IF NOT EMPTY," I no
longer have to worry about that.

This, I believe, will finally give the customer what they want--quick access
to updated data--and keep problems due to multiple remote users to a
minimum.


I would like to know if others have implemented replication schemes and how
they are handled--or if anyone has any comments or questions about the
solution I've detailed above.


David Cornelius
Cornelius Concepts
Tue, May 31 2011 9:33 AM

Adam Brett

Orixa Systems

Thank you for this ... it is good to know others are working on it!

I have 1 system working with "replication" ... but the dataset is very unusual, so there can't really be issues of record-conflict.

I have 63 sites, each of which posts in daily to a central server with their "work for that day" (basically goods received & dispatched).  The thing is, each one only changes _their_ data ... they don't really look at each other's data.

The central office then reviews this data a few days later ... they will then update & change it, but by that time the remote offices are no longer touching it, so again no conflicts.

I publish only a small part of each remote DB, so updates are only on those bits they need to share. Each remote office has their own (not shared) function to return keys which are just integers.

--

I do have some problems: basically, if some metadata changes, those changes need to be published across the whole network (e.g. a list of products, vehicles or personnel).  At present I am keeping this data in the control of the central office & periodically sending it "down" to the depots.  However, I can see that this is not really the best solution.

Something similar to what you have done, allowing the remote user to lock the central record could be useful in that case.

However, that would mean the application linking to 2 databases simultaneously (am I right?)--1 local & 1 remote--& locating the same record on each ... that sounds like quite a big leap from my current data-management, which is pretty dumb.

What is great is that EDB actually can do it & make it work ...
Wed, Jun 1 2011 1:17 AM

David Cornelius

Cornelius Concepts

Your situation sounds much simpler than mine.  I envy you!  ;-)

Yes, my application does connect to both the remote and the local database
simultaneously to get this to work.  But the stuff I do on the remote is
fairly minimal--just simple lookups and single field updates.  The rest is
all done locally and handled by the replication in a separate thread.  I'm
pretty happy with how it's turning out.

Changing the remote user's database to be in-sync with the central one is
indeed a challenge.  For now, I plan to ship local database updates out with
application upgrades and make sure the central database is always one step
ahead, but backwards compatible.

David Cornelius
Cornelius Concepts
Wed, Jun 1 2011 5:06 AM

Adam Brett

Orixa Systems

I have other applications I have to maintain which may require more complex replication in the future ... so don't envy me too much!

My central database contains "all" the data, and is periodically quiet (the system controls a business which harvests crops for 2 long periods each year but is then inactive).  At the end of each season, all accounts have to be closed off & finalized.  Once this is done, I download the central database to each depot, so they have updates to things like staff lists etc.  Luckily these things change slowly, so it is possible to work around the changes during the season.  Sort of ...

A "full replication" system would be better; at present, users have to work around the system a bit.

I am curious about your experience with GUIDs:
- Do they slow things down?
- Do you experience any problems with them?
- Do you add any other "manual keys" to tables for users to refer to if they need a reference number?

GUIDs seem like a golden bullet when it comes to ensuring referential integrity with a disconnected multi-site database, but I have heard mixed messages about them.

Also all my code is based on incrementing integer IDs ... so it would be quite a big step to recode for GUIDs.

As I usually only have a fixed number of sites I was hoping to stick with the idea of each site having a Generator table with a starting number plus an increment call & all IDs being drawn from that generator, similar to what I use now, but slightly more automated.
Thu, Jun 2 2011 1:43 AM

David Cornelius

Cornelius Concepts

<<I am curious about your experience with GUIDs:
- Do they slow things down?>>

Not in my experience.  With today's speedy computers, gigabytes of RAM, and
high-speed internet, I think the overhead is negligible.

<< - Do you experience any problems with them?>>

None at all.  At least not with the GUIDs themselves.  But I used them
unconventionally in one situation.  Each user of the application has a
separate EDB Store on the server where they get their replication updates.
Since everything is keyed by GUID instead of number or name, I of course
wanted to create the stores named by the GUID.  But the store names can't
have {} in them, so I had to strip those out.
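The brace-stripping itself is trivial; a sketch in Python of turning a braced GUID string into a legal store name (the names here are illustrative, not from the application):

```python
import uuid

# EDB displays GUID values in the braced Windows style, e.g.
# {3F2504E0-4F89-11D3-9A0C-0305E82C3301}.
guid_text = "{%s}" % str(uuid.uuid4()).upper()

# Store names can't contain braces, so strip them before naming the store.
store_name = guid_text.strip("{}")

print(len(store_name))  # 36: 32 hex digits plus 4 hyphens
```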

<< - Do you add any other "manual keys" to tables for users to refer to if
they need a reference number?>>

The users refer to documents, customers, and vendors by a coding scheme
they were using in an old FoxPro for DOS application before I wrote this
"new Windows program" for them, so that was already in place.  Before I
switched to GUIDs, I was using AutoInc IDs that they never saw either.  So
switching to GUIDs didn't change that aspect.

I've always had indexes on other fields that are used to reference
documents.  And some of those are unique constraints.

<< GUIDs seem like a golden bullet when it comes to ensuring referential
integrity with a disconnected multi-site database, but I have heard mixed
messages about them.>>

It worked for me and I'm very happy.

<< Also all my code is based on incrementing integer IDs ... so it would be
quite a big step to recode for GUIDs. >>

It was for me as well.  I blogged about the experience of converting:
http://corneliusconcepts.com/mega_update_1 and
http://corneliusconcepts.com/mega_update_2.  Part 3 still needs to be
written.


David Cornelius
Cornelius Concepts

Sun, Jun 5 2011 8:30 AM

Adam Brett

Orixa Systems

Thanks for all these useful pointers ... ugh I think I am going to have to bite the bullet & switch to GUIDs at some point, at least for my replication strategy ... my users love their IDs (there was no prior FoxPro version! There was a BDE version, but even that used IDs!)

My problem is that it is a bit of an "all or nothing" situation. Right now users know they can only work in the offices (where there is a LAN) & that the LAN will manage the updates to the central office.

People want to be able to work from home or on the road ... for those I use Remote Desktop ... which is a bit clunky, but OK.  It means leaving PCs switched on in the offices, & sometimes people can't get a good enough connection to use it.  Remote Desktop also has costs (right version of Windows, good broadband connection, users must have 2 PCs!).

The thing is, if I move from my current system I will basically need to cope with "n" users ... it might be that _every_ staff member at _every_ depot will want to work disconnected from time to time.  That means a lot of clever programming in the application to test for where the PC can connect, etc.

Part of me is hoping that broadband becomes fast enough that I can move _all_ connections to a single central database, with all users calling that single server.  That would actually be a significant simplification ...

Thanks for the links to the blog ... do write part 3!
Sun, Jun 5 2011 4:25 PM

David Cornelius

Cornelius Concepts

All the contacts in my application have a number that is established by
users of the application when the contact is created.  It is a unique
constraint in the table, but not the key (the GUID is).  In order to keep
from tripping over each other if two people create a contact at the same
time and don't communicate about what number they are assigning, I have one
table on the server that contains a bunch of system-wide settings, one of
which is the last contact number assigned.

When a new contact is requested, the first thing that happens is that the
table on the server is queried for the next available number, which is
immediately incremented.  That new number is returned to the client (local
or remote) and not used by anyone else (transactions come in handy here!).
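That server-side counter can be pictured like this.  The class and field names are hypothetical, and the threading lock here merely stands in for the database transaction that stops two clients from receiving the same number:

```python
import threading

class SettingsTable:
    """Stand-in for the single server-side settings table.  The lock
    plays the role of the database transaction that keeps two clients
    from being handed the same contact number."""

    def __init__(self, last_contact_number=0):
        self._last = last_contact_number
        self._lock = threading.Lock()

    def next_contact_number(self):
        # Read and increment atomically, then hand the number out;
        # it is never offered to anyone else.
        with self._lock:
            self._last += 1
            return self._last


settings = SettingsTable(last_contact_number=1000)
print(settings.next_contact_number())  # 1001
print(settings.next_contact_number())  # 1002
```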

So you might still be able to keep the IDs as visible numbers for your
users, and may in fact still be fine for your application if managed
properly (mine were not).  Or add GUIDs as an invisible new key field.

Clever programming definitely comes in handy.  My application always assumes
a local database and allows edits only if the server database is available.
Some tables, like the contacts that use a unique ID number, only allow new
records when the server is available.  Other records where there is no
specific number associated can be added, but not deleted or edited.

There are a lot of things to keep in mind: local database path, remote
connection parameters, database signatures, encryption passwords, timeout
settings, store names, replication frequency, refreshing datasets,
preventing lockouts and duplicates, and definitely educating the user on
expectations!  Forcing everyone to connect to one central database to
eliminate replication is an order of magnitude simpler, but much slower for
the user when browsing a grid of data or generating reports.

Have fun!

David Cornelius
Cornelius Concepts
Wed, Dec 21 2011 5:57 PM

Jim Garrity

SDS

"David Cornelius" wrote:

<< snip >>

I have implemented a replication scheme and have been using it for about a year or so.

My users in some cases have full time remote offices as well as laptops that have to operate offline.

Since one of the critical issues is avoiding duplicate primary keys, I considered various schemes, including GUIDs, and ended up with an integer that is incremented by a method that adds a Site ID as the leading digit of the ID.  There are two advantages: 1) the key is always unique, and 2) the remote site that created the record can be determined from the record key.
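That composite-key idea can be sketched as follows (the digit width is my assumption; the post doesn't specify sizes):

```python
def make_key(site_id, local_seq, seq_width=9):
    """Compose a globally unique integer key: the site ID becomes the
    leading digit(s), followed by a zero-padded per-site sequence number.
    (The 9-digit sequence width is a hypothetical choice.)"""
    return int(f"{site_id}{local_seq:0{seq_width}d}")


def site_of(key, seq_width=9):
    """Recover the originating site by peeling off the sequence digits."""
    return key // 10 ** seq_width


print(make_key(7, 123))      # 7000000123
print(site_of(7000000123))   # 7
```

The sequence width must be fixed so the site prefix can be peeled back off; with a fixed width, keys from different sites can never collide because they differ in their leading digits.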

Most users synch on an hourly basis, but I would like to try David's strategy of every 30 secs on the clients and then every minute on the Main Server.

I am also working on an updating scheme where the main server would download files from our site and the remote servers would download from the main server, so that we don't have a Server Version mismatch or an application version mismatch.  The clients will download from their respective servers.  This is pretty much what Tim outlined elsewhere.

Jim Garrity
SDS