Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 3 of 3 total
Thread Finding "Similar" records
Mon, Sep 22 2014 8:27 AMPermanent Link

Adam Brett

Orixa Systems

I have 2 offices which want to try to merge their databases records from a number of tables merged together.

There is a Customers table which has been managed separately, so each time Office "A" added a Customer Office "B" did the same manually.

Of course the 2 offices have also independently added their own records to Customers, so there are plenty of records which are only entered in 1 office, not the other..

About 85% of the records entered in both offices match exactly i.e. OfficeA.Customers.Name = OfficeB.Customers.Name

But a good number (about 2,000) are not exact matches, with a variety of types of difference

i.e.
Office A "The Customer Name" Office B "Customer Name, The"
Office A "Customer, Name" Office B "Customer Name"
Office A "Customer Names" Office B "Customer Name"

Does anyone know of useful techniques to try to improve the number of records I can match automatically?

I will do simple things like excluding commas, spaces, perhaps REPLACE('The' WITH '')  etc. in the match condition ...

But does anyone know of a decent tool which can identify "SIMILAR" entries in a database? i.e. non-exact matches which still have a very high ranking when a comparison is made between the entry in a field?
Mon, Sep 22 2014 9:43 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Adam


Haven't we had this conversation before? Or was it someone else?

You've already spotted some of the obvious things to do. However, firstly, I suggest that you don't just match automatically where you don't have an exact match. Give the customer an option to check stuff out just in case you've made some wrong assumptions.

I think the tool you're looking for is along the lines of SOUNDEX

And just in case it isn't here goes for things I've used in the past in no particular order:

1: If lengths are different then truncate to shortest and match
2. remove ALL common words / abbreviations, plurals and punctuation eg Co, Company, Ltd, plc
3. Ignore names and compare phone numbers and / or addresses
4. Sort characters in the names into alpha order and compare that
5. As for 4 but eliminate duplicates
6. "phoneticise" them - similar to but different from SOUNDEX and you have to come up with your own algorithm which may be accent dependent Smiley

Roy Lambert
Mon, Sep 22 2014 11:10 AMPermanent Link

Adam Brett

Orixa Systems

Thanks for all the suggestions Roy

>>Haven't we had this conversation before? Or was it someone else?

It is actually possible I asked this before, we approached this problem about 3 years back & just backed away from it as it was too painful!
Image