What’s in a Name? (Part III)

Monday, June 8th, 2009 @ 3:27 pm

Names aren’t just for people.  We use them to describe organizations (e.g., businesses), brands, products, and many other things.  When it comes to data matching, these names may be just as important as person names.  For example, if my goal is to match entries in a CRM for a B2B business, I’m going to care as much about the organization, department, and title as I will about the current incumbent’s name.  If the task is to consolidate purchasing, component names and part numbers may be more important than people’s names.  For this and many other reasons, it’s important that my matching software work with a broad range of data entities and not be locked into only the most common.

As a proxy for the kinds of challenges you might face in matching components, I thought it would be entertaining to look at the variations on the Internet for appliance descriptions.  It’s something that’s on my mind, as I’m in the market right now.  My wife loves the Fisher & Paykel line:  for those of you who aren’t in the appliance market, F&P is a New Zealand brand that has captured a pretty good share of the high-end appliance market in recent years.  She loves the “look”.

I did a Google search for Fisher & Paykel refrigerators, went to the F&P site, and copied the first two lines of the “official” description for the model we’re thinking of buying:

Fisher & Paykel E522BRXU 17.6 cu. Ft EZKleen Stainless Steel  (reference string)

Then I went to a series of additional sites, found the closest model on each, and copied the text from the first two lines of their listings.  Here’s what I found on the next 4 sites I checked.

  1. Fisher Paykel 17.6 Cu Ft ActiveSmart Stainless Flat Door Left Hinge Refrigerator With Ice And Water Dispenser – E522BLXFDU  (homeappliancecenter.com)
  2. Fisher & Paykel  17.6 Cu. Ft.  Bottom Mount Refrigerator (Color: Stainless) Item #:278747 Model:E522BRXFDU (Lowes.com)
  3. Fisher Paykel E522BRXU 17.6 cu. ft. Freestanding Bottom-Freezer Refrigerator with Active Smart System, Adjustable Glass Shelves, External Water Dispenser and Curved Door Design  (ajmadison.com)
  4. Fisher and Paykel E522BLXU (17.6 cu. ft.) Bottom Freezer Refrigerator (epinions.com)

For me, manually looking up each of these appliances, the variations I found were no problem.  But how well would your matching software perform on the same data set?  Note the variations:

  • 3 variations on brand:  “Fisher & Paykel” (correct), “Fisher Paykel”, “Fisher and Paykel”
  • 4 variations on model number:  E522BRXU, E522BLXU, E522BRXFDU, E522BLXFDU.  These are all basically the same refrigerator, but the R vs. L in the model number denotes a right-hand versus left-hand door hinge, and the FD models have a slightly different door handle.
  • 5 different variations on “17.6 cubic foot”.  Surprisingly to me, this is the most consistent piece of information, but it still appears 5 different ways:  cu. Ft / Cu Ft / Cu. Ft. / cu. ft. / (cu. ft.)
  • Note that “EZKleen Stainless Steel”, which is the dominant descriptor in the “official” Paykel listing, doesn’t appear in any of the others, though “Stainless” by itself appears in 2 additional descriptions
  • “ActiveSmart”, which appears in the 4th line of the official Paykel listing (and therefore wasn’t quoted here) appears in two descriptions, in two ways:  “Active Smart” and “ActiveSmart”.
  • “Bottom” appears in three descriptions (but not in the original) as:  “Bottom Mount”, “Bottom Freezer”, and “Bottom-Freezer”
  • “Dispenser” appears in two descriptions, in two variations: “Ice and Water Dispenser” / “External Water Dispenser”
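
To make the variations above concrete, here is a minimal sketch in Python of one way to normalize them before matching.  The regular expressions, the field names, and the assumed model-number layout (family, hinge letter, X, optional FD, U) are my own illustrative guesses, tuned only to the five listings quoted in this post; they are not taken from any matching product.

```python
import re

# Hypothetical patterns, tuned only to the five listings quoted in this post.
BRAND_RE = re.compile(r"fisher\s*(?:&|and)?\s*paykel", re.IGNORECASE)
CAPACITY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*cu\.?\s*ft\.?", re.IGNORECASE)
# Assumed layout: family (E522B), hinge letter (R or L), X, optional FD, U.
MODEL_RE = re.compile(r"\b(E\d{3}B)([RL])X(FD)?U\b")

LISTINGS = [
    "Fisher & Paykel E522BRXU 17.6 cu. Ft EZKleen Stainless Steel",
    "Fisher Paykel 17.6 Cu Ft ActiveSmart Stainless Flat Door Left Hinge "
    "Refrigerator With Ice And Water Dispenser – E522BLXFDU",
    "Fisher & Paykel  17.6 Cu. Ft.  Bottom Mount Refrigerator "
    "(Color: Stainless) Item #:278747 Model:E522BRXFDU",
    "Fisher Paykel E522BRXU 17.6 cu. ft. Freestanding Bottom-Freezer "
    "Refrigerator with Active Smart System",
    "Fisher and Paykel E522BLXU (17.6 cu. ft.) Bottom Freezer Refrigerator",
]


def extract_fields(description):
    """Pull brand, capacity, and model details out of a free-text listing."""
    capacity = CAPACITY_RE.search(description)
    model = MODEL_RE.search(description)
    return {
        "brand": "Fisher & Paykel" if BRAND_RE.search(description) else None,
        "capacity_cu_ft": float(capacity.group(1)) if capacity else None,
        "model": model.group(0) if model else None,
        "family": model.group(1) if model else None,
        "hinge": model.group(2) if model else None,       # "R" or "L"
        "fd_variant": bool(model and model.group(3)),     # FD door-handle variant
    }


def match_key(description):
    """Key that ignores hinge side and handle variant."""
    fields = extract_fields(description)
    return (fields["brand"], fields["capacity_cu_ft"], fields["family"])


keys = {match_key(text) for text in LISTINGS}
print(keys)
```

With these assumptions, all five listings collapse to a single key, while the hinge side and FD variant remain available as separate fields when they matter.  The catch is that every pattern here was written by hand for one product family, which is exactly the kind of effort that doesn’t scale to tens of thousands of components.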

Now, imagine if these descriptions weren’t off the Internet, but were component descriptions from 5 different assembly operations, and that these operations, collectively, purchased tens of thousands of different components.  Could your software match them automatically?
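
As a rough illustration of why that question is harder than it looks, here is a small Python sketch that compares the reference string to the other four listings using plain token overlap (Jaccard similarity).  This is a deliberately naive measure that I chose for illustration, not a description of how any particular matching product works.

```python
def tokens(text):
    """Lowercase a description and split it on whitespace, nothing smarter."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of distinct tokens that two descriptions have in common."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


REFERENCE = "Fisher & Paykel E522BRXU 17.6 cu. Ft EZKleen Stainless Steel"

OTHERS = {
    "homeappliancecenter.com": (
        "Fisher Paykel 17.6 Cu Ft ActiveSmart Stainless Flat Door Left Hinge "
        "Refrigerator With Ice And Water Dispenser – E522BLXFDU"
    ),
    "Lowes.com": (
        "Fisher & Paykel  17.6 Cu. Ft.  Bottom Mount Refrigerator "
        "(Color: Stainless) Item #:278747 Model:E522BRXFDU"
    ),
    "ajmadison.com": (
        "Fisher Paykel E522BRXU 17.6 cu. ft. Freestanding Bottom-Freezer "
        "Refrigerator with Active Smart System, Adjustable Glass Shelves, "
        "External Water Dispenser and Curved Door Design"
    ),
    "epinions.com": (
        "Fisher and Paykel E522BLXU (17.6 cu. ft.) Bottom Freezer Refrigerator"
    ),
}

scores = {site: jaccard(REFERENCE, text) for site, text in OTHERS.items()}
for site, score in scores.items():
    print(f"{site}: {score:.2f}")
```

Every listing scores below 0.3 against the reference, even though a person recognizes all of them as the same refrigerator at a glance.  Punctuation stuck to tokens (“cu.” vs. “cu”, “(17.6” vs. “17.6”) and the extra marketing words account for most of the gap.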

It’s one thing for software to come with a built-in module customized for person-name matching.  But complex enterprises work with thousands of different data entities, any of which may require matching.
