The name of a man is a numbing blow from which he never recovers. ~Marshall McLuhan
In the first part of this series, we spoke of the difficulties of dealing with proper names, even though we only considered the case when names remained unchanged. Of course, we all know that people sometimes change their names.
In the US and Canada, the most common event signaling a name change is marriage, and the convention is for the woman to drop her last name and adopt her husband’s. While this still happens, and is complicated enough to wreak havoc on data quality, the world isn’t nearly that simple anymore. A survey of postings to alt.wedding reveals 8 additional naming conventions for when “Jane Smith” marries “Michael Brown”*:
- Wife Hyphenates The Two Names (Jane Smith becomes Jane Smith-Brown).
- Wife Uses Birth Name as Middle Name (Jane Smith becomes Jane Smith Brown, with no hyphenation).
- Husband and Wife Keep Their Own Birth Names (Jane Smith stays Jane Smith)
- Wife takes Husband’s Name Socially, Keeps Own Name Professionally (Jane Smith is Jane Smith at work, but Jane Brown otherwise)
- Husband takes Wife’s Name (Michael Brown becomes Michael Smith)
- Husband and Wife Both Hyphenate (Jane and Michael become The Smith-Browns)
- Husband and Wife take Each Other’s Names as Middle Names (Jane becomes Jane Brown Smith, Michael becomes Michael Smith Brown) – last names are still different, but there is the symbolism of having taken each other as part of themselves.
- Husband and Wife Pick a New Name
For a moment, consider the impact of all of these variations on data quality. Take one of the most common, the fourth in the list above, where a woman keeps her maiden name professionally. This means that you’re stuck trying to synchronize data records under different names depending on whether Jane views a given relationship as professional or personal. It may be obvious to Jane which name to use where, but this is unlikely to be transparent to your business. A financial institution would need to deal with Jane Smith for “professional” credit cards and bank accounts, and Jane Brown for “personal” accounts. A media company might sell some products to Jane Smith, and others to Jane Brown. Then there are the long-standing, pre-marriage relationships that started as Jane Smith and now need to be transitioned and linked with new ones that start as Jane Brown (except for the records that need to stay Jane Smith).
Obviously, this one example is just scratching the surface. When you actually capture the changed name, your own staff can generate additional variations. Will you actually capture the hyphen, or perhaps add a hyphen that’s not supposed to exist? Will the middle name actually get fielded into the middle name field, or show up as one of two last names? Will you replace the old middle name with the new one, or add it?
The many variations here point out the advantage of the Netrics Matching Platform’s approach to data matching, which is to look holistically at the data record and look for similarities wherever they occur. Then you don’t care much at all if you’ve captured Jane | Smith | Brown or Jane | Smith Brown or Jane | Smith-Brown. It’s a powerful approach, made possible by the flexibility and computational simplicity of Netrics’ underlying method.
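To make the idea of ignoring field boundaries concrete, here is a minimal sketch. It is not the Netrics algorithm, just a simple token-overlap (Jaccard) comparison, and the function names `name_tokens` and `similarity` are hypothetical. It assumes a record is a tuple of (first, middle, last) name fields.

```python
import re

def name_tokens(*fields):
    """Collect lowercased name tokens from every field of a record.

    Field boundaries are ignored, and hyphens are treated like spaces,
    so "Smith-Brown" and "Smith Brown" yield the same tokens.
    """
    tokens = set()
    for field in fields:
        tokens.update(t for t in re.split(r"[\s\-]+", field.lower()) if t)
    return tokens

def similarity(record_a, record_b):
    """Jaccard overlap of the two records' token sets, from 0.0 to 1.0."""
    a, b = name_tokens(*record_a), name_tokens(*record_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# The three capture layouts mentioned above, as (first, middle, last).
variants = [
    ("Jane", "Smith", "Brown"),
    ("Jane", "", "Smith Brown"),
    ("Jane", "", "Smith-Brown"),
]
```

Under this sketch, all three layouts produce the same token set, so any pair of them scores 1.0, while the pre-marriage record ("Jane", "", "Smith") still overlaps substantially with each of them. A real matcher would also need to tolerate typos and weigh other fields such as address or date of birth.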
You can also imagine the implications of these new naming variations for the complexity of rules-based systems. Imagine trying to sort out all 8 variations in a name-matching rule base. And we haven’t touched on non-US name-changing conventions, which can be quite different.
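As a rough illustration of what a rule base would have to spell out, the sketch below enumerates the names Jane might appear under after marrying. The function name `wife_name_variants` and the rule labels are hypothetical, and the sketch assumes a single first name and single-word last names.

```python
def wife_name_variants(first, birth_last, husband_last):
    """Candidate post-marriage names for the wife, one hand-written rule each."""
    return {
        "takes husband's name": f"{first} {husband_last}",
        "hyphenates": f"{first} {birth_last}-{husband_last}",
        "birth name as middle name": f"{first} {birth_last} {husband_last}",
        "keeps birth name": f"{first} {birth_last}",
        "husband's name as middle name": f"{first} {husband_last} {birth_last}",
    }

candidates = wife_name_variants("Jane", "Smith", "Brown")
```

Even this short table leaves gaps: the "social versus professional" convention means two of the candidates are valid at once, and a couple who pick an entirely new name cannot be enumerated at all. The husband's records would need a parallel set of rules.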
Finally, of course, there’s the issue that it’s not just marriage that causes people to change names. In any culture, someone can adopt a nickname (formally or informally), and this variation can leak into your corporate data. And there may be other reasons for name changes as well.
A few years ago, Netrics performed some data cleansing work for a hospital in Arizona. As our own data experts were performing QA testing on the resulting identified duplicates, we thought we had a big problem: the Netrics Decision Engine was identifying pairs of male records with the same first name and different last names as duplicates. We were very concerned until we called and spoke with our client. It turns out that when some Native American males leave the reservation, they adopt an “American” last name to use off the reservation. The Netrics Decision Engine, which creates its own mathematical model using Machine Learning, had figured out this subtlety in the data: something that we did not know and, more importantly, something that we did not have to ask of our client’s data experts.
Imagine trying to handle that with a probabilistic rules-based system. Moreover, how can one know all of these cases, for all of the different data sources and all of the different data types, a priori?
Our mantra is: Learn, don’t guess!
*Thanks to M. Elizabeth Hunter and Sonja Kueppers on soc.couples.wedding for compiling this list.
