A more precise, but less certain world

Monday, July 6th, 2009 @ 4:17 pm

I am reading the excellent book Super Crunchers by Ian Ayres, which has the great subtitle:

Why Thinking-By-Numbers is the New Way To Be Smart

“The heroic conception of expertise,” explains Ayres, “was that of an expert giving settled answers.  People are more likely to think of statistics as infinitely malleable and subject to manipulation.  This is a more precise, but less certain world.  The classical conception of probability is a world of absolutes.”

I couldn’t help but think of the classical approaches to data matching, which rely largely on exact matching techniques to determine whether two or more records should be linked, are duplicates, or represent the same entity.

“To the classicist, the probability of my currently having cancer is either 0 or 100 percent,” explains Ayres, “but we are all frequentists now.  Experts used to say Yes or No.  Now we have to contend with estimates and probabilities.”

I think that is exactly how many people feel about statistical data matching – they have to contend with estimates and probabilities.

A potential match with a statistical probability below 100 percent is less certain than a 100 percent exact match, but it is also more precise, because it comes with a confidence level that tells you how reliable the prediction is.
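To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names and sample records are hypothetical, and the score is a simple string similarity ratio from the standard library's difflib, standing in for the match probability a real statistical matching technique would compute.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Classical approach: the answer is only ever Yes or No."""
    return a == b


def match_confidence(a: str, b: str) -> float:
    """Illustrative stand-in for a statistical match score.

    Returns a similarity ratio between 0.0 and 1.0. This is an assumption
    made for the example, not a true probability model.
    """
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


# Hypothetical records that differ by a single character
record_a = "Jonathan Smith"
record_b = "Jonathon Smith"

print("Exact match:", exact_match(record_a, record_b))
print("Confidence: ", round(match_confidence(record_a, record_b), 2))
```

The exact comparison can only say No, which hides how close the two records are. The score is less certain, but it tells you how much to rely on it, and you can choose the threshold at which a potential match is accepted, rejected, or sent for manual review.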

“This ability to report a confidence level in predictions underscores one of the most amazing things about the technique,” explains Ayres.  “If the prediction is imprecise (say because of poor or incomplete data), [the statistical technique] itself will be the first one to tell you not to rely on it.  When was the last time you heard a traditional expert [or a classical approach to data matching] tell you the precision of their estimate?”

I believe that when it comes to data matching, we all need to be more skeptical about certainty and more comfortable with precision – and to achieve this, we must continue the pursuit of innovation using mathematical techniques.

Related Posts

Matches Created

Narrative Fallacy and Data Matching

Apples and Oranges
