Matches Created

August 10th, 2009 by Stefanos Damianakis

Bill James is a baseball writer, historian, and statistician, who is perhaps best known for pioneering the field of sabermetrics, which as he defined it is “the search for objective knowledge about baseball.”

James uses analysis of baseball statistics to evaluate the contribution of an individual baseball player’s performance to their team’s ability to win a game. For hitters, he believed that “a hitter should be measured by his success in that which he is trying to do…create runs.”

To measure this, James created a new baseball statistic that he called Runs Created:

Runs Created = (Hits + Walks) × Total Bases / (At Bats + Walks)
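As a worked illustration, the formula above can be sketched in a few lines of Python. The two hitters and their season totals are hypothetical, chosen only to show how much the Walks term moves the result when hits, at bats, and total bases are identical.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (Hits + Walks) * Total Bases / (At Bats + Walks)."""
    opportunities = at_bats + walks  # denominator of the formula
    if opportunities == 0:
        return 0.0  # no at bats or walks recorded, so nothing to measure
    return (hits + walks) * total_bases / opportunities


# Hypothetical season totals: same hits, at bats, and total bases,
# but very different walk counts.
patient_hitter = runs_created(hits=150, walks=100, total_bases=250, at_bats=500)
free_swinger = runs_created(hits=150, walks=20, total_bases=250, at_bats=500)

print(f"Patient hitter: {patient_hitter:.1f} runs created")
print(f"Free swinger:   {free_swinger:.1f} runs created")
```

With these made-up numbers, the hitter who walks more comes out roughly 22 runs ahead, even though the two are indistinguishable by hits per at bat.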

At the heart of this formula is the premise that a player’s ability to get on base is crucial to their team’s ability to score runs and win games.

Although that may sound rather obvious, the formula’s emphasis on statistics not typically considered important (e.g. Walks) was antithetical to baseball’s “conventional wisdom.”

Traditionally, statistics such as Batting Average (Hits / At Bats) and RBI (Runs Batted In) were considered the tried-and-true measures for evaluating hitters.

Additionally, there were the “intangibles” observed by scouts and coaches who trusted their “gut” more than nerdy number crunching.

After all, as these experts would argue, baseball is played on a field, not on a calculator.

All of this was detailed in Moneyball: The Art of Winning an Unfair Game, the excellent 2003 book by Michael Lewis.

Matches Created

In data matching, the statistical properties of fields and their values are used to measure how much each field contributes to the likelihood that a matching record has been found. Here too, success should be measured by what we are trying to do…create matches.
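To make "the contribution each field makes" concrete, here is a minimal sketch of one classic way to weight fields, the Fellegi-Sunter style of probabilistic record linkage. It is offered only as an illustration and is not a description of any particular product's algorithm. The field names and the m and u probabilities are hypothetical.

```python
import math

# Hypothetical per-field probabilities (assumed for illustration):
#   m = P(field agrees | the two records truly match)
#   u = P(field agrees | the two records do not match)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "gender": (0.98, 0.50),
}


def field_weight(m, u, agrees):
    """Log-likelihood-ratio weight contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(agreement):
    """Sum the field weights for a dict of {field name: agrees?}."""
    return sum(
        field_weight(*FIELD_PROBS[field], agrees)
        for field, agrees in agreement.items()
    )


all_agree = match_score({"last_name": True, "birth_year": True, "gender": True})
name_differs = match_score({"last_name": False, "birth_year": True, "gender": True})
print(f"All fields agree:   {all_agree:.2f}")
print(f"Last name disagrees: {name_differs:.2f}")
```

Under these assumed probabilities, agreement on a rarely coinciding field such as last name adds far more evidence than agreement on gender, which agrees by chance about half the time.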

Tried and true techniques continue to be sought for the complex challenge of creating matches, with many of these techniques coming from advanced mathematics.

When you look under the hood of some of these new approaches to data matching, you might find some fields and their statistical properties being used in ways antithetical to “conventional wisdom.”

Initially, your “gut” might tell you these approaches simply don’t sound like they could possibly create acceptable matches.

However, success is truly measured by evaluating the match results – not the data matching techniques.

In some ways, it brings to mind what the 19th-century poet John Keats referred to as Negative Capability:

“Capable of being in uncertainties, mysteries, doubts, without any irritable reaching after fact and reason.”

Of course, Keats was advocating an open-mindedness to new concepts in literature and philosophy, where if something speaks to you of a truth that you could accept but not explain, why bother with trying to explain it?

Therefore, if a new approach to data matching creates matches that you can accept, does it really matter what algorithm was used?
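One way to judge a matcher purely by its output, whatever algorithm produced it, is to compare the pairs it proposes against a small set of pairs that a person has already reviewed. The sketch below computes precision and recall for that comparison. The record identifiers and the reviewed "truth" set are hypothetical.

```python
def evaluate_matches(predicted_pairs, true_pairs):
    """Return (precision, recall) for proposed pairs versus reviewed pairs."""
    # frozenset makes ("a", "b") and ("b", "a") count as the same pair.
    predicted = {frozenset(pair) for pair in predicted_pairs}
    actual = {frozenset(pair) for pair in true_pairs}
    correct = len(predicted & actual)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical output of some matcher, treated as a black box.
proposed = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")]
# Hypothetical pairs confirmed by manual review.
reviewed = [("a1", "b1"), ("b2", "a2"), ("a4", "b4")]

precision, recall = evaluate_matches(proposed, reviewed)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

Nothing in this check looks inside the matcher, so the same yardstick applies to a familiar technique and to one that sounds implausible at first.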

Perhaps we should follow Bill James’s lead and create a new statistic called Matches Created?

Related Posts

Narrative Fallacy and Data Matching

About Netrics HD

Data matching is a fundamental operation in many applications, from improving data quality to implementing master data management. Stef Damianakis, CEO of Netrics, a world leader in matching technology, shares his thoughts on the state of the technology and business of data matching.
