The Chaos Theory of Data Quality

November 2nd, 2009 by Stefanos Damianakis

“One of those issues that is always a source of frustration in the enterprise,” explained Michael Vizard in his recent IT Business Edge blog post, The Never Ending War for Data Quality, “is the quality of the data we spend so much time and money processing.  The quest to make sure we have high quality data is nothing short of a never-ending battle between the forces of order and the chaos that envelopes every attempt to organize anything.”

I have to admit that this is one of my pet peeves.  A remarkably common misconception is that the only way to deal with the pervasive nature of “imperfect data” is to somehow magically keep all of the data “perfect” all of the time.

Data frequently contains numerous variations caused by different conventions, lack of standards, omissions, and other inconsistencies.  The traditional approach to data quality relies heavily on standardization and other data cleansing efforts to prepare data before it can be used effectively for making business decisions.  These preparation activities attempt to parse the data into a consistent set of attributes with standardized values.
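To make the traditional approach concrete, here is a minimal sketch of rule-based standardization. The function name `standardize_address` and the suffix table are hypothetical illustrations, not taken from any particular product: each known variant is mapped to one canonical value, and everything else is simply uppercased.

```python
import re

# Hypothetical standardization rules: map anticipated variants to one canonical form.
STREET_SUFFIXES = {
    "st": "STREET",
    "str": "STREET",
    "street": "STREET",
    "ave": "AVENUE",
    "av": "AVENUE",
    "avenue": "AVENUE",
    "rd": "ROAD",
    "road": "ROAD",
}


def standardize_address(raw: str) -> str:
    """Normalize case, punctuation, and known suffix variants in an address string."""
    # Lowercase and turn punctuation into spaces so "St." and "st" tokenize alike.
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    # Replace tokens that have a canonical form; uppercase all other tokens.
    return " ".join(STREET_SUFFIXES.get(t, t.upper()) for t in tokens)


print(standardize_address("12 Main St."))     # 12 MAIN STREET
print(standardize_address("12 MAIN street"))  # 12 MAIN STREET
print(standardize_address("12 Mian Street"))  # 12 MIAN STREET (the typo survives)
```

Rules like these only cover the variants someone thought to write down; the transposed letters in "Mian" pass straight through, which is exactly the kind of imperfection that makes cleansing a never-ending task.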

“Alas, the war over data quality can never really be won,” explains Vizard.  “What can be done is that the number of instances where we have conflicting data and outright errors can be sharply reduced.  There’s no shame in having bad data; everybody does.  The only real sin is not trying to do anything about it.”

I agree with Vizard on the points that everybody has bad data and that we do need to do something about it.

However, the time is long overdue for us to stop depending on outdated approaches to data quality.

Perfection (especially in data) is impossible to achieve.  Intelligent business decisions can be made using imperfect data, without extensive data cleansing.  Instead of trying to make the data perfect, we need to focus on enabling enterprise applications to handle the unavoidable reality of imperfect data, which is something that humans do naturally.
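As a rough illustration of an application that tolerates imperfect data instead of requiring it to be cleansed first, the sketch below scores candidate records by approximate similarity. It uses Python's standard-library `difflib.SequenceMatcher` purely as a stand-in similarity measure; it is not the matching technology discussed in this blog, and the `best_match` helper and its 0.8 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after lowercasing."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(
    query: str, candidates: List[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return (candidate, score) for the closest candidate, or None if below threshold."""
    scored = [(c, similarity(query, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


records = ["12 Main Street", "45 Oak Avenue", "7 Elm Road"]
# The misspelled query still finds "12 Main Street" without any cleansing step.
print(best_match("12 Mian Street", records))
```

The threshold trades recall for precision: lowering it accepts messier inputs at the risk of false matches, which is a tuning decision rather than a data-cleansing one.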

Advancements in mathematics and machine learning algorithms provide the capability to adapt to (and overcome) data’s inherent chaos, and enable enterprises to make better data-driven business decisions.

I call this approach the Chaos Theory of Data Quality.


About Netrics HD

Data matching is a fundamental operation in many applications, from improving data quality to implementing master data management. Stef Damianakis, CEO of Netrics, a world leader in matching technology, shares his thoughts on the state of the technology and business of data matching.
