Data: Transparency vs. Quality?
In my previous post, The Challenges of Data Transparency, I discussed the news about how data was being prepared for Recovery.gov to show how state and local recipients are spending federal stimulus money.
In that post, I talked about the juxtaposition of data transparency and data quality, and how, although missing or incomplete data is a common problem, completeness without any regard for accuracy could do more harm than good.
I asked whether data should be concealed until it has been verified to be of sufficient quality, or should be provided as soon as it becomes available without regard for quality.
This past week, we have been inundated with news reports from numerous media outlets about the glaring data quality issues found on Recovery.gov. The coverage suggests that many would answer my question by advocating concealment until data quality has been verified.
I don’t want to get into some of the more politically charged aspects of the current debate. I would prefer to pose the question in a more general sense. When it comes to data, does it fundamentally come down to transparency vs. quality?
From my perspective, the underlying struggle in this debate is the desire to achieve both total data transparency and perfect data quality. As wonderful as it would be if this were possible, the reality is that it is not.
Perfection (especially in data) is impossible to achieve. Transparency reveals the quality issues naturally inherent in data. I am not advocating we simply accept the reality of poor quality. We must take action to identify and overcome data quality issues.
The traditional approach employs standardization and other data cleansing techniques in an effort to perfect data. Continuing advances in mathematics and machine learning algorithms provide the capability to adapt to (and overcome) data’s inherent imperfections.
We must strive for total data transparency balanced with a realistic perspective on data quality. Transparency provides the necessary access, and emerging innovations in quality provide the methods for transforming data into actionable information.
Related Posts
The Chaos Theory of Data Quality
