Archives – November, 2009
November 30th, 2009
After years of neglect, data quality is slowly moving to the forefront of business technology as both a discipline and a thriving industry.
However, given that data quality license revenues are estimated at a relatively minuscule $400 million for 2009 (compared to $17 billion for DBMS license revenues), data quality is not quite center stage yet.
Therefore, in this post I want to discuss the increase in awareness by organizations that is necessary to give data quality its due. I describe it as the three levels of Data Quality Enlightenment (DQE).
DQE Level 1 – Unaware
Organizations at Level 1 are blissfully unaware that slight discrepancies in their data create the potential for their business processes to fail.
Sometimes, the resulting failure is immediately visible. Other times, it eventually becomes visible in a downstream application or after some period of time has passed. Either way, the organization feels the impact of the failure as increased costs or decreased revenue, or both.
Upon finally recognizing that the root cause of the problem is data quality, organizations typically progress to Level 2.
DQE Level 2 – Aware
Organizations at Level 2 have come to realize that they must implement data quality measures to avoid the costs of “bad data.”
The logic usually goes like this – if data is not perfect our business processes can fail, therefore we must make sure that our data is always perfect. What wonderfully flawed logic!
No matter how hard an organization tries, its data can never be perfect. Why? Because by its nature, the data changes over time, and so does access to it.
Existing records are updated. New records are created. Both of these actions can be performed by existing or new people and by existing or new systems.
Add the reality that these people and systems can be both internal and external to the organization, and the complexity multiplies with every new source.
Therefore, is it realistic to expect that all data throughout the enterprise will always be kept perfect and standardized the exact same way?
Will humans accessing the data know and use the standard methods? Will humans always know the exact and correct data they want? Will multiple applications (within and between organizations) that need to share data use the same standards for data perfection?
Of course not. Simply put, perpetually perfect data is not possible. Don’t believe anyone who tells you otherwise.
Yet despite these facts, the majority of the data quality industry is still focused on attempting to achieve data perfection.
The common belief is that the way to data Utopia is by writing rules to parse, standardize, and match data. Of course, the different rules have fancy technical names like “deterministic” and “probabilistic,” but they all boil down to manual, static rules that need to be created, maintained, and updated in perpetuity.
The rules an organization has in place today for “perfect data” will have to change (update old rules and add new rules) as the data changes.
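To make the maintenance burden concrete, here is a minimal sketch of a static standardize-and-match rule. It is only an illustration: the abbreviation table, the function names, and the sample addresses are all assumptions made up for this example, not any vendor's actual rule set.

```python
import re

# Hypothetical, hand-maintained abbreviation table (an assumption for illustration).
# Every new variation found in the data means another entry here.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def standardize(address: str) -> str:
    """Lowercase, strip punctuation, and expand the abbreviations we know about."""
    tokens = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def deterministic_match(a: str, b: str) -> bool:
    """Two records match only if their standardized forms are identical."""
    return standardize(a) == standardize(b)


# A variation the rules anticipate is matched.
known = deterministic_match("12 Main St.", "12 main street")

# A variation nobody wrote a rule for ("Str") is missed until the table is updated.
unseen = deterministic_match("12 Main Str", "12 main street")
```

In this sketch, the first comparison succeeds and the second fails, and it keeps failing until someone adds "str" to the table. That is the update cycle described above, repeated for every new variation the data produces.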
Unlike Level 1, where organizations quickly realize they must change and progress to Level 2, most organizations at Level 2 get stuck here and never progress to Level 3.
DQE Level 3 – Enlightened
Organizations reach Level 3 when they achieve enlightenment via the “eureka moment” when they realize that getting and keeping data perfect at all times and forever is, fundamentally, an insane idea.
These organizations then seek to find a better way.
That better way is to enable all enterprise applications to function correctly despite the fact that the underlying operational data they use is not perfect. And to do it without constantly updating and creating rules to parse, standardize, and match data.
The enlightened phase has only just begun, with a select few organizations reaching Level 3.
Enlightenment is Inevitable
As is often the case, enlightenment comes from a simple yet powerful idea that breaks away from the constraints of conventional thought.
It’s only a matter of time before every enterprise application will no longer assume and require “perfect” data in order to function correctly.
When this finally happens, and it will, everyone will benefit.
Tags: Business, Innovation, Technology
November 23rd, 2009
In my previous post The Challenges of Data Transparency, I discussed the news about the data preparation for Recovery.gov regarding how state and local recipients are spending federal stimulus money.
In the post, I talked about the juxtaposition of data transparency and data quality, and how although missing or incomplete data is a common problem, completeness without any regard for accuracy could possibly do more harm than good.
I asked whether data should be concealed until it has been verified to be of sufficient quality, or should be provided as soon as it becomes available without regard for quality.
This past week, we have been inundated with news reports from numerous media outlets regarding the glaring data quality issues found on Recovery.gov, which would seem to indicate many would answer my question by advocating concealment until the verification of data quality has been performed.
I don’t want to get into some of the more politically charged aspects of the current debate. I would prefer to pose the question in a more general sense. When it comes to data, does it fundamentally come down to transparency vs. quality?
From my perspective, the underlying struggle in this debate is the desire to achieve both total data transparency and perfect data quality. As wonderful as it would be if this were possible, the reality is simply that it is not.
Perfection (especially in data) is impossible to achieve. Transparency reveals the quality issues naturally inherent in data. I am not advocating we simply accept the reality of poor quality. We must take action to identify and overcome data quality issues.
The traditional approach is employing standardization and other data cleansing techniques in an effort to perfect data. Continuing advancements in mathematics and machine learning algorithms provide the capability to adapt to (and overcome) data’s inherent imperfections.
We must strive for total data transparency balanced with a realistic perspective of data quality. Transparency provides the necessary access and emerging innovations in quality provide the methods for transforming data into actionable information.
Related Posts
The Chaos Theory of Data Quality
Drowning in Imperfect Data
The Growing Importance of the Algorithm
The Growing Importance of Mathematics
Tags: Economy, Technology
November 16th, 2009
In his recent Internet Evolution article Stimulus Plan Moves Healthcare Tech Center Stage, John Soat reported on the challenges facing healthcare providers under the section of the United States federal stimulus bill known as the Health Information Technology for Economic and Clinical Health (HITECH) Act, which is intended to jumpstart the use of digital technology in the healthcare industry, in particular the use of e-health records.
The article mentions some of the excellent e-health technology efforts under way at General Electric, Google, and Microsoft. It’s amazing how 20 billion dollars (at least) in government funding can lead some of the biggest companies to focus on building digital technology solutions for the healthcare industry, something that was overdue long before the recent stimulus funding became available.
One of the primary challenges of the e-health evolution is in the area of electronic health records (EHR). Back in March 2009, I wrote an article for Executive Healthcare Management (EHM) magazine about lessons from the bleeding edge of EHR.
The evolution of digitizing, storing, and successfully retrieving accurate information necessary for servicing customers has been well underway in other industries for decades. The evolution of customer data management is continuing and still has challenges to overcome.
However, in the healthcare industry, the customer is primarily a patient and the service being provided is primarily medical treatment. In many cases, retrieving accurate information can be a matter of life and death.
Duplicate customer data can undermine the effectiveness of sales and marketing programs, causing unnecessary costs and wasteful spending that greatly reduces revenue.
However, duplicates in a master patient index can cause incorrect or outdated information to be used as the basis for medical treatments. These mistakes can incur costs of a human nature far greater and far more important than costs of a financial nature.
HITECH is indeed presenting the healthcare industry with significant challenges to overcome. However, these challenges are not simply about modernizing the industry with the latest and greatest technology.
Healthcare is a great example of how innovative technology is fundamentally about improving the quality of human life.
Tags: Technology
November 9th, 2009
In his recent ebizQ.net article SOA, Phase 2: Toward a Loosely Coupled World, Joe McKendrick declared:
“I am a passionate believer in the power of technology, as an enabler of entrepreneurship and organizational transformation. I have long advocated flattening the organizational hierarchy, and pushing decision-making down to the managers and employees who deal with customers and production on a day-to-day basis.”
I couldn’t agree more. Nothing has a more powerful effect on an organization’s ability to succeed than putting the right technology into the hands of front line employees.
An unstoppable industry trend is gaining momentum daily: organizations are increasingly looking to cloud computing and software-as-a-service (SaaS) as the new paradigm for enterprise architecture.
“Cloud computing is pushing some software vendors to change their models to component delivery,” explains McKendrick. “This makes plenty of room not only for small start-ups, but also for development shops within traditional enterprises that have great ideas.”
Historically, many of the most powerful new trends in technology originated with small entrepreneurial vendors. By focusing on enhancing their highly specialized components, they can provide a great source of rapid innovation. Therefore, small software vendors, whose solutions are designed for deployment using a loosely coupled service-oriented architecture (SOA), may be the industry’s small giants upon whose broad shoulders we will all be standing in the not-too-distant future.
And according to Mohan Sawhney, professor at Northwestern’s Kellogg School of Management:
“The best-run companies are becoming orchestrators of networks of services. Five years from now, the concept of an application will be obsolete. They will all be services, combined, mixed, matched and reused as needed.”
Therefore, when it comes to enterprise architecture — service-oriented is future-oriented.
Related Posts
The API and the Innovation of Enterprise Applications
Innovation Recession?
Innovation – Do More with Less
The Cloud brings Commoditization
Tags: Innovation, Technology, Trends
November 2nd, 2009
“One of those issues that is always a source of frustration in the enterprise,” explained Michael Vizard in his recent IT Business Edge blog post, The Never Ending War for Data Quality, “is the quality of the data we spend so much time and money processing. The quest to make sure we have high quality data is nothing short of a never-ending battle between the forces of order and the chaos that envelopes every attempt to organize anything.”
I have to admit that this is one of my pet peeves. A remarkably common misconception is that the only way to deal with the pervasive nature of “imperfect data” is to somehow magically keep all of the data “perfect” all of the time.
Data frequently contains numerous variations caused by different conventions, lack of standards, omissions, and other inconsistencies. The traditional approach to data quality is to heavily rely on standardization and other data cleansing efforts in order to prepare data before it can be effectively used for making business decisions. These preparation activities attempt to create a consistent format of parsed attributes with standardized values.
“Alas, the war over data quality can never really be won,” explains Vizard. “What can be done is that the number of instances where we have conflicting data and outright errors can be sharply reduced. There’s no shame in having bad data; everybody does. The only real sin is not trying to do anything about it.”
I agree with Vizard on the points that everybody has bad data and that we do need to do something about it.
However, the time is long overdue for us to stop depending on outdated approaches to data quality.
Perfection (especially in data) is impossible to achieve. Intelligent business decisions can be made using imperfect data – without extensive data cleansing. Instead of trying to make the data perfect, we need to focus on enabling enterprise applications to handle the unavoidable reality of imperfect data, which is something that humans do naturally.
Advancements in mathematics and machine learning algorithms provide the capability to adapt to (and overcome) data’s inherent chaos, and enable enterprises to make better data-driven business decisions.
I call this approach the Chaos Theory of Data Quality.
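As a rough illustration of tolerating imperfect data instead of cleansing it first, the sketch below scores how similar two raw values are and accepts a match above a threshold. This is a deliberately simple stand-in, not the mathematics or machine learning referred to above: it uses plain string similarity from Python's standard `difflib` module, and the function names, the 0.85 threshold, and the sample addresses are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Arbitrary cutoff chosen for this example (an assumption, not a recommended value).
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0 for how alike two raw strings are."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_match(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    """Treat two values as the same entity when they are similar enough,
    without parsing or standardizing either one beforehand."""
    return similarity(a, b) >= threshold


# An unanticipated variation still scores high enough to match.
variant = is_probable_match("12 Main Str", "12 main street")

# A genuinely different address scores well below the threshold.
different = is_probable_match("99 Oak Avenue", "12 main street")
```

No abbreviation table is involved here, so a new spelling does not require a new rule. The trade-off is that a threshold has to be chosen and false matches become possible, which is why real systems would need something far more careful than this sketch.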
Related Posts
The Growing Importance of the Algorithm
The Growing Importance of Mathematics
Adaptive Software
Drowning in Imperfect Data
A Sisyphean Task…
Tags: Technology, Trends