Tag: Technology

Data Quality Enlightenment

November 30th, 2009

After years of neglect, data quality is slowly moving to the forefront of business technology as both a discipline and a thriving industry.

However, given data quality license revenues are estimated at a relatively minuscule $400 million for 2009 (compared to $17 billion for DBMS license revenues), data quality is not quite center stage yet.

Therefore, in this post I want to discuss the increase in awareness by organizations that is necessary to give data quality its due.  I describe it as the three levels of Data Quality Enlightenment (DQE).

DQE Level 1 – Unaware

Organizations at Level 1 are blissfully unaware that slight discrepancies in their data create the potential for their business processes to fail.

Sometimes, the resulting failure is immediately visible.  Other times, it eventually becomes visible in a downstream application or after some period of time has passed.  Either way, the organization feels the impact of the failure as increased costs or decreased revenue, or both.

Upon finally recognizing the root cause of the problem to be data quality, organizations typically progress to Level 2.

DQE Level 2 – Aware

Organizations at Level 2 have come to realize that they must implement data quality measures to avoid the costs of “bad data.”

The logic usually goes like this – if data is not perfect our business processes can fail, therefore we must make sure that our data is always perfect.  What wonderfully flawed logic!

No matter how hard an organization tries, their data can never be perfect.  Why?  Because by its nature, the data, and access to it, changes over time.

Existing records are updated.  New records are created.  Both of these actions can be performed by existing or new people and by existing or new systems.

With the additional reality that these people and systems can be both internal and external to the organization, the complexity grows exponentially.

Therefore, is it realistic to expect that all data throughout the enterprise will always be kept perfect and standardized the exact same way?

Will humans accessing the data know and use the standard methods?  Will humans always know the exact and correct data they want?  Will multiple applications (within and between organizations) that need to share data use the same standards for data perfection?

Of course not.  Simply put, perpetually perfect data is not possible.  Don’t believe anyone who tells you otherwise.

Yet despite these facts, the majority of the data quality industry is still focused on attempting to achieve data perfection.

The common belief is that the way to data Utopia is by writing rules to parse, standardize and match data.  Of course the different rules have fancy technical names like “deterministic” and “probabilistic” but they all boil down to manual, static rules that need to be created, maintained, and updated in perpetuity.

The rules an organization has in place today for “perfect data” will have to change (update old rules and add new rules) as the data changes.
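To make the maintenance burden concrete, here is a minimal sketch in Python.  It is only an illustration, not any vendor's actual method: the `ABBREVIATIONS` table, the function names, and the sample addresses are all hypothetical, and the standard library's `difflib.SequenceMatcher` merely stands in for a rule-free similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical static standardization rules. Every new variation that shows
# up in the data needs another entry here, maintained by hand.
ABBREVIATIONS = {"st": "street", "st.": "street", "ave": "avenue", "ave.": "avenue"}


def standardize(address: str) -> str:
    """Lowercase, drop commas, and expand the abbreviations we know about."""
    tokens = address.lower().replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def deterministic_match(a: str, b: str) -> bool:
    """Rule-based match: true only if the standardized forms are identical."""
    return standardize(a) == standardize(b)


def similarity(a: str, b: str) -> float:
    """Rule-free score between 0.0 and 1.0 computed on the raw strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    # Covered by an existing rule, so the deterministic match succeeds.
    print(deterministic_match("12 Main St.", "12 main street"))
    # "Str" is a variation nobody wrote a rule for, so the match fails...
    print(deterministic_match("12 Main Str", "12 Main Street"))
    # ...while the similarity score still reports the two as close.
    print(similarity("12 Main Str", "12 Main Street"))
```

The rule table works until the data presents a variation it has never seen, at which point someone has to notice and add a rule.  The similarity score needs no such update, although it brings its own question of where to set the threshold.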

Unlike Level 1, where organizations quickly realize they must change and progress to Level 2, most organizations at Level 2 get stuck here and never progress to Level 3.

DQE Level 3 – Enlightened

Organizations reach Level 3 when they achieve enlightenment via the “eureka moment” when they realize that getting and keeping data perfect at all times and forever is, fundamentally, an insane idea.

These organizations then seek to find a better way.

That better way is to enable all enterprise applications to function correctly despite the fact that the underlying operational data they use is not perfect.  And to do it without constantly updating and creating rules to parse, standardize, and match data.

The enlightened phase has only just begun with a select few organizations reaching Level 3.

Enlightenment is Inevitable

As is often the case, enlightenment comes from a simple yet powerful idea that breaks away from the constraints of conventional thought.

It’s only a matter of time before every enterprise application will no longer assume and require “perfect” data in order to function correctly.

When this finally happens, and it will, everyone will benefit.

Posted in Business, Innovation, Technology | No Comments »

Data: Transparency vs. Quality?

November 23rd, 2009

In my previous post The Challenges of Data Transparency, I discussed the news about the data preparation for Recovery.gov regarding how state and local recipients are spending federal stimulus money.

In the post, I talked about the juxtaposition of data transparency and data quality, and how although missing or incomplete data is a common problem, completeness without any regard for accuracy could possibly do more harm than good.

I asked whether data should be concealed until it has been verified to be of sufficient quality, or should be provided as soon as it becomes available without regard for quality.

This past week, we have been inundated with news reports from numerous media outlets regarding the glaring data quality issues found on Recovery.gov, which would seem to indicate many would answer my question by advocating concealment until the verification of data quality has been performed.

I don’t want to get into some of the more politically charged aspects of the current debate.  I would prefer to pose the question in a more general sense.  When it comes to data, does it fundamentally come down to transparency vs. quality?

From my perspective, the underlying struggle in this debate is the desire to achieve both total data transparency and perfect data quality.  As wonderful as it would be if this were possible, the reality is simply that it is not.

Perfection (especially in data) is impossible to achieve.  Transparency reveals the quality issues naturally inherent in data.  I am not advocating we simply accept the reality of poor quality.  We must take action to identify and overcome data quality issues.

The traditional approach is employing standardization and other data cleansing techniques in an effort to perfect data.  Continuing advancements in mathematics and machine learning algorithms provide the capability to adapt to (and overcome) data’s inherent imperfections.

We must strive for total data transparency balanced with a realistic perspective of data quality.  Transparency provides the necessary access and emerging innovations in quality provide the methods for transforming data into actionable information.

Related Posts

The Chaos Theory of Data Quality

Drowning in Imperfect Data

The Growing Importance of the Algorithm

The Growing Importance of Mathematics

Posted in Economy, Technology | No Comments »

HITECH Challenges

November 16th, 2009

In his recent Internet Evolution article Stimulus Plan Moves Healthcare Tech Center Stage, John Soat reported on the challenges facing healthcare providers under the section of the United States federal stimulus bill known as the Health Information Technology for Economic and Clinical Health (HITECH) Act, which is intended to jumpstart the use of digital technology in the healthcare industry, in particular the use of e-health records.

The article mentions some of the excellent e-health technology efforts under way at General Electric, Google, and Microsoft.  It’s amazing how 20 billion dollars (at least) in government funding can lead some of the biggest companies to focus on building digital technology solutions for the healthcare industry, something that was long overdue well before the recent stimulus funding became available.

One of the primary challenges of the e-health evolution is in the area of electronic health records (EHR).  Back in March 2009, I wrote an article for the Executive Healthcare Management (EHM) magazine about lessons from the bleeding edge of EHR.

The evolution of digitizing, storing, and successfully retrieving accurate information necessary for servicing customers has been well underway in other industries for decades.  The evolution of customer data management is continuing and still has challenges to overcome.

However, in the healthcare industry, the customer is primarily a patient and the service being provided is primarily medical treatment.  In many cases, retrieving accurate information can be a matter of life and death.

Duplicate customer data can undermine the effectiveness of sales and marketing programs, causing unnecessary costs and wasteful spending that greatly reduces revenue.

However, duplicates in a master patient index can cause incorrect or outdated information to be used as the basis for medical treatments.  These mistakes can incur costs of a human nature far greater and far more important than costs of a financial nature.

HITECH is indeed presenting the healthcare industry with significant challenges to overcome.  However, these challenges are not simply about modernizing the industry with the latest and greatest technology.

Healthcare is a great example of how innovative technology is fundamentally about improving the quality of human life.

Posted in Technology | No Comments »

Service-Oriented is Future-Oriented

November 9th, 2009

In his recent ebizQ.net article SOA, Phase 2: Toward a Loosely Coupled World, Joe McKendrick declared:

“I am a passionate believer in the power of technology, as an enabler of entrepreneurship and organizational transformation. I have long advocated flattening the organizational hierarchy, and pushing decision-making down to the managers and employees who deal with customers and production on a day-to-day basis.”

I couldn’t agree more.  Nothing has a more powerful effect on an organization’s ability to succeed than putting the right technology into the hands of front line employees.

There is an unstoppable industry trend gaining daily momentum where organizations are increasingly looking for solutions with cloud computing and software-as-a-service (SaaS) as the new paradigm for enterprise architecture.

“Cloud computing is pushing some software vendors to change their models to component delivery,” explains McKendrick.  “This makes plenty of room not only for small start-ups, but also for development shops within traditional enterprises that have great ideas.”

Historically, many of the most powerful new trends in technology originated from small entrepreneurial vendors.  By focusing on enhancing their highly specialized components, they can provide a great source of rapid innovation.  Therefore, small software vendors, whose solutions are designed for deployment using a loosely coupled service-oriented architecture (SOA), may be the industry’s small giants upon whose broad shoulders we will all be standing in the not-too-distant future.

And according to Mohan Sawhney, professor at Northwestern’s Kellogg School of Management:

“The best-run companies are becoming orchestrators of networks of services.  Five years from now, the concept of an application will be obsolete.  They will all be services, combined, mixed, matched and reused as needed.”

Therefore, when it comes to enterprise architecture — service-oriented is future-oriented.

Related Posts

The API and the Innovation of Enterprise Applications

Innovation Recession?

Innovation – Do More with Less

The Cloud brings Commoditization

Posted in Innovation, Technology, Trends | No Comments »

The Chaos Theory of Data Quality

November 2nd, 2009

“One of those issues that is always a source of frustration in the enterprise,” explained Michael Vizard in his recent IT Business Edge blog post, The Never Ending War for Data Quality, “is the quality of the data we spend so much time and money processing.  The quest to make sure we have high quality data is nothing short of a never-ending battle between the forces of order and the chaos that envelopes every attempt to organize anything.”

I have to admit that this is one of my pet peeves.  A remarkably common misconception is that the only way to deal with the pervasive nature of “imperfect data” is to somehow magically keep all of the data “perfect” all of the time.

Data frequently contains numerous variations caused by different conventions, lack of standards, omissions, and other inconsistencies.  The traditional approach to data quality is to heavily rely on standardization and other data cleansing efforts in order to prepare data before it can be effectively used for making business decisions.  These preparation activities attempt to create a consistent format of parsed attributes with standardized values.

“Alas, the war over data quality can never really be won,” explains Vizard.  “What can be done is that the number of instances where we have conflicting data and outright errors can be sharply reduced.  There’s no shame in having bad data; everybody does.  The only real sin is not trying to do anything about it.”

I agree with Vizard on the points that everybody has bad data and that we do need to do something about it.

However, the time is long overdue for us to stop depending on outdated approaches to data quality.

Perfection (especially in data) is impossible to achieve.  Intelligent business decisions can be made using imperfect data – without extensive data cleansing.  Instead of trying to make the data perfect, we need to focus on enabling enterprise applications to handle the unavoidable reality of imperfect data, which is something that humans do naturally.

Advancements in mathematics and machine learning algorithms provide the capability to adapt to (and overcome) data’s inherent chaos, and enable enterprises to make better data-driven business decisions.

I call this approach the Chaos Theory of Data Quality.

Related Posts

The Growing Importance of the Algorithm

The Growing Importance of Mathematics

Adaptive Software

Drowning in Imperfect Data

A Sisyphean Task…

Posted in Technology, Trends | No Comments »

The API and the Innovation of Enterprise Applications

October 26th, 2009

“One of the bigger trends to come down the pike lately,” explained Jim Ericson in his recent Information Management blog post The API is the New Network, “is the proliferation of Web-based application programming interfaces, or APIs, and how network traffic is growing exponentially through APIs.”

More and more organizations continue to look to innovations in cloud computing, software-as-a-service (SaaS), and information as a service, as a new paradigm for enterprise applications.  In a recent press release, Gartner Research identified the Top 10 Strategic Technologies for 2010 and the list includes both cloud computing and client computing.

This is in stark contrast to the traditional approach taken by large technology vendors, who tend to innovate via acquisition in order to offer consolidated enterprise application development platforms with seamlessly integrated components for data quality, data integration, master data management and business intelligence.  This allows the large technology vendors to offer end-to-end solutions and the convenience of one-vendor information technology shopping.

However, does buying everything from one large vendor guarantee a best-of-breed solution for each individual component?

An API-oriented approach enables a plug-and-play enterprise application strategy.  Under this model, enterprise applications are assembled from best-of-breed individual components that are loosely coupled via a network of API calls.
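As a rough illustration of what plug-and-play can mean in practice, here is a small Python sketch.  Everything in it is an assumption made for the example: the `MatchingService` contract, the two matcher classes, and `find_duplicates` are hypothetical names, and the components run in-process here, whereas in a real deployment each one would sit behind a network API call.

```python
from typing import Protocol


class MatchingService(Protocol):
    """Hypothetical contract that any matching component must satisfy."""

    def score(self, a: str, b: str) -> float:
        ...


class ExactMatcher:
    """Stand-in for one vendor's component: case-insensitive equality."""

    def score(self, a: str, b: str) -> float:
        return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


class TokenOverlapMatcher:
    """Stand-in for another vendor's component: shared-word ratio (Jaccard)."""

    def score(self, a: str, b: str) -> float:
        tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
        if not tokens_a or not tokens_b:
            return 0.0
        return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def find_duplicates(records, matcher: MatchingService, threshold: float):
    """Application code depends only on the contract, not on a vendor."""
    pairs = []
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            if matcher.score(left, right) >= threshold:
                pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    records = ["Acme Corp", "acme corp", "Acme Corporation"]
    # Swapping the component requires no change to find_duplicates.
    print(find_duplicates(records, ExactMatcher(), 1.0))
    print(find_duplicates(records, TokenOverlapMatcher(), 0.3))
```

The point of the design is that the application never names a specific component, so a better one can be substituted later without rewriting the application.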

Historically, many of the most powerful new trends in technology originated from small entrepreneurial ventures.  Small technology vendors tend to be specialists with a narrow focus that can provide a great source of rapid innovation.

Perhaps we are witnessing the beginning of the reversal of the recent trend of vendor consolidation, and a return to the earlier industry landscape where smaller vendors remained focused on enhancing and improving their highly specialized components.

If the API is indeed the new network, then the innovation of enterprise applications is to be found in collaboration and not consolidation.

Related Posts

Innovation Recession?

Innovation – Do More with Less

The Cloud brings Commoditization

Posted in Innovation, Technology, Trends | No Comments »

MDM: “Golden” Repository or “Fool’s Gold”

October 19th, 2009

Master Data Management (MDM) is the logical extension of a 20-year evolution in data management practice.  The strategic goal for MDM is to provide a single, “golden” repository of mission-critical data that assures all systems, organizations, and users are getting consistent, accurate information to support their needs.

Today, a number of vendors are positioning themselves to take on this challenge with new technologies that purport to make MDM feasible.  Once implemented, MDM promises to maintain real-time, clean, and consistent 360° views of prospects, customers, and products.

However, in her recent IT World Canada article Data quality vendors missing the mark, Kathleen Lau reported on a study by Andy Hayler, President and CEO of the analyst firm The Information Difference, that shows:

“The issue for lack of attention to data quality by MDM vendors is that traditionally these vendors have focused on building systems that digest data quickly, only to later realize such systems were useless if the data being input was bad.”

Amassing poor quality data would appear to be what many MDM “solutions” are actually delivering.  The technology behind many of these systems is powerful and their functionality is impressively robust.

However, simply assuming the underlying data is “good enough” to support the MDM system will only transform a “golden” repository of mission-critical data into an enterprise database of “fool’s gold.”

Posted in Technology | No Comments »

Data Sherpas Needed

October 12th, 2009

In the recent New York Times article Training to Climb an Everest of Digital Data, Ashlee Vance reported on the challenges associated with managing – and deriving value from – massive repositories of data.

“Researchers and workers in fields as diverse as bio-technology, astronomy and computer science,” reports Vance, “will soon find themselves overwhelmed with information.  The next generation of computer scientists has to think in terms of what could be described as Internet scale.  Facebook, for example, uses more than 1 petabyte of storage space to manage its users’ 40 billion photos.  (A petabyte is about 1,000 times as large as a terabyte, and could store about 500 billion pages of text).”

According to Gartner Research, the volume of enterprise data is doubling every 18 months.  This rapid data proliferation is causing day-to-day business challenges to evolve faster than the existing applications (or new applications under development) can react.
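To get a feel for what an 18-month doubling period implies, the arithmetic can be sketched in a few lines of Python.  The function name and the sample time spans are my own choices for illustration, and the only input taken from the text is the doubling period itself.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """How many times larger the data volume is after `years`,
    assuming it doubles every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (1.5, 3, 6):
        print(f"after {years} years: {growth_factor(years):.0f}x the data")
```

Under that assumption the volume is 4 times larger after three years and 16 times larger after six, which is why applications designed around today's volumes fall behind so quickly.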

“Science these days has basically turned into a data-management problem,” said Jimmy Lin, an associate professor at the University of Maryland, at a recent technology conference.

From the beginning of civilization, mathematics (the language of science) has been central to our advancement.  But our relatively newfound ability to collect massive amounts of digital data has ushered in a new era for leveraging and benefiting from mathematics.

Advancements in machine learning technology using sophisticated mathematical algorithms are providing the capability to not only rapidly process large volumes of data, but more importantly, enable enterprises to make better data-driven business decisions.

According to Vance, companies large and small, as well as universities and government agencies, are “looking for big data experts” capable of scaling today’s digital data mountains.

Perhaps tomorrow we will even see a listing in the classifieds (or more likely in a Twitter status update) that simply reads:

Data Sherpas Needed

Related Posts

The Growing Importance of Mathematics

Adaptive Software

Drowning in Imperfect Data

A Sisyphean Task…

Posted in Business, Technology, Trends | No Comments »

The Growing Importance of the Algorithm

September 21st, 2009

In his absolutely fantastic 2006 Princeton University essay The Algorithm: Idiom of Modern Science, Bernard Chazelle pondered the Holy Grail quest of computer science:

“How to unleash the full computing and modeling power of the Algorithm.”

Chazelle describes how Moore’s Law, which states that computing power doubles every two years, has delayed the rise to prominence of the algorithm, in much the same way that an abundance of relatively cheap oil has delayed the emergence of alternative energy sources.

The Triumph of Mathematics

“To make sense of the world, we have math,” explains Chazelle, and therefore, some might ask: Who needs algorithms?

“It is beyond dispute,” continues Chazelle, “that the dizzying success of 20th century science is, to a large degree, the triumph of mathematics.  A page’s worth of math formulas is enough to explain most of the physical phenomena around us: why things fly, fall, float, gravitate, radiate, blow up, etc.”

As Albert Einstein said:

“The most incomprehensible thing about the universe is that it is comprehensible.”

“Granted,” says Chazelle, “Einstein’s assurance that something is comprehensible might not necessarily reassure everyone, but all would agree that the universe speaks in one tongue and one tongue only: mathematics.”

The New Language of Science

“The Algorithm’s coming-of-age as the new language of science,” declares Chazelle, “promises to be the most disruptive scientific development since quantum mechanics.”

Algorithms are thought by some to be simply a way to automate the rapid execution of a task.  Although speed is important and the exponential growth of computing power has allowed algorithms to execute faster, it is the quality of the work performed by the algorithm that is vastly more important, especially algorithms used for complex data analysis in support of critical business decisions.

“The algorithmic paradigm,” explains Chazelle, “is not about what but how to think.  Self-reference is associated mostly with self-replication.  In the algorithmic world, by contrast, it is the engine powering the complex recursive designs that give abstraction its amazing richness: it is, in fact, the very essence of computing.  Should even a fraction of that power be harnessed for modeling purposes, there’s no telling what might happen.”

For example, using graph theory (a branch of theoretical mathematics), algorithms can construct mathematical models for the ways that humans recognize patterns in data.  The goal of these algorithms is not to replace human decision makers.

These algorithmically constructed models can be used to automate the rapid execution of analytical tasks providing true decision support for humans to use while navigating today’s challenging business environment, which faces daunting data volumes and a constantly evolving marketplace.

“Some say the Algorithm is poised to become the new New Math, the idiom of modern science,” explains Chazelle.  “I say The Sciences They Are A-Changin’ and the Algorithm is Here to Stay.  One thing is certain, Moore’s Law has put computing on the map: the Algorithm will now unleash its true potential.”

I completely agree and wholeheartedly echo the closing remark of Chazelle’s essay:

“May the Algorithm’s Force be with you.”

Related Posts

The Growing Importance of Mathematics

Drowning in Imperfect Data

Matches Created

A more precise, but less certain world

Narrative Fallacy and Data Matching

Posted in Innovation, Technology, Trends | No Comments »

The Growing Importance of Mathematics

September 14th, 2009

For speaking at this year’s Enterprise Data World conference, I received a copy of Stephen Baker’s amazing book The Numerati, which was inspired by his Jan 23, 2006 BusinessWeek article Math Will Rock Your World (one of my all-time favorites!).

Why math will rock your world

“When it comes to producing data,” explains Baker, “we’re prolific.  The very air we breathe is teeming with motes of information.  People with the right smarts can summon meaning from the nearly bottomless sea of data.  The key to this process is to find similarities and patterns.  We humans do this instinctively.”

Humans Teach, Machines Learn

Advancements in machine learning technology using sophisticated mathematical algorithms are providing the capability to make better data-driven decisions.

“Learning machines swim in numbers,” explains Baker.  “The learning process starts with humans…the annotators.  Their work is…to teach the machine what we humans know at a glance.”

Therefore, these advancements are not an attempt to replace human knowledge workers.  The number crunching capabilities of these advancements will allow us to “gradually evolve from data serfs into data masters.”

Advanced Geometry

There are many mathematical disciplines involved in machine learning.  However, perhaps one of the more surprising is advanced geometry.

“Scientists often describe the world of data as a domain of sharp angles, colliding planes, and vectors shooting along endless paths,” explains Baker.  “Imagine a vast multidimensional space [with] dozens of markers…each marker occupies its own patch of real estate.”

Imagine each marker representing an individual character within a string of text.  Machine learning uses bipartite graphs to allow data to “produce a line – or vector – that intersects with each and every one of its own markers…it’s a little like those grade-school exercises where a child follows a series of numbers or letters with her pencil and ends up with a picture of a puppy or a Christmas tree,” explains Baker.

However, the pictures that bipartite graphs draw are too complex for the three-dimensional world of the human imagination.

“The computer has no trouble depicting [data] as vectors,” continues Baker.  “They all run neatly from one dimension through countless others and, more important, through every one of their distinguishing markers.  [Data] that resemble each other, naturally enough, are neighbors in this vector space.  [Data] that have a lot in common tend to point at similar angles.  Each link shared is a line connecting them, a so-called edge.  The next step is to calculate the importance of each edge…[edges] given a higher score…those lines on the graph are thicker.”
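The idea that similar data “point at similar angles” can be sketched with a simple vector model in Python.  This is a simplified stand-in and not the bipartite graph technique mentioned above: representing each string by counts of its two-character sequences is my assumption, and the function names and sample names are hypothetical.

```python
import math
from collections import Counter


def bigram_vector(text: str) -> Counter:
    """Represent a string as counts of its overlapping two-character sequences.
    Each distinct bigram acts as one dimension of the vector space."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two count vectors (1.0 = same direction)."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    query = bigram_vector("Jon Smith")
    for candidate in ("John Smith", "Mary Jones"):
        score = cosine_similarity(query, bigram_vector(candidate))
        print(f"{candidate}: {score:.3f}")
```

Strings that share many character sequences end up pointing in nearly the same direction, so “Jon Smith” scores much closer to “John Smith” than to “Mary Jones” even though no rule was written for that particular misspelling.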

A New Era of Applied Mathematics

“The information age that we’re in is going to be an emerging new era of what would be called applied mathematics,” concludes Baker.  “Mathematicians are going to dip into the sea of data to form…the mathematical modeling of humanity.”

From the beginning of civilization, mathematics has been central to our advancement. It is, after all, the language of science. But our relatively newfound ability to collect digital data has ushered in a new era for leveraging and benefiting from mathematics.

Related Posts

Drowning in Imperfect Data

Matches Created

A more precise, but less certain world

Narrative Fallacy and Data Matching

Posted in Innovation, Technology | No Comments »

About Netrics HD

Data matching is a fundamental operation in many applications, from improving data quality to implementing master data management. Stef Damianakis, CEO of Netrics, a world leader in matching technology, shares his thoughts on the state of the technology and business of data matching.
