<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Netrics HD</title>
	<atom:link href="http://www.netrics.com/blog/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.netrics.com/blog</link>
	<description>A High Definition View of the Business and Technology of Data Matching</description>
	<lastBuildDate>Fri, 12 Jun 2009 21:14:19 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Narrative Fallacy and Data Matching by Henrik Liliendahl Sørensen</title>
		<link>http://www.netrics.com/blog/narrative-fallacy-and-data-matching/comment-page-1/#comment-135</link>
		<dc:creator>Henrik Liliendahl Sørensen</dc:creator>
		<pubDate>Fri, 12 Jun 2009 21:14:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/?p=117#comment-135</guid>
		<description>I have worked with these different approaches:

Synonyms: This is in my eyes the most basic approach. You keep a list of common translations for words, such as common misspellings, nicknames and so on. This approach of course depends heavily on maintenance, and must be reworked for every language/country – and it actually works better for English than for other languages such as the Germanic ones, where words are concatenated (like ‘Main Street’ being ‘Mainstreet’).

Match codes: These range from very simple ones to more sophisticated ones – from ignoring vowels, through Soundex and Metaphone (for English), to proprietary schemes of all kinds. In my eyes, match codes work OK for selecting candidates for matching – but fall a bit short when it comes to actually settling the case.

Algorithms: A complex algorithm is a more sophisticated way to settle whether two differently spelled records represent the same real-world entity. You have to deal with truncations, non-phonetic typos, rearranged words and letters and all that jazz. The Levenshtein distance is an example of an algorithm you could use – but such a method covers only a fraction of what the commercially used algorithms around do.

Probabilistic learning: This is in fact a variation of synonyms, but the collection is not based on up-front maintenance; instead it is built from users&#039; actual decisions when verifying automatic matching. The tool registers the frequency and context of the paired elements in those decisions. This of course requires a substantial collection. I have implemented such a feature at organisations where several people verify matching results every day.

And then parsing and standardisation are often used as supplementary methods to improve the matching. Also, bringing in more data to support the decision is in my eyes key to actually settling whether some records make up the same real-world entity. Business and consumer/citizen directories are available in different forms, coverage and depth around the world.</description>
		<content:encoded><![CDATA[<p>I have worked with these different approaches:</p>
<p>Synonyms: This is in my eyes the most basic approach. You keep a list of common translations for words, such as common misspellings, nicknames and so on. This approach of course depends heavily on maintenance, and must be reworked for every language/country – and it actually works better for English than for other languages such as the Germanic ones, where words are concatenated (like ‘Main Street’ being ‘Mainstreet’).</p>
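<p>A minimal Python sketch of the synonym approach, assuming a small hand-maintained table (the entries and the names <code>SYNONYMS</code>, <code>normalise_token</code> and <code>normalise</code> are illustrative, not taken from any product): each token is lower-cased, stripped of surrounding periods and commas, and replaced by its canonical form before records are compared.</p>
<pre><code>
```python
# Illustrative, hand-maintained synonym table: variant -> canonical form.
SYNONYMS = {
    "bob": "robert",
    "bill": "william",
    "st": "street",
    "str": "street",
}


def normalise_token(token: str) -> str:
    """Lower-case a token, drop surrounding periods/commas, map it to its canonical form."""
    t = token.lower().strip(".,")
    return SYNONYMS.get(t, t)


def normalise(text: str) -> list[str]:
    """Split on whitespace and normalise every token."""
    return [normalise_token(t) for t in text.split()]


print(normalise("Bob Smith, Main St."))  # ['robert', 'smith', 'main', 'street']
print(normalise("Bob Smith, Main St.") == normalise("Robert Smith, Main Street"))  # True
```
</code></pre>
<p>Both spellings reduce to the same tokens, but a token-level table like this still cannot reconcile ‘Main Street’ with ‘Mainstreet’, which is the concatenation problem mentioned above.</p>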
<p>Match codes: These range from very simple ones to more sophisticated ones – from ignoring vowels, through Soundex and Metaphone (for English), to proprietary schemes of all kinds. In my eyes, match codes work OK for selecting candidates for matching – but fall a bit short when it comes to actually settling the case.</p>
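<p>As an illustration of a match code, here is a minimal Python sketch of simplified American Soundex, assuming ASCII input (any letter without a code, including non-ASCII letters, is treated like a vowel). It keeps the first letter, encodes the following consonants as digits, codes letters with the same digit only once when they are adjacent or separated by H or W, and pads the result to four characters.</p>
<pre><code>
```python
# Soundex digit for each consonant group; vowels, H, W and Y get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a four-character Soundex code such as 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # a second letter with the first letter's digit is skipped
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # Vowels reset the "previous digit"; H and W do not, so 'shc' codes once.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Ashcraft"))                   # A261
```
</code></pre>
<p>“Robert” and “Rupert” both become R163, so they end up in the same candidate bucket even though they may well be different people: useful for selecting candidates, not for settling the case.</p>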
<p>Algorithms: A complex algorithm is a more sophisticated way to settle whether two differently spelled records represent the same real-world entity. You have to deal with truncations, non-phonetic typos, rearranged words and letters and all that jazz. The Levenshtein distance is an example of an algorithm you could use – but such a method covers only a fraction of what the commercially used algorithms around do.</p>
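<p>A minimal Python sketch of the Levenshtein distance, using the standard two-row dynamic-programming formulation; the <code>similarity</code> helper that scales the distance to a 0 to 1 score is an illustrative addition, not part of the original comment. The distance is the fewest single-character insertions, deletions and substitutions needed to turn one string into the other.</p>
<pre><code>
```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative 0..1 score: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("John", "Jonh"))       # 2
```
</code></pre>
<p>Plain Levenshtein counts a swap of two adjacent letters (“Jonh” for “John”) as two edits and knows nothing about rearranged words, which is one reason a method like this covers only part of what is needed.</p>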
<p>Probabilistic learning: This is in fact a variation of synonyms, but the collection is not based on up-front maintenance; instead it is built from users&#8217; actual decisions when verifying automatic matching. The tool registers the frequency and context of the paired elements in those decisions. This of course requires a substantial collection. I have implemented such a feature at organisations where several people verify matching results every day.</p>
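<p>A minimal Python sketch of learning from reviewers&#8217; decisions. It is a simple smoothed frequency estimate, not necessarily how the feature described above was built: the class name <code>DecisionLearner</code>, the use of the field name as the &#8220;context&#8221;, and the prior of 0.5 weighted as two pseudo-decisions are all illustrative assumptions.</p>
<pre><code>
```python
from collections import Counter


class DecisionLearner:
    """Illustrative: learn how often reviewers accept a pair of differing values."""

    def __init__(self, prior: float = 0.5, weight: float = 2.0) -> None:
        self.accepted = Counter()  # key -> decisions that said "same entity"
        self.seen = Counter()      # key -> all decisions involving the pair
        self.prior = prior         # assumed rate for pairs never reviewed
        self.weight = weight       # how many pseudo-decisions the prior counts as

    @staticmethod
    def _key(field: str, x: str, y: str) -> tuple:
        # The field name acts as context; sorting lets (x, y) and (y, x) share counts.
        return (field, *sorted((x.lower(), y.lower())))

    def record(self, field: str, x: str, y: str, is_match: bool) -> None:
        """Register one reviewer decision about values x and y in the given field."""
        key = self._key(field, x, y)
        self.seen[key] += 1
        if is_match:
            self.accepted[key] += 1

    def match_rate(self, field: str, x: str, y: str) -> float:
        """Smoothed share of decisions that accepted the pair in this field."""
        key = self._key(field, x, y)
        return (self.accepted[key] + self.prior * self.weight) / (self.seen[key] + self.weight)


learner = DecisionLearner()
for _ in range(8):
    learner.record("first_name", "Bob", "Robert", True)
learner.record("first_name", "Robert", "Bob", False)
print(round(learner.match_rate("first_name", "Robert", "Bob"), 3))  # 0.818
print(learner.match_rate("street", "Bob", "Robert"))                # 0.5
```
</code></pre>
<p>The prior keeps a pair that has been reviewed only once or twice close to 0.5, which is why a substantial collection of decisions is needed before the learned rates become useful.</p>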
<p>And then parsing and standardisation are often used as supplementary methods to improve the matching. Also, bringing in more data to support the decision is in my eyes key to actually settling whether some records make up the same real-world entity. Business and consumer/citizen directories are available in different forms, coverage and depth around the world.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What&#8217;s in a Name? by Isaac W.</title>
		<link>http://www.netrics.com/blog/whats-in-a-name/comment-page-1/#comment-93</link>
		<dc:creator>Isaac W.</dc:creator>
		<pubDate>Thu, 28 May 2009 19:07:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/?p=131#comment-93</guid>
		<description>There&#039;s been a lot of flak in the news recently from the ACLU and others about how inaccurate the terrorist watch list is. How many of those guys are named some variation of Mohammad? With 18 different variations, no wonder it&#039;s a mess.</description>
		<content:encoded><![CDATA[<p>There&#8217;s been a lot of flak in the news recently from the ACLU and others about how inaccurate the terrorist watch list is. How many of those guys are named some variation of Mohammad? With 18 different variations, no wonder it&#8217;s a mess.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on From IT to BT (Business Technology) by Gary Palmer</title>
		<link>http://www.netrics.com/blog/from-it-to-bt-business-technology/comment-page-1/#comment-80</link>
		<dc:creator>Gary Palmer</dc:creator>
		<pubDate>Tue, 19 May 2009 08:52:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/?p=100#comment-80</guid>
		<description>Absolutely agree that the world of geeks and nerds is giving way to a world of hybrids, happy dealing with both technology and commercial issues.

We are still left with the problem that IT colonised the word &quot;information&quot; early while actually focussing on the technology. The result is that to this day many people, and particularly many senior people in organisations, confuse managing the technology with managing information.

As the technology matures still further, and the technical wizards increasingly hide the &quot;difficulties and intricacies&quot; you speak of from the business, I believe we will see the rise of true Information Management functions in all successful organisations.

The issue will not be one of IT becoming BT (a scary thought for us Brits, where BT stands for something rather different!) but one of Technology functions coming to work hand in hand with Information Management to realise a true Information Age.</description>
		<content:encoded><![CDATA[<p>Absolutely agree that the world of geeks and nerds is giving way to a world of hybrids, happy dealing with both technology and commercial issues.</p>
<p>We are still left with the problem that IT colonised the word &#8220;information&#8221; early while actually focussing on the technology. The result is that to this day many people, and particularly many senior people in organisations, confuse managing the technology with managing information.</p>
<p>As the technology matures still further, and the technical wizards increasingly hide the &#8220;difficulties and intricacies&#8221; you speak of from the business, I believe we will see the rise of true Information Management functions in all successful organisations.</p>
<p>The issue will not be one of IT becoming BT (a scary thought for us Brits, where BT stands for something rather different!) but one of Technology functions coming to work hand in hand with Information Management to realise a true Information Age.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When is Baltimore, MD = Hamilton, NJ? by Steve Sarsfield</title>
		<link>http://www.netrics.com/blog/when-is-baltimore-md-hamilton-nj/comment-page-1/#comment-78</link>
		<dc:creator>Steve Sarsfield</dc:creator>
		<pubDate>Sat, 16 May 2009 03:18:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/?p=90#comment-78</guid>
		<description>Had a similar thing happen with the cable company. Since I live in one town but have a telephone exchange number (the first three digits after the area code) from the next town over, it was nearly impossible to reach the right customer service center when calling from my home number. It usually took a few transfers to get to the right place. Good intentions with poor results.</description>
		<content:encoded><![CDATA[<p>Had a similar thing happen with the cable company. Since I live in one town but have a telephone exchange number (the first three digits after the area code) from the next town over, it was nearly impossible to reach the right customer service center when calling from my home number. It usually took a few transfers to get to the right place. Good intentions with poor results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on When is Baltimore, MD = Hamilton, NJ? by Dylan Jones</title>
		<link>http://www.netrics.com/blog/when-is-baltimore-md-hamilton-nj/comment-page-1/#comment-76</link>
		<dc:creator>Dylan Jones</dc:creator>
		<pubDate>Fri, 15 May 2009 15:39:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/?p=90#comment-76</guid>
		<description>Nice post, let&#039;s pray our ambulance services don&#039;t go too far down the same path!</description>
		<content:encoded><![CDATA[<p>Nice post, let&#8217;s pray our ambulance services don&#8217;t go too far down the same path!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Monk Factor by Database Management &#187; Blog Archive &#187; The Monk Factor</title>
		<link>http://www.netrics.com/blog/the-monk-factor/comment-page-1/#comment-6</link>
		<dc:creator>Database Management &#187; Blog Archive &#187; The Monk Factor</dc:creator>
		<pubDate>Wed, 21 May 2008 03:50:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/index.php/2008/05/20/the-monk-factor/#comment-6</guid>
		<description>[...] Pattishub wrote an interesting post today. Here&#8217;s a quick excerpt: I came across this recent article in BtoB Magazine: “Marketers: Clean customer data a priority in 2008” by Carol Krol, March 17, 2008. It’s great to see that simply collecting customer data is no longer good enough - the data needs to be usable to benefit the business. Refining customer data quality and access to customer data have emerged as two of the top marketing investment priorities of b-to-b CMOs this year. Half of b-to-b marketers plan to put more resources against cre [...]</description>
		<content:encoded><![CDATA[<p>[...] Pattishub wrote an interesting post today. Here&#8217;s a quick excerpt: I came across this recent article in BtoB Magazine: “Marketers: Clean customer data a priority in 2008” by Carol Krol, March 17, 2008. It’s great to see that simply collecting customer data is no longer good enough &#8211; the data needs to be usable to benefit the business. Refining customer data quality and access to customer data have emerged as two of the top marketing investment priorities of b-to-b CMOs this year. Half of b-to-b marketers plan to put more resources against cre [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Yes, Business AND Technology by Stef</title>
		<link>http://www.netrics.com/blog/yes-business-and-technology/comment-page-1/#comment-3</link>
		<dc:creator>Stef</dc:creator>
		<pubDate>Sat, 03 May 2008 02:32:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/index.php/2008/04/18/yes-business-and-technology/#comment-3</guid>
		<description>Great point Mr. C. - you cut right to the chase. How important is matching - and why is it so often ignored... to the detriment (and expense) of many businesses and organizations.

I&#039;ll need a dedicated blog posting to provide some background and offer a complete and unbiased answer. 

But in the meantime, here is the condensed version: traditional matching rules are not good enough - even when you add probabilistic components to them. So, there&#039;s demand for innovation - and that innovation is mathematical modeling.</description>
		<content:encoded><![CDATA[<p>Great point Mr. C. &#8211; you cut right to the chase. How important is matching &#8211; and why is it so often ignored&#8230; to the detriment (and expense) of many businesses and organizations.</p>
<p>I&#8217;ll need a dedicated blog posting to provide some background and offer a complete and unbiased answer. </p>
<p>But in the meantime, here is the condensed version: traditional matching rules are not good enough &#8211; even when you add probabilistic components to them. So, there&#8217;s demand for innovation &#8211; and that innovation is mathematical modeling.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Yes, Business AND Technology by matchmeister</title>
		<link>http://www.netrics.com/blog/yes-business-and-technology/comment-page-1/#comment-2</link>
		<dc:creator>matchmeister</dc:creator>
		<pubDate>Fri, 02 May 2008 00:50:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/index.php/2008/04/18/yes-business-and-technology/#comment-2</guid>
		<description>So a major part of the problem is that so few realise how important the ability to match accurately is. Corporate databases are full of errors and we need to be able to deal with those errors - the question is how best to do this...

Any ideas?</description>
		<content:encoded><![CDATA[<p>So a major part of the problem is that so few realise how important the ability to match accurately is. Corporate databases are full of errors and we need to be able to deal with those errors &#8211; the question is how best to do this&#8230;</p>
<p>Any ideas?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Yes, Business AND Technology by vprovaznik</title>
		<link>http://www.netrics.com/blog/yes-business-and-technology/comment-page-1/#comment-1</link>
		<dc:creator>vprovaznik</dc:creator>
		<pubDate>Wed, 23 Apr 2008 07:35:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.netrics.com/blog/index.php/2008/04/18/yes-business-and-technology/#comment-1</guid>
		<description>Exactly the right role: an engaged, motivated, smart and clever interpreter and moderator... enabling the business to do business and optimizing IT services to meet business needs.</description>
		<content:encoded><![CDATA[<p>Exactly the right role: an engaged, motivated, smart and clever interpreter and moderator&#8230; enabling the business to do business and optimizing IT services to meet business needs.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
