Category: Information Quality

  • St. Patrick’s Day Special

    Image: bottled water being passed through a kettle and into a sink to give hot water

    I found this on http://www.motivatedphotos.com and it struck me as a wonderful metaphor for data integration, information quality, and data governance in many organisations: reacting to issues, sustaining silos, viewing all of this as an IT issue rather than a business challenge, or trying to solve the challenge with a series of fragmented department-level initiatives.

    Thoughts?

  • Valentine’s Data Quality Post

    I’ve been inspired by Jim Harris’ excellent post about how companies need to love their data this Valentine’s Day, where he uses 1980s song lyrics to argue his case. My personal view is that the 1980s, with a few exceptions, were a lost decade for music. So I trawled through my iPod and found this great song about a CEO’s tortured love for information.

    I give you “DATA” by Derek and the Dominoes.

    What will you do when you get data

    Loaded into your new BI?

    You’ve been running reports

    that don’t make sense for too long

    but you can’t blame your poor BI.

    Data, you’ve got me on my knees

    Data, I’m begging darlin’ please.

    Data, darlin’ won’t you ease my worried mind.

    I tried to get some information.

    But the data lets me down.

    Like a fool, I fell in love with you,

    But the duff  data turns my whole world upside down

    Data, you’ve got me on my knees

    Data, I’m begging darlin’ please.

    Data, darlin’ won’t you ease my worried mind.

    Let’s make the best of the information.

    Before I finally go insane

    please don’t say we’ll never find a way

    or tell me that all BI’s in vain

    Data, you’ve got me on my knees

    Data, I’m begging darlin’ please.

    Data, darlin’ won’t you ease my worried mind.

    Of course, if we look further into the archives we can find references to poor quality information dotted through the master works of the blues greats.

    • BB King’s underrated “The referential integrity’s gone”, later re-released as “The Thrill is Gone”
    • John Lee Hooker’s “I’ve got my data workin’ (but it just don’t work on you)”, a song about a failed data migration, later reworked and re-released as “I’ve got my Mojo workin’”.
    • Robert Johnson’s lost recording “I’ve got Data on my mind”.
    • The Blues Brothers’ “Everybody needs some data (to love)”.

    Even older than that, a 7-year-old Wolfgang Amadeus Mozart wrote the timeless classic “Twinkle Twinkle little information record, how I wonder how complete and consistent you are”. Unfortunately his father made him rewrite it as a childish ditty about the stars. Astronomy’s gain was our loss.

    The list is endless, proving that the struggle with quality information to drive business value is as timeless as good music.

  • Personal Data – an Asset we hold on Trust

    There has been a bit of a scandal in Ireland with the discovery that Temple St Children’s Hospital has been retaining blood samples from children indefinitely without the consent of parents.

    The story broke in the Sunday Times just after Christmas and has been picked up as a discussion point on sites such as Boards.ie.  TJ McIntyre has also written about some of the legal issues raised by this.

    Ultimately, at the heart of the issue is a fundamental issue of Data Protection Compliance and a failure to treat Personal Data (and Sensitive Personal Data at that) as an asset (something of value) that the Hospital held and holds on trust for the data subject. It is not the Hospital’s data. It is not the HSE’s data. It is my child’s data, and (as I’m of a certain age) probably my data and my wife’s data and my brothers’ data and my sisters-in-laws’ data…..

    It’s of particular interest to me as I’m in the process of finishing off a tutorial course on Data Protection and Information Quality for a series of conferences at the end of February (if you are interested in coming, use the discount code “EARLYBIRD” up to the end of January to get a whopper of a discount). So many of the issues that this raises are to the front of my mind.

    Rather than simply write another post about Data Protection issues, I’m going to approach this from the perspective of Information as an Asset which has a readily definable Life Cycle at various points in which key decisions should be taken by responsible and accountable people to ensure that the asset continues to have value.

    Another aspect of how I’m going to discuss this is that, after over a decade working in Information Quality and Governance, I am a firm believer in the mantra: “Just because you can doesn’t mean you should“. I’m going to show how an Asset Life Cycle perspective can help you develop some robust structures to ensure your data is of high quality and you are less likely to fall foul of Data Protection issues.

    And for anyone who thinks that Data Protection and Data Quality are unrelated issues, I direct you to the specific wording in the heading of Chapter 2, Section 1 of the Directive 95/46/EC. (more…)

  • Who then is my customer?

    Two weeks ago I had the privilege of taking part in the IAIDQ’s Ask the Expert Webinar for World Quality Day (or as it will now be known, World Information Quality Day).

    The general format of the event was that a few of the IAIDQ Directors shared stories from their personal experiences or professional insights and extrapolated out what the landscape might be like in 2014 (the 10th anniversary of the IAIDQ).

    A key factor in all of the stories that were shared was the need to focus on the needs of your information customer, and the fact that the information customer may not be the person who you think they are. More often than not, failing to consider the needs of your information customers can result in outcomes that are significantly below expectations.

    One of my favourite legal maxims is Lord Atkin’s definition of the ‘neighbour’ to whom you owe a legal duty of care. He describes your ‘neighbour’ as anyone whom you should reasonably have in mind when undertaking any action, or deciding not to take one. While this defines a ‘neighbour’ from the point of view of litigation, I think it is also a very good definition of your “customer” in any process.

    Recently I had the misfortune to witness first hand what happens when one part of an organisation institutes a change in a process without ensuring that the people who they should have reasonably had in their mind when instituting the change were aware that the change was coming.

    My wife had a surgical procedure and a drain was inserted for a few days. After about 2 days, the drain was full and needed to be changed. The nurses on the ward couldn’t figure out how to change my wife’s drain because the drain that had been inserted was a new type which the surgical teams had elected to go with but which the ward nurses had never seen before.

    For a further full day my wife suffered the indignity of various medical staff attempting to figure out how to change the drain.

    1. There was no replacement drain of that type available on the ward. The connections were incompatible with the standard drain that was readily available to staff on the ward and which they were familiar with.
    2. When a replacement drain was sourced and fitted, no-one could figure out how to actually activate the magic vacuum function of it that made it work. The instructions on the device itself were incomplete.

    When the mystery of the drain fitting was eventually solved, the puzzle of how to actually read the amount of fluid being drained presented itself. This mattered because the surgeon had left instructions that the drain was to be removed once the output had dropped below a certain amount. The device itself presented misleading information, appearing to be filled to one level but, when emptied out, in fact containing a lesser amount (an information presentation quality problem, one might say).

    The impacts of all this were:

    • A distressed and disturbed patient increasingly worried about the quality of care she was receiving.
    • Wasted time and resources pulling medical staff from other duties to try and solve the mystery of the drain
    • A very peeved quality management blogger growing increasingly irate at the whole situation.
    • Medical staff feeling and looking incompetent in front of a patient (and the patient’s family)

    Eventually the issues were sorted out and the drain was removed, but the outcome was a decidedly sub-optimal one for all involved. And it could easily have been avoided had the surgical teams properly communicated the change to the ward nurses and doctors in the department when they changed their standard. Had the surgical teams asked themselves who they should reasonably have had in mind when taking that action, surely the post-op nurses would have featured in there somewhere?

    I would be tempted to say “silly Health Service” if I hadn’t seen exactly this type of scenario play out in day to day operations and flagship IT projects during the course of my career. Whether it is changing the format of a spreadsheet report so it can’t be loaded into a database or filtered, changing a reporting standard, changing meta-data or reference data, or changing process steps, each of these can result in poor quality information outcomes and irate information customers.

    So, while information quality is defined from the perspective of your information customers, you should take the time to step back and ask yourself who those information customers actually are before making changes that impact on the downstream ability of those customers to meet the needs of their customers.

  • Bank of Ireland – again

    The Irish Times today reports that Bank of Ireland are again investigating incidents of double charging of customers who use LASER cards.

    I wrote about this last month (see the archives here), picking up on a post from Tuppenceworth.ie earlier in the summer. I won’t be writing anything more about the issue (at least not for now).

    Looking back through my archives I found the picture below in a post that I’d written back in May when Simon on Tuppenceworth first raised his issue with BOI’s Laser Cards.

  • What’s in a name?

    Mrs DoBlog and I are anxiously awaiting the arrival of a mini-DoBlog any day now. So we have spent some time flicking through baby name books seeking inspiration for a name other than DoBlog 2.0.

    In doing so I have been yet again reminded of the challenges faced by information quality professionals when trying to unpick a concatenated string of text in a field that is labelled “Name”. The challenges are manifold:

    • Name formats differ from culture to culture – and it is not a Western/Asian divide as some people might assume at first.
    • Master Data for name spellings is notoriously difficult to obtain. My wife and I compared spellings of some common names in two books of baby names and the variations were staggering, with a number of spellings we are very familiar with (including my own name) not listed in either.
    • Family Names (surnames) can often be used as Given Names (first names), such as Darcy (or D’Arcy), Jackson, or Casey.
    • Often people pick names for their children based on where they were born or where they were conceived (Brooklyn Beckham, the son of footballer David Beckham is a good example).
    • Non-name words can appear in names, such as “Meat Loaf” or “Bear Grylls”.
    • Douglas Adams famously named a character in the Hitchhiker’s Guide to the Galaxy after one of the “dominant life forms” – a car called a “Ford Prefect”.
    • Names don’t always fit into an assumed varchar(30) or even varchar(100) field.
    • It is possible to have a one character Given name and a one character Family name.
    • Two character Family names are more common than we think.
    • Unicode characters, hyphens, spaces, and apostrophes are all VALID in names – and diacritical marks can change the meaning of a name entirely in some languages.
    • And then you have people who change their names to silly things to be “different” or “special”, but who create interesting statistical challenges for data profilers and parsing tools (the sketch below shows just how little a name check can safely assume).
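
    By way of illustration only, here is a minimal sketch in Python of how permissive a name check has to be once you accept the points above. The function name and the 255-character limit are assumptions for the sketch, not a standard; the point is that it rejects almost nothing.

```python
import unicodedata

def plausible_name(value: str, max_len: int = 255) -> bool:
    """Very permissive sanity check for a single name field.

    Deliberately accepts one-character names, hyphens, apostrophes,
    spaces and any Unicode letter (including accented characters).
    The 255-character limit is an assumption for this sketch.
    """
    value = value.strip()
    if not value or len(value) > max_len:
        return False
    # Reject only values containing no letters at all; everything else
    # gets the benefit of the doubt.
    return any(unicodedata.category(ch).startswith("L") for ch in value)

# "Ng", "D'Arcy", "Ó Briain" and "Jean-Luc" all pass; "" and "12345" do not.
```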

    Among the examples I found flicking through one of our baby name books last evening were “Alpha” and “Beta”. Personally I think it sends the wrong signals to name your children after letters of the Greek alphabet, but I’m sure it is helpful if you have had twins and want to keep them in order.

    I also found “Bairn” given as a Scots Gaelic name for a baby girl. I had to laugh at this as “Bairn” is actually a Scots dialect word for Child. Even Wikipedia recognises this and has a redirect from “Bairn” to “child“.  But it does remind me of the terribly sexist “joke” where the father asks the doctor after the birth whether it is a boy or a child his wife has just delivered. (more…)

  • A game changer – Ferguson v British Gas

    Back in April I wrote an article for the IAIDQ’s Quarterly Member Newsletter picking up on my niche theme, Common Law liability for poor quality information – in other words, the likelihood that poor quality information and poor quality information management practices will result in your organisation (or you personally) being sued.

    I’ve written and presented on this theme many times over the past few years and it always struck me how people started off being in the “that’s too theoretical” camp but by the time I (and occasionally my speaking/writing partner on this stuff, Mr Fergal Crehan) had finished people were all but phoning their company lawyers to have a chat.

    To an extent, I have to admit that in the early days much of this was theoretical, taking precedents from other areas of law and trying to figure out how they fit together in an Information Quality context. However, in January 2009 a case was heard in the Court of Appeal in England and Wales which has significant implications for the Information Quality profession and which has had almost no coverage (other than coverage via the IAIDQ and myself). My legal colleagues describe it as “ground breaking” for the profession because of the simple legal principle it creates regarding complex and silo’d computing environments and the impact of disparate and plain crummy data. I see it as a clear rallying cry that makes it crystal clear that poor information quality will get you sued.

    Recent reports (here and here) and anecdotal evidence suggest that in the current economic climate, the risk to companies of litigation is increasing. Simply put, the issues that might have been brushed aside or resolved amicably in the past are now life and death issues, at least in the commercial sense. As a result there is now a trend to “lawyer up” at the first sign of trouble. This trend is likely to accelerate in the context of issues involving information, and I suspect, particularly in financial services.

    A recent article in the Commercial Litigation Journal (Frisby & Morrison, 2008) supports this supposition. In that article, the authors conclude:

    “History has shown that during previous downturns in market conditions, litigation has been a source of increased activity in law firms as businesses fight to hold onto what they have or utilise it as a cashflow tool to avoid paying money out.”

    The Case that (should have) shook the Information Quality world

    The case of Ferguson v British Gas was started by Ms. Ferguson, a former customer of British Gas who had transferred to a new supplier but to whom British Gas continued to send invoices and letters with threats to cut off her supply, start legal proceedings, and report her to credit rating agencies.

    Ms Ferguson complained and received assurances that this would stop but the correspondence continued. Ms Ferguson then sued British Gas for harassment.

    Among the defences put forward by British Gas were the arguments that:

    (a) correspondence generated by automated systems did not amount to harassment, and (b) for the conduct to amount to harassment, Ms Ferguson would have to show that the company had “actual knowledge” that its behaviour was harassment.

    The Court of Appeal dismissed both these arguments. Lord Justice Breen, one of the judges on the panel for this appeal, ruled that:

    “It is clear from this case that a corporation, large or small, can be responsible for harassment and can’t rely on the argument that there is no ‘controlling mind’ in the company and that the left hand didn’t know what the right hand was doing,” he said.

    Lord Justice Jacob, in delivering the ruling of the Court, dismissed the automated systems argument by saying:

    “[British Gas] also made the point that the correspondence was computer generated and so, for some reason which I do not really follow, Ms. Ferguson should not have taken it as seriously as if it had come from an individual. But real people are responsible for programming and entering material into the computer. It is British Gas’s system which, at the very least, allowed the impugned conduct to happen.”

    So what does this mean?

    In this ruling, the Court of Appeal for England and Wales has effectively indicated a judicial dismissal of a ‘silo’ view of the organization when a company is being sued. The courts attribute to the company the full knowledge it ought to have had if the left hand knew what the right hand was doing. Any future defence argument grounded on the silo nature of organizations will likely fail. If the company will not break down barriers to ensure that its conduct meets the reasonable expectations of its customers, the courts will do it for them.

    Secondly, the Court clearly had little time or patience for the argument that correspondence generated by a computer was any less weighty or worrisome than a letter written by a human being. Lord Justice Jacob’s statement places the emphasis on the people who program the computer and the people who enter the information. The faulty ‘system’ he refers to includes more than just the computer system; arguably, it also encompasses the human factors in the systemic management of the core processes of British Gas.

    Thirdly, the Court noted that perfectly good and inexpensive avenues to remedy in this type of case exist through the UK’s Trading Standards regulations. Thus from a risk management perspective, the probability of a company being prosecuted for this type of error will increase.

    British Gas settled with Ms Ferguson for an undisclosed amount and was ordered to pay her costs.

    What does it mean from an Information Quality perspective?

    From an Information Quality perspective, this case clearly shows the legal risks that arise from (a) disconnected and siloed systems, and (b) inconsistencies between the facts about real world entities that are contained in these systems.

    It would appear that the debt recovery systems in British Gas were not updated with correct customer account balances (amongst other potential issues).

    Ms. Ferguson was told repeatedly by one part of British Gas that the situation was resolved, while another part of British Gas rolled forward with threats of litigation. The root cause here would appear to be an incomplete or inaccurate record or a failure of British Gas’ systems. The Court’s judgment implies that poor quality data isn’t a defence against litigation.

    The ruling’s emphasis on the importance of people in the management of information, in terms of programming computers (which can be interpreted to include the IT tasks involved in designing and developing systems) and inputting data (which can be interpreted as defining the data that the business uses, and managing the processes that create, maintain, and apply that data) is likewise significant.

    Clearly, an effective information quality strategy and culture, implemented through people and systems, could have avoided the customer service disaster and litigation that this case represents.  The court held the company accountable for not breaking down barriers between departments and systems so that the left-hand of the organization knows what the right-hand is doing.

    Furthermore, it is now more important than ever that companies ensure the accuracy of information about customers, their accounts, and their relationship with the company, as well as ensuring the consistency of that information between systems. The severity of impact of the risk is relatively high (reputational loss, cost of investigations, cost of refunds) and the likelihood of occurrence is also higher in today’s economic climate.

    Given the importance of information in modern businesses, and the likelihood of increased litigation during a recession, it is inevitable: poor quality information will get you sued.

  • Finding Red Herrings or Missing a Trick?

    This post is written by Colin Boylan, an independent market research professional based in Wicklow, Ireland with extensive experience in Market Research in pharma and other industries in the UK and Ireland. In this post, Colin explains how the quality of the population sample used in a market research study can have significant effects on the quality of the findings. His post was inspired by recent posts here and here about “Golden Databases“. I’m glad to give Colin a chance to try out his blogging chops and I hope visitors here enjoy reading his insights into information quality and market research.

    Finding Red Herrings or Missing a Trick?

    For most businesses there are major advantages to investing money in doing direct research with your customer base In theory it’s a ready built list of people who are familiar with your business – so they can speak with authority on their experience as your customer.

    The value of customer research to business should be by now fairly obvious, but there’s an old saying in research (and elsewhere) – “garbage in, garbage out”. The insights built off the data generated from your customer list is only as relevant as the list of people you ask to participate in the research.

    However if, for example, they are lapsed customers, then researching them is going to give you a picture of what your past customers wanted from you (unless these people are the focus of your research, of course). Is this the same as what your present customers want? And if you are looking for why past customers stopped dealing with you and use a list full of current customers, you end up with either few people able to answer the questions you set or, worse, data from people who shouldn’t have answered the question – which leads to another scenario.

    Picture an important piece of research done with a list of past and present customers mixed in together with no way to tell who is who. Do current and ex-customers differ in their wants and needs from your business? I don’t know – and neither do you. So how useful are any insights generated from this research? Not being able to separate these two groups gives rise to two potential scenarios. Either the excess numbers in there are throwing up ‘clear’ results that are not applicable to your current customers or the combination of both bodies is adding noise which stops you uncovering real insights about the customers you’re interested in – you’re either finding red herrings or you’re missing a trick!

    I’ve used just one scenario here to make a point that can be applied to lots of customer data stored by companies – be it incorrect regional information, incorrect gender, you can add whatever block of data is relevant to your own company here and the story is the same. If the data is not accurate then any use it is put to suffers.
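
    As a toy illustration of Colin’s point (the segments and numbers below are entirely invented for the sketch), two groups with genuinely different views, surveyed as one undifferentiated list, produce an ‘average’ answer that is true of neither group.

```python
import random

random.seed(42)

# Invented example: current customers would rate a proposed service ~8/10,
# lapsed customers ~4/10. The mixed list hides that split entirely.
current = [random.gauss(8.0, 1.0) for _ in range(300)]
lapsed = [random.gauss(4.0, 1.0) for _ in range(300)]
mixed = current + lapsed  # no way to tell who is who

def mean(xs):
    return sum(xs) / len(xs)

print(f"current customers only: {mean(current):.1f}")  # ~8.0 -> invest here
print(f"lapsed customers only:  {mean(lapsed):.1f}")   # ~4.0 -> find out why they left
print(f"mixed, unlabelled list: {mean(mixed):.1f}")    # ~6.0 -> a red herring
```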

  • The Risk of Poor Quality Information (2) #nama

    84% fail. Do you remember that statistic from my previous post?

    In my earlier post on this topic I wrote about how issues of identity (name and address) can cause problems when attempting to consolidate data from multiple systems into one Single View of Master Data. I also ran through the frightening statistics relating to the failure rate of these types of projects, ranging up to 84%.

    Finally, I plugged two IAIDQ conferences on this topic. One is in Dublin on the 28th of September. The other is in Cardiff (currently being rescheduled for the end of next month).

    A key root cause of these failure rates has been identified: a failure to understand and profile the data being consolidated, so that the risks and issues lurking in it are found before they derail the project.

    So, if we assume that the risk that is NAMA is a risk the Government will take, surely it behoves the Government and NAMA to ensure that they take the necessary steps to mitigate the risks posed to their plan by poor quality information, and to reduce the probability of failure because of data issues from around 8 in 10 to something more palatable (Zero Defects, anyone?).

    Examples of problems that might occur (Part 2)

    Last time we talked about name and address issues. This time out we talk a little more about technical things like metadata and business rules. (You, down the back… stay awake please.)

    Divided by a Common Language (or the muddle of Metadata)

    Oscar Wilde is credited with describing America and Britain as two countries divided only by a common language.

    When bringing data together from different systems, there is often an assumption that if the fields are called the same thing or a similar thing, then they are likely to hold the same data. This assumption in particular is the mother of all cock-ups.

    I worked on a project once where there were two systems being merged for reporting purposes. System A had a field called Customer ID. System B had a field called Customer Number. The data was combined and the resulting report was hailed as something likely to promote growth, but only in roses. In short, it was a pile of manure.

    The root cause was that System A’s field was a field that uniquely identified customer records with an auto-incrementing numeric value (it started at 1, and added 1 until it was done). The Customer Number field in System B, well it contained letters and numbers and, most importantly, it didn’t actually identify the customer.
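
    A crude profiling sketch (Python, with invented sample values) shows how even a quick look at distinctness and character patterns would have flagged that these two ‘customer’ fields were not the same thing before the report was ever built:

```python
import re

def profile_field(values):
    """Crude column profile: how distinct the values are, plus their character patterns."""
    patterns = {re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v))) for v in values}
    return {
        "distinct_ratio": len(set(values)) / len(values),
        "patterns": sorted(patterns),
    }

# Invented sample data for the sketch.
system_a_customer_id = [1, 2, 3, 4, 5]                               # auto-incrementing key
system_b_customer_number = ["AB12", "AB12", "CX99", "D401", "CX99"]  # neither unique nor numeric

print(profile_field(system_a_customer_id))      # {'distinct_ratio': 1.0, 'patterns': ['9']}
print(profile_field(system_b_customer_number))  # {'distinct_ratio': 0.6, 'patterns': ['A999', 'AA99']}
```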

    ‘Metadata’ (and I sense an attack by Metadata puritans any moment now) is basically defined as “data about data” which helps you understand the values in the field and also helps you make correct assumptions about whether Tab A can actually be connected to Slot B in a way that will actually make sense. It ranges from the technical (this field has only numbers in it for all the data) to the conceptual (e.g. “A customer is…”).

    And here is the rub. Within individual organisations there are often (indeed, I would say inevitably) differences of opinion (to put it politely) about the meaning of the metadata within that organisation. Different business units may have different understandings of what a customer is. Software systems that have sprung up as silo responses to immediate tactical (or even strategic) need often have field names that are different for the same thing (synonyms) or the same for different things (homonyms). Either can cause serious problems in the quality of consolidated data.

    Now NAMA, it would seem, will be consolidating data from multiple areas from within multiple banks. This is a metadata problem squared, which increases the level of risk still further.

    Three Loan Monty (or “One rule to ring them all”)

    One of the things we learned from Michael Lynn (apart from how lawyers and country music should never mix) was that property developers and speculators occasionally pull a fast one and give the same piece of property as security to multiple banks for multiple loans. The assumption that they seemed to have made in the good times was:

    1. No one would notice (sure, who’d be pulling all the details of loans and assets from most Irish Banks into one big database)
    2. They’d be able to pay off the loans before anyone would notice

    Well, number 1 is about to happen and number 2 has stopped happening in many cases.

    To think about this in the context of a data model or a set of business rules for a moment:

    • A person (or persons) can be the borrowers on more than one Loan
    • One loan can be secured against zero (unsecured), one (secured), or more than one (secured) assets.

    What we saw in the Lynn case broke these simple rules.

    An advantage of NAMA is that it gives an opportunity to actually get some metrics on how frequently this was allowed to happen. Once there is a Single View of Borrower it would be straightforward to profile the data and test for the simple business rules outlined above.
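
    As a sketch of what that profiling might look like (the table and column names below are invented for illustration), flagging assets pledged as security against more than one loan becomes a few lines of code once the data sits in one place:

```python
import pandas as pd

# Hypothetical loan-security records, purely for illustration.
security = pd.DataFrame({
    "loan_id": ["L1", "L2", "L3", "L4", "L5"],
    "asset_id": ["A1", "A1", "A1", "A2", "A3"],
    "charge_registered": [True, False, False, True, True],
})

# Rule check: an asset should not be pledged against more than one loan.
loans_per_asset = security.groupby("asset_id")["loan_id"].nunique()
multi_pledged = loans_per_asset[loans_per_asset > 1]
print(multi_pledged)  # A1 secures three loans

# Of the loans against multi-pledged assets, which have no registered charge?
suspect = security[security["asset_id"].isin(multi_pledged.index)]
print(suspect[~suspect["charge_registered"]])  # L2 and L3: effectively unsecured
```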

    The problem arises if incidents like this are discovered where there are three or four loans secured against the same asset and one of them has a fixed charge or a crystallised charge over the asset and the others have some sort of impairment on their security (such as paperwork not being filed correctly and the charge not actually existing).

    If the loan with the charge is the smallest of the four, this means that NAMA winds up with three expensive unsecured loans, as the principle in Commercial Law is that first in time prevails – in other words, the first registered charge is the one that secures the asset.

    It may very well be that the analysis of the banks’ loan books has already gone into this level of detail and there is a cunning plan to address this type of problem as it arises. I’d be interested to see how such a plan would work.

    Unfortunately, I would fear that the issues uncovered in the Michael Lynn debacle haven’t gone away and remain lurking under the surface.

    Conclusion (for now)

    Information is ephemeral and intangible. Banks (and all businesses) use abstract facts stored in files to describe real-world things.

    Often the labels and attributes associated with those facts are not aligned, or are created and defined in “silos” which create barriers to effective communication within an organisation. Such problems are multiplied manifold when you begin to bring data from multiple independent entities together into one place and try to make it into a holistic information asset.

    Often things get lost or muddled in translation.

    Furthermore, where business rules that should govern a process have been broken or not enforced historically, there are inevitably going to be ‘gaps of fact’ (or ‘chasms of unknowingness’ if there are a lot of broken rules). Those ‘gaps of fact’ can undermine critical assumptions in processes or data migrations.

    When we talk of assumptions I am reminded of the old joke about how the economist who got stranded on a desert island was eventually rescued. On the sand he wrote “First, assume a rescue party will find me”.

    Where there is a ‘gap of fact’, it is unfortunately the case that there is a very high probability (around 84%) that the economist would be found, but found dead.

    Effective management of the risk of poor quality information requires people to set aside assumptions and act on fact and mind the gap.

  • The Risk of Poor Quality Information #nama

    The title of this post is, coincidentally, the title of a conference I’m organising in Dublin next week.

    It is a timely topic given the contribution that poor quality information made to the sub-prime mortgage collapse in the US. While a degree of magical thinking is also to blame (“what, I can just say I’m a CEO with €1 million and you’ll take my word for it?”), ultimately the risks that poor quality information posed to downstream processes and decisions were not effectively managed, even if they were actually recognised.

    Listening to the NAMA (twitter hash-tag #nama) debate on-line yesterday (and following it on the excellent liveblog.ie), I couldn’t help but think about the “Happy Path” thinking that seems to be prevailing, and how similar it is to the Happy Path thinking that pervaded the CRM goldrush of the late 1990s and early 2000s, and the ERP and MDM bandwagons that have trundled through a little place I call “ProjectsVille” in the intervening years.

    (A note to people checking the Wikipedia links above: Wikipedia, in its wisdom, seems to class CRM, ERP and MDM as “IT” issues. That’s bullshit, frankly, and doesn’t reflect the key lessons learned from painful failures over the years in many companies around the world. While there is an IT component to implementing solutions and executing projects, these are all fundamentally part of core business strategy and are a business challenge.)

    But I digress….

    Basically, at the heart of every CRM project, ERP project or MDM project is the need to create a “Single View of Something”, be it this bizarre creature called a “Customer” (they are like Yeti.. we all believe they exist but no-one can precisely describe or define them), or “Widget” or other things that the Business needs to know about to, well… run the business and survive.

    This involves taking data from multiple sources and combining it in a single repository of facts. So if you have 999 separate Access databases and 45,000 spreadsheets holding customer data and data about what products your customers have bought, ideally you want to boil them down to one database of customers and one database of products, with links between them that tell you that Customer 456 has bought 45,000 of Widget X in the last 6 months, likes to be phoned after 4:30pm on Thursdays, prefers to be called ‘Dave’ instead of “Mr Rodgers”, oh… and hasn’t got around to paying you for 40,000 of those widgets yet.

    (This is the kind of thing that Damien Mulley referred to recently as a “Golden Database”.)

    NAMA proposes to basically take the facts that are known about a load of loans from multiple lenders, put them all together in a “Single View of Abyss” (they’d probably call it something else) and from that easily and accurately identify underperforming and non-performing loans and put the State in the position where it can ultimately take the assets on which loans were secured or for which loans were acquired if the loans aren’t being repaid.

    Ignoring the economists’ arguments about the approach, this sounds very much like a classic CRM/MDM problem where you have lots of source data sets and want to boil them down to three basic sets of facts:

    • Property or other assets affected by loans (either used as security or purchased using loans)
    • People or companies who borrowed those monies
    • Information about the performance of those loans.

    Ideally then you should be able to ask the magic computermebob to tell you exactly what loans Developer X has, and what assets are those loans secured on.
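
    On the happy path, that question is a trivial join across the three sets of facts listed above. A sketch, with invented table and column names:

```python
import pandas as pd

# Invented tables standing in for the consolidated 'single view'.
borrowers = pd.DataFrame({"borrower_id": ["B1", "B2"],
                          "name": ["Developer X", "Developer Y"]})
loans = pd.DataFrame({"loan_id": ["L1", "L2", "L3"],
                      "borrower_id": ["B1", "B1", "B2"],
                      "balance": [5_000_000, 12_000_000, 750_000]})
security = pd.DataFrame({"loan_id": ["L1", "L2"],
                         "asset_id": ["A1", "A2"]})

single_view = (loans
               .merge(borrowers, on="borrower_id")
               .merge(security, on="loan_id", how="left"))  # keep unsecured loans too

# Every loan Developer X has, and the asset (if any) securing it.
print(single_view[single_view["name"] == "Developer X"])
```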

    This is Happy Path.

    Some statistics now to give you an insight into just how crappy the crappy path can be.

    • An Accenture study a few years ago found that over 70% of CRM implementations had failed to deliver on the promised “Single View of Customer”
    • Bloor Research in 2007 found that 84% of all ERP data migrations fail (either run over time, over budget or fail to integrate all the data) because of problems with the quality of the data
    • As recently as last month, Gartner Group reported that 75% of CFOs surveyed felt that poor quality information was a direct impediment to achieving business goals.

    Examples of problems that might occur

    Address Data (also known as “Postcode postcode wherefore art thou postcode?”)

    Ireland is one of the few countries that lacks a postcode system. This means that postal addresses in Ireland are, for want of a better expression, fuzzy.

    Take for example one townland in Wexford called Murrintown. Only it’s not. It has been Murrintown for centuries as far as the locals are concerned, but according to the Ordnance Survey and the Place Names Commission the locals don’t know how to spell: all the road signs have “Murntown”.

    Yes, An Post has the *koff* lovely */koff* Geodirectory system, which is the nearest thing to an address standard database we have in Ireland. Of course, it is designed and populated to support the delivery of letter post. As a result, many towns and villages have been transposed around the country, as their “Town” from a postal perspective is actually their nearest main sorting office.

    Ballyhaunis in County Mayo is famously logged in Geodirectory as being in Co. Roscommon. This results in property being occasionally misfiled.

    There are also occasional typographical and transcription errors in address data. For example, some genius put an accented character into the name of the development I live in in Wexford, which means that Google Maps, satnavs and other cleverness can’t find my address unless I actually screw it up on purpose.
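
    To close, here is a small sketch in Python (the place name used is made up) of the kind of folding a matching engine has to do so that an accented spelling and an unaccented one can at least be compared, while the original address is stored untouched:

```python
import unicodedata

def fold_for_matching(address: str) -> str:
    """Build a matching key: strip accents, collapse case and whitespace.

    The stored address keeps its accents; this folded form is only used
    for comparison between sources.
    """
    decomposed = unicodedata.normalize("NFKD", address)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())

print(fold_for_matching("Páirc na Mara,  Murrintown"))  # 'pairc na mara, murrintown'
print(fold_for_matching("Pairc na Mara, Murntown"))     # spelling variants still differ
```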