Tag: Information Quality

  • Bruce Schneier on Privacy

    Via the Twitters I came across this absolutely brilliant video of Bruce Schneier talking about data privacy (that’s the American for Data Protection). Bruce makes some great points.

    One of the key points that overlaps between Data Protection and Information Quality is where he tells us that

    Data is the pollution problem of the Information Age. It stays around, it has to be dealt with and its secondary uses are what concerns us. Just as… … we look back at the beginning of the previous century and sort of marvel at how the titans of industry in the rush to build the industrial age would ignore pollution, I think… … we will be judged by our grandchildren and great-grandchildren by how well we dealt with data, with individuals and their relationships to their data, in the information society.

    This echoes the Peter Drucker comment that I reference constantly in talks and with clients of my company, in which Drucker said:

    So far, for 50 years, the information revolution has centered on data—their collection, storage, transmission, analysis, and presentation. It has centered on the “T” in IT.  The next information revolution asks, what is the MEANING of information, and what is its PURPOSE?

    Bruce raises a number of other great points, such as how, as a species, we haven’t adapted to what is technically possible, and how the complexity of control is the challenge for the individual, with younger people having to make increasingly complex and informed decisions about their privacy and what data they put where and why (back to meaning and purpose).

    I really like his points on the legal economics of Information and Data. In college I really enjoyed my “Economics of Law” courses and I tend to look at legalistic problems through an economic prism (after all, the law is just another balancing mechanism for human conduct). I like them so much I’m going to park my thoughts on them for another post.

    But, to return to Bruce’s point that Data is the pollution problem of the Information age, I believe that that statement is horribly true whether we consider data privacy/protection or Information Quality. How much of the crud data that clutters up organisations and sucks resources away from the bottom line is essentially the toxic slag of inefficient and “environmentally unfriendly” processes and business models? How much of that toxic waste is being buried and ignored rather than cleaned up or disposed of with care?

    Is Information Quality Management a “Green” industry flying under a different flag?

  • Who then is my customer?

    Two weeks ago I had the privilege of taking part in the IAIDQ’s Ask the Expert Webinar for World Quality Day (or as it will now be known, World Information Quality Day).

    The general format of the event was that a few of the IAIDQ Directors shared stories from their personal experiences or professional insights and extrapolated out what the landscape might be like in 2014 (the 10th anniversary of the IAIDQ).

    A key factor in all of the stories that were shared was the need to focus on the needs of your information customer, and the fact that the information customer may not be the person who you think they are. More often than not, failing to consider the needs of your information customers can result in outcomes that are significantly below expectations.

    One of my favourite legal maxims is Lord Atkin’s definition of the ‘neighbour’ to whom you owe a legal duty of care. He describes your ‘neighbour’ as anyone who you should reasonably have in your mind when undertaking any action, or deciding not to take any action. While this defines a ‘neighbour’ from the point of view of litigation, I think it is also a very good definition of your “customer” in any process.

    Recently I had the misfortune to witness first hand what happens when one part of an organisation institutes a change in a process without ensuring that the people who they should have reasonably had in their mind when instituting the change were aware that the change was coming.

    My wife had a surgical procedure and a drain was inserted for a few days. After about 2 days, the drain was full and needed to be changed. The nurses on the ward couldn’t figure out how to change my wife’s drain because the drain that had been inserted was a new type which the surgical teams had elected to go with but which the ward nurses had never seen before.

    For a further full day my wife suffered the indignity of various medical staff attempting to figure out how to change the drain.

    1. There was no replacement drain of that type available on the ward. The connections were incompatible with the standard drain that was readily available to staff on the ward and which they were familiar with.
    2. When a replacement drain was sourced and fitted, no-one could figure out how to actually activate the magic vacuum function of it that made it work. The instructions on the device itself were incomplete.

    When the mystery of the drain fitting was eventually solved, the puzzle of how to actually read the amount of fluid being drained presented itself, which was only of importance as the surgeon had left instructions that the drain was to be removed once the output had dropped below a certain amount. The device itself presented misleading information, appearing to be filled to one level but when emptied out in fact containing a lesser amount (an information presentation quality problem one might say).

    The impacts of all this were:

    • A distressed and disturbed patient increasingly worried about the quality of care she was receiving.
    • Wasted time and resources pulling medical staff from other duties to try and solve the mystery of the drain.
    • A very peeved and increasingly irate quality management blogger growing more annoyed at the whole situation.
    • Medical staff feeling and looking incompetent in front of a patient (and the patient’s family).

    Eventually the issues were sorted out and the drain was removed, but the outcome was a decidedly sub-optimal one for all involved. And it could easily have been avoided had the surgical teams properly communicated the change to the ward nurses and the doctors in the department when they changed their standard. Had the surgical teams asked themselves who they should reasonably have had in their minds when taking that action, surely the post-op nurses would have featured in there somewhere?

    I would be tempted to say “silly Health Service” if I hadn’t seen exactly this type of scenario play out in day to day operations and flagship IT projects during the course of my career. Whether it is changing the format of a spreadsheet report so it can’t be loaded into a database or filtered, changing a reporting standard, changing meta-data or reference data, or changing process steps, each of these can result in poor quality information outcomes and irate information customers.

    So, while information quality is defined from the perspective of your information customers, you should take the time to step back and ask yourself who those information customers actually are before making changes that impact on the downstream ability of those customers to meet the needs of their customers.

  • A game changer – Ferguson v British Gas

    Back in April I wrote an article for the IAIDQ’s Quarterly Member Newsletter picking up on my niche theme, Common Law liability for poor quality information – in other words, the likelihood that poor quality information and poor quality information management practices will result in your organisation (or you personally) being sued.

    I’ve written and presented on this theme many times over the past few years and it always struck me how people started off being in the “that’s too theoretical” camp but by the time I (and occasionally my speaking/writing partner on this stuff, Mr Fergal Crehan) had finished people were all but phoning their company lawyers to have a chat.

    To an extent, I have to admit that in the early days much of this was theoretical, taking precedents from other areas of law and trying to figure out how they fit together in an Information Quality context. However, in January 2009 a case was heard in the Court of Appeal in England and Wales which has significant implications for the Information Quality profession and which has had almost no coverage (other than coverage via the IAIDQ and myself). My legal colleagues describe it as “ground-breaking” for the profession because of the simple legal principle it creates regarding complex and siloed computing environments and the impact of disparate and plain crummy data. I see it as a rallying cry that makes it crystal clear that poor information quality will get you sued.

    Recent reports (here and here) and anecdotal evidence suggest that in the current economic climate, the risk to companies of litigation is increasing. Simply put, the issues that might have been brushed aside or resolved amicably in the past are now life and death issues, at least in the commercial sense. As a result there is now a trend to “lawyer up” at the first sign of trouble. This trend is likely to accelerate in the context of issues involving information, and I suspect, particularly in financial services.

    A recent article in the Commercial Litigation Journal (Frisby & Morrison, 2008) supports this supposition. In that article, the authors conclude:

    “History has shown that during previous downturns in market conditions, litigation has been a source of increased activity in law firms as businesses fight to hold onto what they have or utilise it as a cashflow tool to avoid paying money out.”

    The Case that (should have) shook the Information Quality world

    The case of Ferguson v British Gas was started by Ms. Ferguson, a former customer of British Gas who had transferred to a new supplier but to whom British Gas continued to send invoices and letters with threats to cut off her supply, start legal proceedings, and report her to credit rating agencies.

    Ms Ferguson complained and received assurances that this would stop but the correspondence continued. Ms Ferguson then sued British Gas for harassment.

    Among the defences put forward by British Gas were the arguments that:

    (a) correspondence generated by automated systems did not amount to harassment, and (b) for the conduct to amount to harassment, Ms Ferguson would have to show that the company had “actual knowledge” that its behaviour was harassment.

    The Court of Appeal dismissed both these arguments. Lord Justice Breen, one of the judges on the panel for this appeal, ruled that:

    “It is clear from this case that a corporation, large or small, can be responsible for harassment and can’t rely on the argument that there is no ‘controlling mind’ in the company and that the left hand didn’t know what the right hand was doing.”

    Lord Justice Jacob, in delivering the ruling of the Court, dismissed the automated systems argument by saying:

    “[British Gas] also made the point that the correspondence was computer generated and so, for some reason which I do not really follow, Ms. Ferguson should not have taken it as seriously as if it had come from an individual. But real people are responsible for programming and entering material into the computer. It is British Gas’s system which, at the very least, allowed the impugned conduct to happen.”

    So what does this mean?

    In this ruling, the Court of Appeal for England and Wales has effectively indicated a judicial dismissal of a ‘silo’ view of the organization when a company is being sued. The courts attribute to the company the full knowledge it ought to have had if the left hand knew what the right hand was doing. Any future defence argument grounded on the silo nature of organizations will likely fail. If the company will not break down barriers to ensure that its conduct meets the reasonable expectations of its customers, the courts will do it for them.

    Secondly, the Court clearly had little time or patience for the argument that correspondence generated by a computer was any less weighty or worrisome than a letter written by a human being. Lord Justice Jacob’s statement places the emphasis on the people who program the computer and the people who enter the information. The faulty ‘system’ he refers to includes more than just the computer system; arguably, it also encompasses the human factors in the systemic management of the core processes of British Gas.

    Thirdly, the Court noted that perfectly good and inexpensive avenues to remedy in this type of case exist through the UK’s Trading Standards regulations. Thus from a risk management perspective, the probability of a company being prosecuted for this type of error will increase.

    British Gas settled with Ms Ferguson for an undisclosed amount and was ordered to pay her costs.

    What does it mean from an Information Quality perspective?

    From an Information Quality perspective, this case clearly shows the legal risks that arise from (a) disconnected and siloed systems, and (b) inconsistencies between the facts about real world entities that are contained in these systems.

    It would appear that the debt recovery systems in British Gas were not updated with correct customer account balances (amongst other potential issues).

    Ms. Ferguson was told repeatedly by one part of British Gas that the situation was resolved, while another part of British Gas rolled forward with threats of litigation. The root cause here would appear to be an incomplete or inaccurate record, or a failure of British Gas’s systems. The Court’s judgment implies that poor quality data isn’t a defence against litigation.

    Likewise significant is the ruling’s emphasis on the importance of people in the management of information: both in programming computers (which can be interpreted to include the IT tasks involved in designing and developing systems) and in inputting data (which can be interpreted as defining the data that the business uses, and managing the processes that create, maintain, and apply that data).

    Clearly, an effective information quality strategy and culture, implemented through people and systems, could have avoided the customer service disaster and litigation that this case represents. The court held the company accountable for not breaking down barriers between departments and systems so that the left hand of the organization knew what the right hand was doing.

    Furthermore, it is now more important than ever that companies ensure the accuracy of information about customers, their accounts, and their relationship with the company, as well as ensuring the consistency of that information between systems. The severity of impact of the risk is relatively high (reputational loss, cost of investigations, cost of refunds) and the likelihood of occurrence is also higher in today’s economic climate.

    Given the importance of information in modern businesses, and the likelihood of increased litigation during a recession, it is inevitable: poor quality information will get you sued.

  • Golden Databases – a slight return

    Last week I shared a cautionary note about companies relying on their under-touched and under-loved Customer databases to help drive their business as we hit the bottom of the recessionary curve. The elevator pitch synopsis… Caveat emptor – the data may not be what you think it is and you risk irritating your customers if they find errors about them in your data.

    Which brings me to Vodafone Ireland and the data they hold about me. I initially thought that the poor quality information they have about me existed only in the database being used to drive their “Mission Red” campaign. For those of you who aren’t aware, “Mission Red” is Vodafone Ireland’s high profile customer intimacy drive where they are asking customers to vote for their preference of add-on packages. Unfortunately, what I want isn’t listed under their options.

    What I want is for Vodafone Ireland to undo the unrequested gender reassignment they’ve subjected me to.

  • Software Quality, Information Quality, and Customer Service

    Cripes. It’s been a month since I last posted here. Time flies when you are helping your boss figure out how to divide your work up before you leave the company in 3 weeks. I’ve also been very busy with my work in the International Association for Information and Data Quality – lots of interesting things happening there, including the Blog Carnival for Data Quality which I’ll be hosting come Monday!

    One of the things I do in the IAIDQ is moderate and manage the IQTrainwrecks.com website. It is a resource site that captures real-world stories of how poor quality information impacts on people, companies, and even economies.

    Earlier this week I posted a case that was flagged to me by the nice people over at Tuppenceworth.ie concerning double-charging on customer accounts arising from a software bug. Details of that story can be found on IQTrainwrecks and on Tuppenceworth. I’d advise you to read either of those posts as they provide the context necessary for what follows here.

  • Certified Information Quality Professional

    Recent shenanigans around the world have highlighted the importance of good quality information. Over at idqcert.iaidq.org I’ve written a mid-sized post explaining why I am a passionate supporter of the IAIDQ’s Certified Information Quality Practitioner certification.

    Basically, there is a need for people who are managing information quality challenges to have a clear benchmark that sets them and their profession apart from the ‘IT’ misnomer. A clear code of ethics for the profession (a part of the certification as I understand it) is also important. My reading of the situation, particularly in at least one Irish financial institution, is that people were more concerned with presenting the answer that was wanted rather than the answer that was needed, and there appears to have been some ‘massaging’ of figures to present a less than accurate view of things – resulting in investors making decisions based on incomplete or inaccurate information.

    Hopefully the CIQP certification will help raise standards and the awareness of the standards that should be required for people working with information in the information age.

  • Cripes, the blog has been name-checked by my publisher…

    TwentyMajor isn’t the only blogger in the pay of a publisher (I’m conveniently ignoring Grandad and the others as Irish bloggers are too darned fond of publishing these days. If you want to know who all the Irish bloggers with publishers are then Damien Mulley probably has a list)!

    I recently wrote an industry report for a UK publisher on Information Quality strategy. The publisher then swapped all my references to Information Quality to references to Data Quality as that was their ‘brand’ on the publication. I prefer the term Information Quality for a variety of reasons.

    As this runs to over 100 pages of A4 it has a lot of words in it. My fingers were tired after typing it. Unlike Twenty’s book, I’ve got pictures in mine (not those kinds of pictures, unfortunately, but nice diagrams of concepts related to strategy and Information Quality. If you want the other kind of pictures, you’ll need to go here.)

    In the marketing blurb and bumph that I put together for the publisher I mentioned this blog and the IQTrainwrecks.com blog. Imagine my surprise when I opened a sales email from the publisher today (yes, they included me on the sales mailing list… the irony is not lost on me… information quality, author, not likely to buy my own report when I’ve got the four drafts of it on the lappytop here).

    So, for the next few weeks I’ll have to look all serious and proper in a ‘knowing what I’m talking about’ kind of way to encourage people to buy my report. (I had toyed with some variation on booky-wook but it just doesn’t work – reporty-wort… no thanks, I don’t want warts).

    So things I’ll have to refrain from doing include:

    1. Engaging in pointless satirical attacks on the government or businesses just for a laugh, unless I can find an Information Quality angle
    2. Talking too loudly about politics
    3. Giving out about rural/urban digital divides in Ireland
    4. Parsing and reformatting the arguments of leading Irish opinion writers to expose the absence of logic or argument therein.
    5. Engaging in socio-economic analysis of the fate of highstreet purveyors of dirty water parading as coffee.
    6. Swearing

    That last one is a f***ing pain in the a**.

    If any of you are interested in buying my ’umble little report, it is available for sale from Ark Group via this link. This link will make them think you got the email they sent to me, and you can get a discount, getting the yoke for £202.50 including postage and packing (normally £345 + £7.50 p&p). (Or click here to avoid the email campaign software…)

    And if any of you would like to see the content that I’d have preferred the link in the sales person’s email to send you to (coz it highlights the need for good quality management of your information quality) then just click away here to go to IQTrainwrecks.com

    Thanks to Larry, Tom, Danette, and the wifey for their support while I was writing the report, and to Stephanie and Vanessa at Ark Group for their encouragement to get it finished by the deadline.

  • The Electoral Register (Here we go again)

    The Irish Times today carries a story on page five which details a number of proposed changes to the management of the Electoral Register arising from the kerfuffle of the past two years about how totally buggered it is. For those of you who don’t know, I’ve written a little bit about this in the past (earning an Obsessive Blogger badge in the process donchaknow). It was just under two years ago that I opened this blog with a post on this very topic…

    A number of points raised in the article interest me, if for no other reason than they sound very familiar – more on that anon. Others interest me because they still run somewhat counter to the approach that is needed to finally resolve the issue.

    I’ll start with the bits that run counter to the approach required. The Oireachtas Committee has been pretty much consistent in its application of the boot to Local Authorities as regards the priority they give to the management of the Electoral Register. According to the Irish Times article, the TDs and Senators found that:

    “Running elections is not a core function of local authorities. Indeed, it is not a function that appears to demand attention every year. It can, therefore, be questioned if it gets the priority it warrants under the array of authorities”

    I must humbly agree and disagree with this statement. By appearing to blame Local Authorities for the problem and for failing to prioritise the management of the Electoral Register, the Committee effectively absolves successive Ministers for the Environment and other elected officials from failing to ensure that this ‘information asset’ was properly maintained. Ultimately, all Local Authorities fall under the remit of the Minister for Environment, Heritage and Local Government. As the ‘supreme being’ in that particular food chain, the Minister (and their department) is in a position to set policy, establish priorities and mandate adequate resourcing of any Local Authority function, from Water Services to Electoral Franchise.

    The key issue is that the Franchise section was not seen as important by anyone. A key information asset was not managed, and no ongoing plans were put in place for the acquisition or maintenance of information. Only when there were problems applying the information did anyone give a darn. This, unfortunately, is a problem that is not confined to Local Government and Electoral data – a large number of companies worldwide have felt the pain of failing to manage the quality of their information assets in recent times.

    Failing to acknowledge that the lack of management priority was systemic and endemic within the entire hierarchy of Central and Local Government means that a group of people who probably tried to do their best with the resources assigned to them are probably going to feel very aggrieved. “The Register is buggered. It’s your fault. We’re taking it away from you” is the current message. Rather it should be “The system we were operating is broken. Collectively there was a failure to prioritise the management of this resource. The people tried to make it work, but best efforts were never enough. It needs to be replaced.”

    W. Edwards Deming advised people seeking to improve quality to ‘drive out fear’. A corollary of that is that one should not engage in blame when a system is broken unless you are willing to blame all actors in the system equally.

    However, I’m equally guilty, as I raised this issue (albeit not in as ‘blaming’ a tone) back in… oh, 2006:

    Does the current structure of Local Authorities managing Electoral Register data without a clear central authority with control/co-ordination functions (such as to build the national ‘master’ file) have any contribution to the overstatement of the Register?

    Moving on to other points that sound very familiar…

    1. Errors are due to a “wide variety of practices” within Local Authorities. Yup, I recall writing about that as a possible root cause back in 2006. Here and here and here and here and here in fact.
    2. The use of other data sources to supplement the information available to maintain the Register is one suggestion. Hmmm… does this sound like it covers the issue?
    3. Could the Electoral Register process make use of a data source of people who are moving house (such as An Post’s mail redirection service or newaddress.ie)? How can that be utilised in an enhanced process to manage & maintain the electoral register? These are technically surrogate sources of reality rather than being ‘reality’ itself, but they might be useful.

      That’s from a post I wrote here on the 24th April 2006.

      And then there’s this report, which was sent to Eamon Gilmore on my behalf and which ultimately found its way to Dick Roche’s desk while he was still the Minister in the DOELG. Pages 3 to 5 make interesting reading in light of the current proposals. Please note the negatives that I identified with the use of data from 3rd party organisations that would need to be overcome for the solution to be entirely practicable. These can be worked around with sound governance and planning, but bumbling into a solution without understanding the potential problems that would need to be addressed will lead to a less than successful implementation.

    4. The big proposal is the creation of a ‘central authority’ to manage the Electoral Register. This is not new. It is simply a variation on a theme put forward by Eamon Gilmore in a Private Member’s Bill which was debated back in 2006 and defeated at the Second Stage (The Electoral Registration Commissioner Bill, 2005). This is a proposal that I also critiqued in the report that wound its way to Dick Roche… see pages 3 to 5 again. I also raise issues of management and management culture at page 11.
    5. The use of PPS numbers is being considered but there are implications around Data Protection. Hmm… let’s see… I mentioned those issues in this post and in this post.
    6. And it further assumes that the PPS Identity is always accurate (it may not be, particularly if someone is moving house or has moved house. I know of one case where someone was receiving their Tax Certs at the address they lived at in Dublin, but when they went to claim something, all the paperwork was sent to their family’s home address down the country, where they hadn’t lived for nearly 15 years.)

      In my report in 2006 (and on this blog) I also discussed the PPS Number and the potential for fraud if it is not linked to some form of photographic ID, given the nature of the documents that a PPS number can be printed on (see the report linked to above). This exact point was referenced by Senator Camillus Glynn at a meeting of the Committee last week:

      “I would not have a difficulty with using the PPS card. It is logical, makes sense and is consistent with what obtains in the North. The PPS card should also include photographic evidence. I could get hold of Deputy Scanlon’s card. Who is to say that I am not the Deputy if his photograph is not on the card? Whatever we do must be as foolproof as possible.”

      This comment was supported by a number of other committee members.

    So, where does that leave us? Just under two years since I started obsessively blogging about this issue, we’ve moved not much further than when I started. There is a lot of familiarity about the sound-bites coming out at present – to put it another way, there is little on the table at the moment (it seems) that was not contained in the report I prepared or on this blog back in 2006.

    What is new? Well, for a start they aren’t going to make Voter Registration compulsory. Back in 2006 I debated this briefly with Damien Blake… as I recall Damien had proposed automatic registration based on PPS number and date of birth. I questioned whether that would be possible without legislative changes or if it was even desirable. However, the clarification that mandatory registration is now off the table is new.

    The proposal for a centralised governance agency and the removal of responsibility for Franchise /Electoral Register information from the Local Authorities sounds new. But it’s not. It’s a variation on a theme that simply addresses the criticism I had of the original Labour Party proposal. By creating a single agency the issues of Accountability/Responsibility and Governance are greatly simplified, as are issues of standardisation of forms and processes and information systems.

    One new thing is the notion that people should be able to update their details year round, not just in a narrow window in November. This is a small but significant change in process and protocol that addresses a likely root cause.

    What is also new – to an extent – is the clear proposal that this National Electoral Office should be managed by a single head (one leader), answerable to the Dail and outside the normal Civil Service structures (enabling them to hire their own staff to meet their needs). This is important as it sets out a clear governance and accountability structure (which I’d emphasised was needed – Labour’s initial proposal was for a Quango to work in tandem with Local Authorities… a recipe for ‘too many cooks’ if ever I’d heard one). That this head should have the same tenure as a judge to “promote independence from government” is also important, not just because of the independence and allegiance issues it gets around, but also because it sends a very clear message.

    The Electoral Register is an important Information Asset and needs to be managed as such. It is not a ‘clerical’ function that can be left to the side when other tasks need to be performed. It is serious work for serious people with serious consequences when it goes wrong.

    Putting its management on a totally independent footing with clear accountability to the Oireachtas and the Electorate rather than in an under-resourced and undervalued section within one of 34 Local Authorities assures an adequate consistency of Governance and a Constancy of Purpose. The risk is that unless this agency is properly funded and resourced it will become a ‘quality department’ function that is all talk and no trousers and will fail to achieve its objectives.

    As many of the proposals seem to be based on (or eerily parallel) the analysis and recommendations I was formulating back in 2006, I humbly put myself forward for the position of Head of the National Elections Office 😉

  • Final post and update on IBTS issues

    OK. This is (hopefully) my final post on the IBTS issues. I may post their response to my queries about why I received a letter and why my data was in New York. I may not. So here we go…

    First off, courtesy of a source who enquired about the investigation, the Data Protection Commissioner has finished their investigation and the IBTS seems, in the eyes of the DPC, to have done everything as correctly as they could with regard to managing risk and tending to the security of the data. The issue of why the data was not anonymised seems to have been dealt with on the grounds that the fields containing personal data could not be isolated in the log files. The DPC’s finding was that the data provided was not excessive in the circumstances.

    [Update: Here’s a link to the Data Protection Commissioner’s report.]

    This suggests to me that the log files effectively amounted to long strings of text which would have needed to be parsed to extract given name/family name/telephone number/address details, or else the fields in the log tables are named strangely and unintuitively (not as uncommon as you might think) and the IBTS does not have a mapping of the fields to the data that they contain.

    In either case, parsing software is not that expensive (in the grand scheme of things) and a wide array of data quality tools provide very powerful parsing capabilities at moderate costs. I think of Informatica’s Data Quality Workbench (a product originally developed in Ireland), Trillium Software’s offerings or the nice tools from Datanomic.

    Many of these tools (or others from similar vendors) can also help identify the type of data in fields so that organisations can identify what information they have where in their systems. “Ah, field x_system_operator_label actually has names in it!… now what?”.

    If the log files effectively contained totally unintelligible data, one would need to ask what the value of it for testing would be, unless the project involved the parsing of this data in some way to make it ‘useable’? As such, one must assume that there was some inherent structure/pattern to the data that information quality tools would be able to interpret.

    Given that, according to the DPC, the NYBC were selected after a public tender process to provide a data extraction tool, this would suggest that there was some structure to the data that could be interpreted. It also (for me) raises the question as to whether any data had been extracted in a structured format from the log files?

    Also, the “the data is secure because we couldn’t figure out where it was in the file so no-one else will” defence is not the strongest plank to stand on. Using any of the tools described above (or similar ones that exist in the open source space, or that can be assembled from tools such as Python or TCL/TK, or put together in Java) it would be possible to parse out key data from a string of text without a lot of ‘technical’ expertise (OK, if you are ‘home rolling’ a solution using TCL or Python you’d need to be up to speed on techie things, but not that much). Some context data might be needed (such as a list of possible firstnames and a list of lastnames), but that type of data is relatively easy to put together. Of course, it would need to be considered worth the effort, and the laptop itself was probably worth more than Irish data would be to a NYC criminal.
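
    To give a flavour of how low the bar is, here’s a minimal sketch (in Python) of the kind of ‘home rolled’ parsing I have in mind. The file names and the name lists are purely hypothetical (I have no sight of the actual log files), but a few lines of script plus some freely compiled name lists would go a long way towards picking likely personal data out of unstructured log text:

      import re

      # Hypothetical reference lists of common first names and surnames
      # (lists like these are freely available and easy to compile).
      with open("firstnames.txt") as f:
          firstnames = {line.strip().lower() for line in f if line.strip()}
      with open("lastnames.txt") as f:
          lastnames = {line.strip().lower() for line in f if line.strip()}

      # Very rough pattern for an Irish-style phone number.
      phone_pattern = re.compile(r"\b0\d{1,2}[\s-]?\d{5,7}\b")

      # Scan a hypothetical log file line by line for likely personal data.
      with open("donor_log.txt") as log:
          for line_no, line in enumerate(log, start=1):
              words = re.findall(r"[A-Za-z]+", line)
              names = [w for w in words if w.lower() in firstnames or w.lower() in lastnames]
              phones = phone_pattern.findall(line)
              if names or phones:
                  print(line_no, names, phones)

    Commercial data quality tools would do this far more robustly, but the point stands: “we couldn’t find the personal data in the file ourselves” is not much of a security control.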

    The response from the DPC that I’ve seen doesn’t address the question of whether NYBC failed to act in a manner consistent with their duty of care by letting the data out of a controlled environment (it looks like there was a near blind reliance on the security of the encryption). However, that is more a fault of the NYBC than the IBTS… I suspect more attention will be paid to physical control of data issues in future. While the EU model contract arrangements regarding encryption are all well and good, sometimes it serves to exceed the minimum standards set.

    The other part of this post relates to the letter template that Fitz kindly offered to put together for visitors here. Fitz lives over at http://tugofwar.spaces.live.com if anyone is interested. I’ve gussied up the text he posted elsewhere on this site into a word doc for download ==> Template Letter.

    Fitz invites people to take this letter as a starting point and edit it as they see fit. My suggestion is to edit it to reflect an accurate statement of your situation. For example… if you haven’t received a letter from the IBTS then just jump to the end and request a copy of your personal data from the IBTS (it will cost you a few quid to get it), if you haven’t phoned their help-line don’t mention it in the letter etc…. keep it real to you rather than looking like a totally formulaic letter.

    On a lighter note, a friend of mine has received multiple letters from the Road Safety Authority telling him he’s missed his driving test and will now forfeit his fee. Thing is, he passed his test three years ago. Which begs the question (apart from the question of why they are sending him letters now)… why the RSA still has his application details given that data should only be retained for as long as it is required for the stated purpose for which it was collected? And why have the RSA failed to maintain the information accurately (it is wrong in at least one significant way).

  • IBTS… returning to the scene of the crime

    Some days I wake up feeling like Lt. Columbo. I bound out of bed assured in myself that, throughout the day I’ll be niggled by, or rather niggle others with, ‘just one more question’.

    Today was not one of those days. But you’d be surprised what can happen while going about the morning ablutions. “Over 171,000 (174,618 in total) records sent to New York. Sheesh. That’s a lot. Particularly for a sub-set of the database reflecting records that were updated between 2nd July 2007 and 11th October 2007. That’s a lot of people giving blood or having blood tests, particularly during a short period. The statistics for blood donation in Ireland must be phenomenal. I’m surprised we can drag our anaemic carcasses from the leaba and do anything; thank god for steak sandwiches, breakfast rolls and pints of Guinness!”, I hummed to myself as I scrubbed the dentition and hacked the night’s stubble off the otherwise babysoft and unblemished chin (apologies – read Twenty Major’s book from cover to cover yesterday and the rich prose rubbed off on me).

    “I wonder where I’d get some stats for blood donation in Ireland. If only there was some form of Service or agency that managed these things. Oh.. hang on…, what’s that Internet? Silly me.”

    So I took a look at the IBTS annual report for 2006 to see if there was any evidence of back slapping and awards for our doubtlessly Olympian donation efforts.

    According to the IBTS, “Only 4% of our population are regular donors” (source: Chairperson’s statement on page 3 of the report). Assuming the population in 2006 (pre census data publication) was around 4.5 million (including children), this would suggest a maximum regular donor pool of 180,000. If we take the CSO data breaking out population by age, and make a crude guess at the % of 15-24 year olds that are over 18 (we’ll assume 60%), then the pool of potential donors shrinks further… to around 3.1 million adults, giving a regular donor pool of approximately 124,000.

    Hmm… that’s less than the number of records sent as test data to New York based on a sub-set of the database. But my estimations could be wrong.
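
    For the sake of transparency, here’s the back-of-an-envelope arithmetic as a quick sketch; the population figures are my own rough assumptions, not IBTS numbers:

      # Rough estimate of the regular donor pool (assumed figures, not IBTS data).
      population_2006 = 4_500_000       # assumed total population, including children
      adult_population = 3_100_000      # crude estimate of the over-18 population
      regular_donor_rate = 0.04         # "Only 4% of our population are regular donors"

      print(population_2006 * regular_donor_rate)   # 180,000 upper bound
      print(adult_population * regular_donor_rate)  # ~124,000 if only adults count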

    The IBTS Annual Report for 2006 tells us (on page 13) that

    The average age of the donors who gave blood in 2006 was 38 years and 43,678 or 46% of our donors were between the ages of 18 and 35 years.

    OK. So let’s stop piddling around with assumptions based on the 4% of population hypothesis. Here’s a simpler sum to work out… if 43,678 = 46% of Y, calculate Y.

    (43,678 / 46) × 100 ≈ 94,952 people giving blood in total in 2006. Oh. That’s even less than the other number. And that’s for a full year. Not a sample date range. That is less than 56% of the figure quoted by the IBTS. Of course, this may be the number of unique people donating rather than a count of individual instances of donation… if people donated more than once the figure could be higher.
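
    Again, purely as a sketch of the sum, using the annual report’s 46% figure and the IBTS’s own headline count of donor records:

      # Back-calculating the 2006 donor total from the annual report's 46% figure.
      donors_18_to_35 = 43_678             # 46% of donors, per the 2006 annual report
      total_donors_2006 = donors_18_to_35 / 0.46
      print(round(total_donors_2006))      # ~94,952

      donor_records_sent = 171_324         # donor records in the extract, per the IBTS
      print(total_donors_2006 / donor_records_sent)   # ~0.55, i.e. less than 56%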

    The explanation may also lie with the fact that transaction data was included in the extract given to the NYBC (and a record of a donation could be a transaction). As a result there may be more than one row of data for each person who had their data sent to New York (unless in 2007 there was a magical doubling of the numbers of people giving blood).

    According to the IBTS press release:

    The transaction files are generated when any modification is made to any record in Progesa and the relevant period was 2nd July 2007 to 11th October 2007 when 171,324 donor records and 3,294 patient blood group records were updated.

    (the emphasis is mine).

    The key element of that sentence is “any modification is made to any record”. Any change. At all. So, the question I would pose now is: what modifications are made to records in Progesa? Are, for example, records of SMS messages sent to the donor pool kept associated with donor records? Are, for example, records of mailings sent to donors kept associated? Is an audit trail of changes to personal data kept? If so, why and for how long? (Data can only be kept for as long as it is needed.) Who has access rights to modify records in the Progesa system? Does any access of personal data create a log record? I know that the act of donating blood is not the primary trigger here… apart from anything else, the numbers just don’t add up.

    It would also suggest that the data was sent in a ‘flat file’ structure with personal data repeated in the file for each row of transaction data.

    How many distinct person records were sent to NYBC in New York? Was it

    • A defined subset of the donors on the Progesa system who have been ‘double counted’ in the headlines due to transaction records being included in the file? …or
    • All donors?
    • Something in between?

    If the IBTS can’t answer that, perhaps they might be able to provide information on the average number of transactions logged per unique identified person in their database during the period July to October 2007?
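
    To illustrate what I mean, here’s a small sketch using an invented flat-file layout (I have no knowledge of the actual extract format). Counting distinct donors, and transactions per donor, in such a file is trivial, which is exactly why the gap between ‘rows sent’ and ‘people affected’ should be answerable:

      import csv
      from collections import Counter

      # Hypothetical flat-file extract: one row per transaction, with the donor's
      # personal details repeated on every row (the column names are invented).
      with open("extract.csv", newline="") as f:
          rows = list(csv.DictReader(f))

      transactions_per_donor = Counter(row["donor_id"] for row in rows)

      print("rows in extract:", len(rows))                    # the headline number
      print("distinct donors:", len(transactions_per_donor))  # people actually affected
      print("average transactions per donor:", len(rows) / len(transactions_per_donor))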

    Of course, this brings the question arc back to the simplest question of all… while production transaction records might have been required, why were ‘live’ personal details required for this software development project and why was anonymised or ‘defused’ personal data not used?

    To conclude…
    Poor quality information may have leaked out of the IBTS as regards the total numbers of people affected by this data breach. The volume of records they claim to have sent cannot (at least by me) be reconciled with the statistics for blood donations. They are not even close.

    The happy path news here is that the total number of people could be a lot less. If we assume ‘double dipping’ as a result of more than one modification of a donor record, then the worst case scenario is that almost their entire ‘active’ donor list has been lost. The best case scenario is that a subset of that list has gone walkies. It really does boil down to how many rows of transaction information were included alongside each personal record.

    However, it is clear that, despite how it may have been spun in the media, the persons affected by this are NOT necessarily confined to the pool of people who may have donated blood or had blood tests performed between July 2007 and October 2007. Any modification to data about you in the Progesa system would have created a transaction record. We have no information on what these modifications might entail or how many modifications might have occurred, on average, per person during that period.

    In that context, the maximum pool of people potentially affected becomes anyone who has given blood or had blood tests and might have a record on the Progesa system.

    That is the crappy path scenario.

    Reality is probably somewhere in between.

    But, in the final analysis, it should be clear that real personal data should never have been used and providing such data to NYBC was most likely in breach of the IBTS’s own data protection policy.