Tag: digital_data

  • Final post and update on IBTS issues

    OK. This is (hopefully) my final post on the IBTS issues. I may post their response to my queries about why I received a letter and why my data was in New York. I may not. So here we go…

    First off, courtesy of a source who enquired about the investigation, the Data Protection Commissioner has finished their investigation and, in the eyes of the DPC, the IBTS seems to have done everything as correctly as they could with regard to managing risk and tending to the security of the data. The issue of why the data was not anonymised seems to have been dealt with on the grounds that the fields with personal data could not be isolated in the log files. The DPC finding was that the data provided was not excessive in the circumstances.

    [Update: Here’s a link to the Data Protection Commissioner’s report. ]

    This suggests to me that the log files effectively amounted to long strings of text which would have needed to be parsed to extract given name/family name/telephone number/address details, or else that the fields in the log tables are named strangely and unintuitively (not as uncommon as you might think) and the IBTS does not have a mapping of the fields to the data they contain.
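
    To make that concrete, here is a minimal sketch in Python of the sort of parsing involved. The log line and its layout below are entirely invented for illustration, since the actual IBTS log format has not been published:

    ```python
    import re

    # Hypothetical log line: the real IBTS log format is not public,
    # so this layout is purely an invented example.
    log_line = "2007-02-01 14:32:07 TXN=4711 donor=Mary Murphy tel=0861234567 status=OK"

    # Pull the likely personal-data fields out of the raw string with simple patterns.
    name = re.search(r"donor=([A-Za-z]+ [A-Za-z]+)", log_line)
    phone = re.search(r"tel=(\d{7,10})", log_line)

    if name:
        print("Name found:", name.group(1))    # Mary Murphy
    if phone:
        print("Phone found:", phone.group(1))  # 0861234567
    ```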

    In either case, parsing software is not that expensive (in the grand scheme of things), and a wide array of data quality tools provide very powerful parsing capabilities at moderate cost. I think of Informatica’s Data Quality Workbench (a product originally developed in Ireland), Trillium Software’s offerings, or the nice tools from Datanomic.

    Many of these tools (or others from similar vendors) can also profile the type of data in fields, so that organisations can work out what information they hold where in their systems. “Ah, field x_system_operator_label actually has names in it!… now what?”
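
    By way of a hedged illustration, a toy profiler along those lines might look like the following. The field names and sample values are invented; real tools do this with much richer rule sets:

    ```python
    import re

    # Toy data profiler: guess what kind of data a field holds by testing
    # sample values against simple patterns. Fields and samples are invented.
    samples = {
        "x_system_operator_label": ["Mary Murphy", "John Byrne", "Aoife Kelly"],
        "col_17": ["0861234567", "0877654321"],
    }

    def guess_type(values):
        if all(re.fullmatch(r"\d{7,10}", v) for v in values):
            return "looks like phone numbers"
        if all(re.fullmatch(r"[A-Z][a-z]+ [A-Z][a-z]+", v) for v in values):
            return "looks like person names"
        return "unknown"

    for field, values in samples.items():
        print(field, "->", guess_type(values))
    ```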

    If the log files effectively contained totally unintelligible data, one would have to ask what value it could have for testing, unless the project involved parsing this data in some way to make it usable. As such, one must assume that there was some inherent structure/pattern to the data that information quality tools would be able to interpret.

    Given that, according to the DPC, the NYBC were selected after a public tender process to provide a data extraction tool, there would seem to have been some structure to the data that could be interpreted. It also raises (for me) the question of whether any data had already been extracted in a structured format from the log files.

    Also, the “the data is secure because we couldn’t figure out where it was in the file, so no-one else will” defence is not the strongest plank to stand on. Using any of the tools described above (or similar ones from the open source space, or solutions assembled with Python, Tcl/Tk or Java) it would be possible to parse key data out of a string of text without a lot of ‘technical’ expertise (OK, if you are ‘home rolling’ a solution in Tcl or Python you’d need to be up to speed on techie things, but not that much). Some context data might be needed, such as a list of possible first names and a list of last names, but that type of data is relatively easy to put together. Of course, it would need to be considered worth the effort, and the laptop itself was probably worth more to a New York criminal than the Irish data on it.
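
    For illustration only, here is roughly what such a ‘home rolled’ extractor could look like in Python. The reference lists and the sample text are made up; the point is simply how little code is needed:

    ```python
    # Crude extractor: match tokens in free text against reference lists of
    # first and last names. The names and text here are invented; real
    # reference lists are easy enough to assemble from public sources.
    first_names = {"mary", "john", "aoife", "patrick"}
    last_names = {"murphy", "byrne", "kelly", "walsh"}

    text = "ref 99213 Mary Murphy 0861234567 dublin batch ok"
    tokens = text.lower().split()

    for first, last in zip(tokens, tokens[1:]):
        if first in first_names and last in last_names:
            print("Probable name:", first, last)  # mary murphy
    ```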

    The response from the DPC that I’ve seen doesn’t address the question of whether the NYBC failed to act in a manner consistent with their duty of care by letting the data out of a controlled environment (it looks like there was a near-blind reliance on the security of the encryption). However, that is more a fault of the NYBC than of the IBTS… I suspect more attention will be paid to physical control of data in future. While the EU model contract arrangements regarding encryption are all well and good, sometimes it serves to exceed the minimum standards set.

    The other part of this post relates to the letter template that Fitz kindly offered to put together for visitors here. Fitz lives over at http://tugofwar.spaces.live.com if anyone is interested. I’ve gussied up the text he posted elsewhere on this site into a Word doc for download ==> Template Letter.

    Fitz invites people to take this letter as a starting point and edit it as they see fit. My suggestion is to edit it to reflect an accurate statement of your situation. For example, if you haven’t received a letter from the IBTS, just jump to the end and request a copy of your personal data (it will cost you a few quid to get it); if you haven’t phoned their help-line, don’t mention it in the letter; and so on. Keep it true to your situation rather than making it look like a totally formulaic letter.

    On a lighter note, a friend of mine has received multiple letters from the Road Safety Authority telling him he has missed his driving test and will now forfeit his fee. Thing is, he passed his test three years ago. Which raises the question (apart from the question of why they are sending him letters now): why does the RSA still have his application details, given that data should only be retained for as long as it is required for the stated purpose for which it was collected? And why have the RSA failed to keep the information accurate (it is wrong in at least one significant way)?

  • DoBlog is 1 year old this month

    Last night I was doing some housekeeping on the site and I noticed that the year-ometer on the DoBlog is about to turn over. The first post on www.obriend.com (as it was then) was on 18th April 2006. It was about the Electoral Register.

    With an election looming in the next few weeks, it is worth revisiting where we are on that particular issue.

    • The Electoral Register issues are still not resolved
    • There does not appear to have been any substantive analysis of the actual root causes (apart from some work I did off my own bat as a concerned citizen)
    • The work that was done to ‘correct’ the Register was managed inconsistently between Local Authority areas, which means that we may not have improved the accuracy all that much.
    • That work was completed a while ago… the Register will have degraded in quality again since then
    • It seems that entire housing estates (even in the Minister’s own constituency) may have been dropped off the Register

    Over on the McGarr Solicitors site I paraphrased Paul Simon to describe the state of the Register: “Still broken after all these years”. It is. The scrap and rework was botched, and it wasn’t even the right thing to do.

    With most elections or polls in Ireland now being decided by the narrowest of margins, it is more important than ever that every one of us who can vote in the forthcoming General Election does vote. It was once said that in a democracy you don’t always get the government you want; you get the government you deserve.

    So vote. Vote diligently.

  • ka-BOOM – the Information Age Explosion is upon us.

    CNN.com has an interesting report on a study that was conducted by IDC, an industry think tank and research company, into the volume of information that we are creating and storing – and more importantly who is creating that information.

    IDC estimates that the world had 185 exabytes of storage available last year and will have 601 exabytes in 2010. But the amount of stuff generated is expected to jump from 161 exabytes last year to 988 exabytes (closing in on 1 zettabyte) in 2010.
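
    A quick back-of-the-envelope calculation on those figures (assuming ‘last year’ means 2006, so four years of compound growth to 2010) shows how much faster creation is growing than storage:

    ```python
    # Compound annual growth implied by the IDC figures quoted above,
    # on the assumption that 'last year' is 2006.
    generated_2006, generated_2010 = 161, 988  # exabytes created
    storage_2006, storage_2010 = 185, 601      # exabytes of available storage

    years = 4
    gen_growth = (generated_2010 / generated_2006) ** (1 / years) - 1
    sto_growth = (storage_2010 / storage_2006) ** (1 / years) - 1

    print(f"Data created grows roughly {gen_growth:.0%} a year")       # ~57%
    print(f"Storage available grows roughly {sto_growth:.0%} a year")  # ~34%
    print(f"Gap in 2010: {generated_2010 - storage_2010} exabytes")    # 387
    ```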

    IDC estimates that by 2010, about 70 percent of the world’s digital data will be created by individuals. For corporations, information is inflating from such disparate causes as surveillance cameras and data-retention regulations.

    The growth in ‘long tail’ activities like blogging and YouTube is a contributor to this. Explosions can wind up as one of two things: an impressively awe-inspiring fireworks display of elegance and beauty… or a shock-and-awe-filled detonation. The fact that this explosion of information is being driven by individuals raises a significant risk that as the quantity increases, the quality decreases. We are seeing elements of this risk in the recent story about the Wikipedia expert who was making it up as he went along and had lied about his credentials.

    However, this issue of alleged experts with either non-existent qualifications or qualifications which may not be what they appear to be is not restricted to the Internet – it is an off-line issue too. We really can’t ignore the diploma mills churning out PhDs who might not have the level of skill one might expect from the title.

    What can Information Quality professionals and Bloggers do to help maintain quality levels and keep collateral damage from this explosion to a minimum?

    • When blogging, first do no harm. Make sure you verify sources for your stories as much as you can, and respond to any comments that report errors or inaccuracies in your posts – in short, act with the sort of standards we would expect from journalists (although which we might not always get)
    • Try to develop an understanding of good practices for structuring your content (categories in WordPress, for example, are metadata)
    • As Microsoft said in a recent advertising campaign here in Ireland – “Information that can’t be found is information that can’t be used”. The quality of information includes the quality of how that information is presented – designing your site so it is accessible to people with visual impairments is good practice. Likewise, having a logical structure to your site and your content is important.

    Reading the figures that IDC have produced (which incidentally used some proprietary internal research, so might not be capable of being replicated in an independent study) makes me think of the advice that Peter Parker’s Uncle Ben gives him in the first Spider-Man movie… “With great power comes great responsibility”.

    WordPress, YouTube, Wikipedia and similar tools have placed great power in the hands of the wired individual. However, just because something is on the web (no poor pun intended) does not mean that the rules of the real world have been switched off. Under the common law tort of negligent misstatement there is a duty of care on anyone providing information to ensure that that information is correct. Admittedly, to succeed in suing someone for negligent misstatement you need to show that your reliance on the information and any loss you incurred were reasonably foreseeable, and that the person publishing the information owed a duty of care to you specifically (are you their ‘neighbour’ in the legal sense? Are you in a class of persons that the publisher of the information should have considered might consume their information?).

    With great power comes great responsibility. Dr Ben Goldacre, a columnist with the Guardian newspaper in the UK, whose article about questionable qualifications I’ve linked to above, ends that particular article in a very eloquent way that sums up why we need to maintain quality standards as the volume of information available grows. I unashamedly pinch it because it is very good – I’ve just added some emphasis (please read the full article to put this in its original context)…

    “I am writing this article, sneakily, late, at the back of the room, in the Royal College of Physicians, at a conference discussing how to free up access to medical academic knowledge for the public. At the front, as I type, Sir Muir Gray, director of the NHS National Electronic Library For Health, is speaking: “Ignorance is like cholera,” he says. “It cannot be controlled by the individual alone: it requires the organised efforts of society.” He’s right: in the 19th and 20th centuries, we made huge advances through the provision of clean, clear water; and in the 21st century, clean, clear information will produce those same advances.”

    Blog wisely. Blog well.