Tag: Ethics-&-Law-of-Information

  • Final post and update on IBTS issues

    OK. This is (hopefully) my final post on the IBTS issues. I may post their response to my queries about why I received a letter and why my data was in New York. I may not. So here we go..

    First off, courtesy of a source who enquired about the investigation, the Data Protection Commissioner has finished their investigation and the IBTS seems to have done everything as correctly as they could, in the eyes of the DPC, with regard to managing risk and tending to the security of the data. The issue of why the data was not anonymised seems to be dealt with on the grounds that the fields with personal data could not be isolated in the log files. The DPC finding was that the data provided was not excessive in the circumstances.

    [Update: Here’s a link to the Data Protection Commissioner’s report. ]

    This suggests to me that the log files effectively amounted to long strings of text which would have needed to be parsed to extract given name/family name/telephone number/address details, or else the fields in the log tables are named strangely and unintuitively (not as uncommon as you might think) and the IBTS does not have a mapping of the fields to the data that they contain.

    In either case, parsing software is not that expensive (in the grand scheme of things) and a wide array of data quality tools provide very powerful parsing capabilities at moderate costs. I think of Informatica’s Data Quality Workbench (a product originally developed in Ireland), Trillium Software’s offerings or the nice tools from Datanomic.

    Many of these tools (or others from similar vendors) can also help identify the type of data in fields so that organisations can identify what information they have where in their systems. “Ah, field x_system_operator_label actually has names in it!… now what?”.
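    To give a flavour of what that kind of field profiling involves, here is a minimal sketch in Python. It is not how any of the named products work internally; the patterns, labels and sample values are all my own assumptions for illustration, and a real tool would use far richer rules and reference data.

```python
import re
from collections import Counter

# Hypothetical patterns, checked in order; purely illustrative.
PATTERNS = [
    ("date", re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")),
    ("phone", re.compile(r"^\+?[\d\s()-]{7,}$")),
    ("number", re.compile(r"^\d+$")),
    ("name-like", re.compile(r"^[A-Z][A-Za-z']+(?: [A-Z][A-Za-z']+)+$")),
]


def classify(value):
    """Return the label of the first pattern that matches the value."""
    cleaned = value.strip()
    for label, pattern in PATTERNS:
        if pattern.match(cleaned):
            return label
    return "other"


def profile_column(values):
    """Return the share of values in a column falling under each label."""
    if not values:
        return {}
    counts = Counter(classify(v) for v in values)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up contents of a strangely named field.
sample = ["Mary Murphy", "Sean Kelly", "n/a", "Aoife Byrne"]
report = profile_column(sample)
print(report)
```

    If most of the values in a column come back as "name-like", you have a fair idea that the field holds names, whatever it happens to be called.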

    If the log files effectively contained totally unintelligible data, one would need to ask what the value of it for testing would be, unless the project involved the parsing of this data in some way to make it ‘useable’? As such, one must assume that there was some inherent structure/pattern to the data that information quality tools would be able to interpret.

    Given that, according to the DPC, the NYBC were selected after a public tender process to provide a data extraction tool, this would suggest that there was some structure to the data that could be interpreted. It also (for me) raises the question as to whether any data had been extracted in a structured format from the log files?

    Also the “the data is secure because we couldn’t figure out where it was in the file so no-one else will” defence is not the strongest plank to stand on. Using any of the tools described above (or similar ones that exist in the open source space, or can be assembled from tools such as Python or Tcl/Tk or put together in Java) it would be possible to parse out key data from a string of text without a lot of ‘technical’ expertise (Ok, if you are ‘home rolling’ a solution using Tcl or Python you’d need to be up to speed on techie things, but not that much). Some context data might be needed (such as a list of possible firstnames and a list of lastnames), but that type of data is relatively easy to put together. Of course, it would need to be considered worth the effort and the laptop itself was probably worth more than Irish data would be to a NYC criminal.
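    To show how little is involved, here is a rough ‘home rolled’ sketch in Python. Everything in it is assumed for illustration: the log line format is invented (I have not seen the real files), the name lists are tiny stand-ins for proper reference data, and the phone pattern is a crude guess at an Irish-style number.

```python
import re

# Assumed context data: tiny illustrative lists, not real reference data.
FIRST_NAMES = {"mary", "sean", "aoife"}
LAST_NAMES = {"murphy", "kelly", "byrne"}

# Crude, hypothetical pattern for a phone number starting with 0.
PHONE_RE = re.compile(r"\b0\d{1,2}[ -]?\d{5,7}\b")
WORD_RE = re.compile(r"[A-Za-z']+")


def extract_person(line):
    """Pull likely names and phone numbers out of one line of free text."""
    words = WORD_RE.findall(line)
    # A first name immediately followed by a last name is treated as a hit.
    names = [
        (first, last)
        for first, last in zip(words, words[1:])
        if first.lower() in FIRST_NAMES and last.lower() in LAST_NAMES
    ]
    phones = PHONE_RE.findall(line)
    return {"names": names, "phones": phones}


# Entirely made-up log line in an invented format.
line = "UPD|20070702|x_system_operator_label=Mary Murphy;tel=087 1234567;grp=O+"
result = extract_person(line)
print(result)
```

    It would miss plenty and throw up false matches on real data, but it makes the point: not knowing which field is which is not much of a security control.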

    The response from the DPC that I’ve seen doesn’t address the question of whether NYBC failed to act in a manner consistent with their duty of care by letting the data out of a controlled environment (it looks like there was a near blind reliance on the security of the encryption). However, that is more a fault of the NYBC than the IBTS… I suspect more attention will be paid to physical control of data issues in future. While the EU model contract arrangements regarding encryption are all well and good, sometimes it serves to exceed the minimum standards set.

    The other part of this post relates to the letter template that Fitz kindly offered to put together for visitors here. Fitz lives over at http://tugofwar.spaces.live.com if anyone is interested. I’ve gussied up the text he posted elsewhere on this site into a word doc for download ==> Template Letter.

    Fitz invites people to take this letter as a starting point and edit it as they see fit. My suggestion is to edit it to reflect an accurate statement of your situation. For example… if you haven’t received a letter from the IBTS then just jump to the end and request a copy of your personal data from the IBTS (it will cost you a few quid to get it), if you haven’t phoned their help-line don’t mention it in the letter etc…. keep it real to you rather than looking like a totally formulaic letter.

    On a lighter note, a friend of mine has received multiple letters from the Road Safety Authority telling him he’s missed his driving test and will now forfeit his fee. Thing is, he passed his test three years ago. Which begs the question (apart from the question of why they are sending him letters now)… why the RSA still has his application details given that data should only be retained for as long as it is required for the stated purpose for which it was collected? And why has the RSA failed to maintain the information accurately (it is wrong in at least one significant way)?

  • IBTS… returning to the scene of the crime

    Some days I wake up feeling like Lt. Columbo. I bound out of bed assured in myself that, throughout the day I’ll be niggled by, or rather niggle others with, ‘just one more question’.

    Today was not one of those days. But you’d be surprised what can happen while going about the morning ablutions. “Over 171000 (174618 in total) records sent to New York. Sheesh. That’s a lot. Particularly for a sub-set of the database reflecting records that were updated between 2nd July 2007 and 11th October 2007. That’s a lot of people giving blood or having blood tests, particularly during a short period. The statistics for blood donation in Ireland must be phenomenal. I’m surprised we can drag our anaemic carcasses from the leaba and do anything; thank god for steak sandwiches, breakfast rolls and pints of Guinness!”, I hummed to myself as I scrubbed the dentation and hacked the night’s stubble off the otherwise babysoft and unblemished chin (apologies – read Twenty Major’s book from cover to cover yesterday and the rich prose rubbed off on me).

    “I wonder where I’d get some stats for blood donation in Ireland. If only there was some form of Service or agency that managed these things. Oh.. hang on…, what’s that Internet? Silly me.”

    So I took a look at the IBTS annual report for 2006 to see if there was any evidence of back slapping and awards for our doubtlessly Olympian donation efforts.

    According to the IBTS, “Only 4% of our population are regular donors” (source: Chairperson’s statement on page 3 of the report). Assuming the population in 2006 (pre census data publication) was around 4.5 million (including children), this would suggest a maximum regular donor pool of 180,000. If we take the CSO data breaking out population by age, and make a crude guess on the % of 15-24 year olds that are over 18 (we’ll assume 60%) then the pool shrinks further… to around 3.1 million, giving a regular donor pool of 124000 approx.

    Hmm… that’s less than the number of records sent as test data to New York based on a sub-set of the database. But my estimations could be wrong.

    The IBTS Annual Report for 2006 tells us (on page 13) that

    The average age of the donors who gave blood
    in 2006 was 38 years and 43,678 or 46% of our
    donors were between the ages of 18 and 35
    years.

    OK. So let’s stop piddling around with assumptions based on the 4% of population hypothesis. Here’s a simpler sum to work out… If X = 46% of Y, calculate Y.

    (43678/46) × 100 ≈ 94952 people giving blood in total in 2006. Oh. That’s even less than the other number. And that’s for a full year. Not a sample date range. That is <56% of the figure quoted by the IBTS. Of course, this may be the number of unique people donating rather than a count of individual instances of donation… if people donated more than once the figure could be higher.
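    For anyone who wants to check my sums, here they are as a few lines of Python. The figures are the ones quoted above from the annual report and the press release; the 4.5 million and 3.1 million population figures are my own rough assumptions, as already stated.

```python
# Figures quoted in the text above.
donors_18_to_35 = 43_678   # annual report 2006, page 13
share_18_to_35 = 0.46      # 46% of donors
donor_records = 171_324    # donor records in the press release
all_records = 174_618      # donor plus patient blood group records

# Rough population assumptions from the earlier estimate.
population_all = 4_500_000
population_adult = 3_100_000

# "Only 4% of our population are regular donors"
regular_pool_all = population_all * 4 // 100
regular_pool_adults = population_adult * 4 // 100

# If X = 46% of Y, then Y = X / 0.46
total_donors_2006 = donors_18_to_35 / share_18_to_35
ratio_to_donor_records = total_donors_2006 / donor_records
ratio_to_all_records = total_donors_2006 / all_records

print(regular_pool_all, regular_pool_adults)
print(round(total_donors_2006))          # about 94,952
print(f"{ratio_to_donor_records:.1%}")   # a little over 55%
print(f"{ratio_to_all_records:.1%}")     # a little over 54%
```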

    The explanation may also lie with the fact that transaction data was included in the extract given to the NYBC (and record of a donation could be a transaction). As a result there may be more than one row of data for each person who had their data sent to New York (unless in 2007 there was a magical doubling of the numbers of people giving blood).

    According to the IBTS press release:

    The transaction files are generated when any modification is made to any record in Progesa and the relevant period was 2nd July 2007 to 11th October 2007 when 171,324 donor records and 3,294 patient blood group records were updated.

    (the emphasis is mine).

    The key element of that sentence is “any modification is made to any record”. Any change. At all. So, the question I would pose now is what modifications are made to records in Progesa? Are, for example, records of SMS messages sent to the donor pool kept associated with donor records? Are, for example, records of mailings sent to donors kept associated? Is an audit trail of changes to personal data kept? If so, why and for how long? (Data can only be kept for as long as it is needed). Who has access rights to modify records in the Progesa system? Does any access of personal data create a log record? I know that the act of donating blood is not the primary trigger here… apart from anything else, the numbers just don’t add up.

    It would also suggest that the data was sent in a ‘flat file’ structure with personal data repeated in the file for each row of transaction data.

    How many distinct person records were sent to NYBC in New York? Was it

    • A defined subset of the donors on the Progesa system who have been ‘double counted’ in the headlines due to transaction records being included in the file? ….or
    • All donors?
    • Something in between?

    If the IBTS can’t answer that, perhaps they might be able to provide information on the average number of transactions logged per unique identified person in their database during the period July to October 2007?
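    Neither number is hard to produce if the extract really is a flat file with the personal details repeated on each transaction row. A minimal sketch in Python, using an invented file layout and made-up rows (the column names and the idea of a single donor identifier per row are my assumptions, not anything the IBTS has published):

```python
import csv
import io
from collections import Counter

# Invented flat-file extract: one row per transaction, personal data repeated.
flat_file = io.StringIO(
    "donor_id,name,transaction\n"
    "D001,Mary Murphy,donation\n"
    "D001,Mary Murphy,address change\n"
    "D001,Mary Murphy,sms sent\n"
    "D002,Sean Kelly,donation\n"
)

# Count how many transaction rows exist for each distinct person.
rows_per_person = Counter(row["donor_id"] for row in csv.DictReader(flat_file))

total_rows = sum(rows_per_person.values())
distinct_people = len(rows_per_person)
average_rows = total_rows / distinct_people

print(total_rows, distinct_people, average_rows)
```

    Here four ‘records’ turn out to be two people, which is exactly the kind of gap between headline rows and real individuals I am asking about.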

    Of course, this brings the question arc back to the simplest question of all… while production transaction records might have been required, why were ‘live’ personal details required for this software development project and why was anonymised or ‘defused’ personal data not used?

    To conclude…
    Poor quality information may have leaked out of the IBTS as regards the total numbers of people affected by this data breach. The volume of records they claim to have sent cannot (at least by me) be reconciled with the statistics for blood donations. They are not even close.

    The happy path news here is that the total number of people could be a lot less. If we assume ‘double dipping’ as a result of more than one modification of a donor record, then the worst case scenario is that almost their entire ‘active’ donor list has been lost. The best case scenario is that a subset of that list has gone walkies. It really does boil down to how many rows of transaction information were included alongside each personal record.

    However, it is clear that, despite how it may have been spun in the media, the persons affected by this are NOT necessarily confined to the pool of people who may have donated blood or had blood tests performed between July 2007 and October 2007. Any modification to data about you in the Progesa system would have created a transaction record. We have no information on what these modifications might entail or how many modifications might have occurred, on average, per person during that period.

    In that context the maximum pool of people potentially affected becomes anyone who has given blood or had blood tests and might have a record on the Progesa system.

    That is the crappy path scenario.

    Reality is probably somewhere in between.

    But, in the final analysis, it should be clear that real personal data should never have been used and providing such data to NYBC was most likely in breach of the IBTS’s own data protection policy.

  • So what did the IBTSB do right?

    In the interests of a bit of balance, and prompted by some considered comment by Owen O’Connor on Simon’s post over on Tuppenceworth, I thought it might be worth focussing for a moment on what the IBTSB did right.

    1. They had a plan that recognised data security as a key concern.
    2. They specified contract terms to deal with how the data was to be handled. (these terms may have been breached by the data going on an unexpected tour of New York)
    3. They made use of encryption to protect the data in transit (there is no guarantee however that the data was in an encrypted state at all times)
    4. They notified promptly and put their hands up rather than ignoring the problem and hoping it would go away. That alone is to be commended.

    So they planned relatively well and responded quickly when shit hit the fan. The big unknown in all of this is whether the data has been compromised. If we assume happy path, then the individual in an organisation which had a contractual obligation to protect the security of the data but took it home anyway kept the data encrypted on the laptop. This may indeed be the case.

    It could also be the case that this person didn’t appreciate the obligations owed and precautions required and, apart from removing the data from a controlled and securable environment, had decrypted the data to have a poke around at it. That is the crappy path.

    Ultimately it is a roll of the dice as to which you put your trust in.

    In previous posts I have asked why production data was being used for a test event and why it had not been anonymised or tweaked to reduce its ability to identify real individuals. In his comment over on Tuppenceworth, Owen O’Connor contends that

    the data being examined was to do with the actual usage and operation of the IBTS system

    If the data that was being examined was log files for database transactions then one might query (no pun intended) why personal identifying data was included. If it was unavoidable to send sample records (perhaps for replication of transaction events?) then this might actually be in accordance with the IBTSB’s data protection policy. But if the specifics of names etc. were not required for the testing (i.e. if it was purely transactional processing that was being assessed and not, for example, the operation of parsing or matching algorithms) then they should have and could have been mangled to make them anonymous without affecting the validity of any testing that was being done.
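    One way of doing that mangling, sketched in Python, is to swap each identifying value for a keyed hash so that the same person always gets the same stand-in value and the transaction rows still hang together. This is my own illustration, not anything the IBTSB or NYBC are known to have used; the field names and the key are hypothetical. Strictly speaking it is pseudonymisation rather than full anonymisation, since whoever holds the key could recompute the mapping.

```python
import hashlib
import hmac

# Hypothetical secret; it would stay with the data controller, not the tester.
SECRET_KEY = b"kept-by-the-data-controller-only"


def pseudonym(value, prefix):
    """Replace a value with a stable, keyed stand-in."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:10]}"


def mangle(record):
    """Mask identifying fields, leave transactional fields untouched."""
    masked = dict(record)
    masked["name"] = pseudonym(record["name"], "NAME")
    masked["phone"] = pseudonym(record["phone"], "TEL")
    return masked


# Made-up transaction rows for the same made-up person.
first = mangle({"name": "Mary Murphy", "phone": "087 1234567", "event": "donation"})
second = mangle({"name": "Mary Murphy", "phone": "087 1234567", "event": "sms sent"})
print(first)
print(second)
```

    The two rows still share the same masked name and number, so transaction processing can be tested, but neither row says who the person is.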

    If a sound reason for using real data exists that is plausible and warranted the level of risk involved then (having conducted similar testing activities myself during my career) I’d be happy that the IBTSB had done pretty much everything they could reasonably have been asked to do to ensure security of the data. The only other option I would possibly have suggested would be remote access to data held on a server in Ireland which would have certainly meant that no data would have been on a laptop in New York (but latency on broadband connections etc. might have militated against accurate test results perhaps).

    In the Dáil, the IBTSB has come in for some stick for their sloppy handling. Owen O’Connor is correct however – the handling of the spin has been quite good and most of the risk planning was what would be expected. If anyone is guilty of sloppy handling it is the NYBC who acted in breach of their agreement (most likely) by letting the data out of the controlled environment of their offices.

    So, to be clear, I feel for the project manager and team in the IBTSB who are in the middle of what is doubtless a difficult situation. But for the grace of god (and a sense of extreme paranoia in the planning stages of developer test events) go I. The response was correct. Get it out in the open and bring in the Data Protection Commissioner as soon as possible. The planning was at least risk-aware. They learned from Nixon (it’s the cover up that gets you).

    However, if there was not a compelling reason for real data about real people being used in the testing that could not have been addressed with either more time or more money then I would still contend that the use of the production data was ill-advised and in breach of the IBTSB’s own policies.

  • More thoughts on the IBTS data breach

    One of the joys of having occasional bouts of insomnia is that you can spend hours in the dead of night pondering what might have happened in a particular scenario based on your experience and the experience of others.

    For example, the IBTS has rushed to assure us that the data that was sent to New York was encrypted to 256bit-AES standard. To a non-technical person that sounds impressive. To a technical person, that sounds slightly impressive.

    However, a file containing 171000+ records could be somewhat large, depending on how many fields of data it contained and whether that data contained long ‘free text’ fields etc. When data is extracted from a database it is usually dumped to a text file format which has delimiters to identify the fields such as commas or tab characters or defined field widths etc.

    When a file is particularly large, it is often compressed before being put on a disc for transfer – a bit like how we all try to compress our clothes in our suitcase when trying to get just one bag on Aer Lingus or Ryanair flights. One of the most common software tools used (in the Microsoft Windows environment) is called WinZip. It compresses files but can also encrypt the archive file so that a password is required to open it. When the file needs to be used, it can be extracted from the archive, so long as you have the password for the compressed file.

    So, it would not be entirely untrue for the IBTS to say that they had encrypted the data before sending it and it was in an encrypted state on the laptop if all they had done was compressed the file using WinZip and ticked the boxes to apply encryption. And as long as the password wasn’t something obvious or easily guessed (like “secret” or “passw0rd” or “bloodbank”) the data in the compressed file would be relatively secure behind the encryption.

    However, for the data to be used for anything it would need to be uncompressed and would sit, naked and unsecured, on the laptop to be prodded and poked by the application developers as they went about their business. Were this to be the case then, much like the fabled emperor, the IBTS’s story has no clothes. Unencrypted data would have been on the laptop when it was stolen. Your unencrypted, non-anonymised data could have been on the laptop when it was stolen.

    The other scenario is that the actual file itself was encrypted using appropriate software. There are many tools in the market to do this, some free, some not so free. In this scenario, the actual file is encrypted and is not necessarily compressed. To access the file one would need the appropriate ‘key’, either a password or a keycode saved to a memory stick or similar that would let the encryption software know you were the right person to open the file.

    However, once you have the key you can unencrypt the file and save an unencrypted copy. If the file was being worked on for development purposes it is possible that an unencrypted copy might have been made. This may have happened contrary to policies and agreements because, sometimes, people try to take shortcuts to get to a goal and do silly things. In that scenario, personal data relating to Irish Blood donors could have wound up in an unencrypted state on a laptop that was stolen in New York.
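    The difference between the two states is easy enough to see if you look at the bytes. As a rough illustration only (my own sketch, not a forensic method and not anything the IBTS or NYBC did), the Python below measures how ‘random’ a chunk of data looks. The sample record is made up, and random bytes stand in for ciphertext. Low values strongly suggest readable text; high values do not prove encryption, because plain compression looks much the same.

```python
import math
import os
from collections import Counter


def bits_per_byte(data):
    """Shannon entropy of a byte string, between 0 and 8 bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up delimited extract, as it would sit on disk once decrypted.
plaintext = b"D001,Mary Murphy,087 1234567,donation\n" * 500

# Random bytes used as a stand-in for an encrypted file of the same size.
random_like = os.urandom(len(plaintext))

print(round(bits_per_byte(plaintext), 2))
print(round(bits_per_byte(random_like), 2))
```

    A working copy that scores like the first sample is sitting there in the clear, whatever was done to the file it was extracted from.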

    [Update**] Having discussed this over the course of the morning with a knowledgeable academic who used to run his own software development company, it seems pretty much inevitable that the data was actually in an unencrypted state on the laptop, unless there was an unusual level of diligence on the part of the New York Blood Centre regarding the handling of data by developers when not in the office.

    The programmer takes data home of an evening/weekend to work on some code without distractions or to beat a deadline. To use the file he/she would need to have unencrypted it (unless the software they were testing could access encrypted files… in which case does the development version have ‘hardened’ security itself?). If the file was unencrypted to be worked on at home, it is not beyond possibility that the file was left unencrypted on the laptop at the time it was stolen.

    All of which brings me back to a point I made yesterday….

    Why was un-anonymised production data being used for a development/testing activity in contravention of the IBTS’s stated Data Protection policy, Privacy statement and Donor Charter and in breach of section 2 of the Data Protection Act?

    If the data had been fake, the issue of encryption or non-encryption would not be an issue. Fake is fake, and while the theft would be embarrassing it would not have constituted a breach of the Data Protection Act. I notice from Tuppenceworth.ie that the IBTSB were not quick to respond to Simon’s innocent enquiry about why dummy data wasn’t used.

  • Fair use/Specified purpose and the IBTS

    I am a blood donor. I am proud of it. I have provided quite a lot of sensitive personal data to the IBTS over the years that I’ve been donating.

    The specific purposes for which I believed I was providing the information were to allow the IBTS to administer communications with me as a donor (so I know when clinics are on so I can donate), to allow the IBTS to identify me and track my donation patterns, and to alert IBTS staff to any reasons why I cannot donate on a given occasion (donated too recently in the past, I’ve had an illness etc.). I accepted as implied purposes the use of my information for internal reporting and statistical purposes.

    I did not provide the information for the purposes of testing software developed by a 3rd party, particularly when that party is in a foreign country.

    The IBTS’s website (www.ibts.ie) has a privacy policy which relates to data captured through their website. It tells me that

    The IBTS does not collect any personal data about you on this website apart from information which you volunteer (for example by emailing us or by using our on line contact forms). Any information which you provide in this way is not made available to any third parties, and is used by the IBTS only for the purpose for which you provided it.

    So, if any information relating to my donor record was captured via the website, the IBTS is in breach of their own privacy policy. So if you register to be a donor… using this link… http://www.ibts.ie/register.cfm?mID=2&sID=77 then that information is covered by their Privacy policy and you would not be unreasonable in assuming that your data wouldn’t wind up on a laptop in a crackhouse in New York.

    In the IBTS’s Donor Charter, they assure potential Donors that:

    The IBTS guarantees that all personal information about donors is kept in the strictest confidence

    Hmm… so no provision here for production data to be used in testing. Quite the contrary.

    However, it gets even better… in the Donor Information Leaflet on the IBTS’s website, in the Data Protection section (scroll down… it’s right at the bottom), the IBTS tells current and potential donors that (emphasis is mine throughout):

    The IBTS holds donor details, donation details and test results on a secure computerised database. This database is used by the IBTS to communicate with donors and to record their donation details, including all blood sample test results. It is also used for the proper and necessary administration of the IBTS. All the information held is treated with the strictest confidence.

    This information may also be used for research in order to improve our knowledge about the blood donor population, and for clinical audit, to assess and improve the quality of our service. Wherever possible, all such information will be anonymised.

    Right.. so from their policy and their statement of fair use and specified purposes we learn that:

    1. They can use it for communication with donors and for tracking donation details and results of tests (as expected)
    2. They can use it for necessary administration. Which covers internal reporting but, I would argue, not giving it to other organisations to lose on their behalf.
    3. They can use it for research about the blood donor population, auditing clinical practices. This is OK… and expected.
    4. They are also permitted to use the data to “improve the quality of [their] service”. That might cover the use of the data for testing…

    Until you read that last bit… the data would be anonymised whenever possible. That basically means the creation of dummy data as described towards the end of my last post on this topic.

    So, the IBTS did not specify at any time that they would use the information I had provided to them for the purposes of software development by 3rd parties. It did specify a purpose for using the information for the improvement of service quality. But only if it was anonymised.

    Section 2 of the Data Protection Act says that data can only be used by a Data Controller for the specific purposes for which it has been gathered. As the use of un-anonymised personal data for the purposes of software development by agencies based outside of the EU (or in the EU for that matter) was not a specified use, the IBTS is, at this point, in breach of the Data Protection Act. If the data had been anonymised (ie if ‘fictional’ test data had been used or if the identifying elements of the personal data had been muddled up before being transferred) there would likely be no issue.

    • Firstly, the data would have been provided in a manner consistent with the specified use of the data
    • Secondly, there would have been no risk to personal data security as the data on the stolen laptop would not have related to an identifiable person in the real world.

    Of course, that would have cost a few euros to do so it was probably de-scoped from the project.

    If I get a letter and my data was not anonymised I’ll be raising a specific complaint under Section 2 of the Data Protection Act. If the data was not anonymised (regardless of the security precautions applied) then the IBTS is in breach of their specified purposes for the collection of the data and is in breach of the Data Protection Act.

    Billy Hawkes, if you are reading this I’ve just saved your team 3 weeks work.

  • Irish Blood Transfusion Service loses data..

    Why is it that people never learn? Only months after the debacle of HMRC sending millions of records of live confidential data whizzing around in the post on 2 CDs (or DVDs), the Irish Blood Transfusion Service (IBTS) has had 171,000 records of blood tests and blood donors stolen.

    The data was on a laptop (bad enough from a security point of view). The data was (apparently) secured with 256bit AES encryption (happy days if true). The laptop was taken in a mugging (unfortunate). The mugging took place in New York (WTF!?!?)

    Why was the data in New York?
    It would seem that the IBTS had contracted with the New York Blood Centre (NYBC) for the customisation of some software that the NYBC had developed to better manage information on donors and blood test results. To that end the IBTS gave a copy of ‘live’ (or what we call in the trade ‘production’) data to the NYBC for them to use in developing the customisations.

    So, personal data, which may contain ‘sensitive’ data relating to sexual activity, sexual behaviour, medical conditions etc. was sent to the US. But it was encrypted, we are assured.

    A quick look at the Safe Harbor list of the US Dept of Commerce reveals that the NYBC is not registered as being a ‘Safe Harbor’ for personal data from within the EU. Facebook is however (and we all know how compliant Facebook is with basic rules of data protection).

    Apparently the IBTS relied on provisions of their contract with the NYBC to ensure and assure the security of the data relating to REAL people. As yet no information has come to light regarding whether any audits or checks were performed to ensure that those contractual terms were being complied with or were capable of being complied with.

    How did the data get to New York?
    From the IBTS press release it is clear that the data got to New York in a controlled manner.
    An employee of NYBC took the disc back from Ireland and placed it in secure storage.

    Which is a lot better than sticking two CDs in the post, like the UK Revenue services did not so long ago.

    What about sending the data by email? Hmmm… nope, not secure enough and the file sizes might be too big. A direct point to point FTP between two servers? That would work as well, assuming that the FTP facilities were appropriately secured by firewalls and a healthy sense of paranoia.

    Why was the data needed in New York?
    According to the Irish Times

    The records were in New York, the blood service said, “because we are upgrading the software that we use to analyse our data to provide a better service to donors, patients and the public service”.

    Cool. So the data was needed in New York to let the developers make the necessary modifications to code.

    Nice sound bite. Hangs together well. Sounds reasonable.

    Unfortunately it is total nonsense.

    For the developers to make modifications to an existing application, what was required in New York was

    • A detailed specification of what the modifications needed to be to enable the software to function for Irish datasets and meet Irish requirements. Eg. if the name/address data capture screens needed to change they should have been specified in a document. If validation routines for zip codes/postcodes needed to be turned off, that should have been specified. If base data/reference data needed to be changed – specify it in a document. Are we seeing a trend here?
    • Definition of the data formats used in Ireland. By this I mean the definition of the formats of data such as “social security number”. We call it a PPSN and it has a format nnnnnnnA as opposed to the US format which has dashes in the middle. A definition of the data formats that would be used in Ireland and a mapping to/from the US formats would possibly be required… this is (wait for it) another document. NOT THE DATA ITSELF.
    • Some data for testing. Ok, so this is why all 171000+ records were on a laptop in New York. ehh… NO. What was required was a sample data set that replicates the formats and patterns of data found in the IBTS production data. This does not mean a cut of production data. What this means is that the IBTS should have created dummy data that was a replica of production data (warts and all – so if there are 10% of their records that have text values in fields where numbers would be expected, then 10% of the test data should reflect this). The test data should also be tied to specific test cases (experiments to prove or disprove functionality in the software).

    At no time was production data needed for development or developer testing activities in New York. What was needed was clear project specification and requirements documentation, documents about data formatting and ‘meta-data’ (data about data), Use Cases (walk throughs of how the software would be used in a given process – like a movie script) and either a set of dummy sample data that looks and smells like your production data or a ‘recipe’ for how the developer can create that data.
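    A ‘recipe’ of that kind can be very short. The sketch below, in Python, generates dummy rows with identifiers in the nnnnnnnA shape mentioned above and deliberately puts text into a numeric field in 10% of rows, warts and all. The field names, the 10% rate and the value ranges are assumptions for illustration; the identifiers are random, no attempt is made to compute any check character, and any resemblance to a real number is coincidental.

```python
import random
import re

# Format nnnnnnnA as described in the text: seven digits then a letter.
PPSN_RE = re.compile(r"^\d{7}[A-Z]$")


def fake_ppsn(rng):
    """Build a random identifier that only has the right shape."""
    digits = "".join(rng.choice("0123456789") for _ in range(7))
    return digits + rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def make_test_rows(count, dirty_rate, seed=0):
    """Create dummy rows, with a fixed share holding text in a numeric field."""
    rng = random.Random(seed)
    dirty_count = round(count * dirty_rate)
    dirty_rows = set(rng.sample(range(count), dirty_count))
    rows = []
    for i in range(count):
        # Hypothetical numeric field that is sometimes polluted with text.
        donations = "unknown" if i in dirty_rows else str(rng.randint(0, 40))
        rows.append({"ppsn": fake_ppsn(rng), "donations": donations})
    return rows


rows = make_test_rows(count=200, dirty_rate=0.10)
dirty = [r for r in rows if not r["donations"].isdigit()]
print(len(rows), len(dirty))
```

    The rates and patterns would come from profiling the production data in Ireland; only the recipe, never the data, needs to travel.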

    But the production data would be needed for Acceptance testing by IBTS?
    eh… nope. And even if it was it would not need to be sent to New York for the testing.

    User Acceptance testing is a stage of testing in software development AFTER the developer swears blind that the software works as it should and BEFORE the knowledge workers in your organisation bitch loudly that the software is buggered up beyond all recognition.

    As with all testing, the use of production data is not required, and indeed is often a VERY BAD IDEA (except in certain extreme circumstances such as the need for volume stress testing or testing of very complex software solutions that need data that is exactly like production to be tested effectively… e.g. a complex parsing/matching/loading process on a multi-million record database – and even at that, key data not relevant to the specific process being tested ought to be ‘obscured’ to ensure data protection compliance).

    What is required is that your test environment is as close a copy to the reality you are testing for as possible. So, from a test data point of view, creating test data that looks like your production data is the ideal. One way is to do data profiling, develop an understanding of the ‘patterns’ and statistical trends in your data and then hand carve a set of test data that looks and smells like your production data but is totally fake and fraudulent and safe. Another approach is to take a copy of your production data and bugger around with it to mix names and addresses up, replace certain words in address data with different words (e.g. “Park” with “Grove” or “Leitrim” with “Carialmeg” or “@obriend.info” with “@obriend.fakedatapeople” – whatever works). So long as the test data is representative of the structure and content of your production data set and can support the test scenarios you wish to perform then you are good to go.
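    The second approach (mixing things up and swapping words) might look something like the Python sketch below. The record layout, the sample rows and the helper names are made up for illustration. Be warned that a plain shuffle can leave some names sitting beside their original addresses, so treat this as a starting point rather than proof that the data is now anonymous.

```python
import random

# Word swaps taken from the examples in the text above.
WORD_SWAPS = {"Park": "Grove", "Leitrim": "Carialmeg"}


def swap_words(address):
    """Replace selected whole words in an address with stand-ins."""
    return " ".join(WORD_SWAPS.get(word, word) for word in address.split())


def scramble(records, seed=0):
    """Shuffle names and addresses independently so the original pairings are mixed up."""
    rng = random.Random(seed)
    names = [r["name"] for r in records]
    addresses = [swap_words(r["address"]) for r in records]
    rng.shuffle(names)
    rng.shuffle(addresses)
    return [{"name": n, "address": a} for n, a in zip(names, addresses)]


# Entirely invented sample rows standing in for a copy of production data.
sample = [
    {"name": "Test Person A", "address": "14 Meadow Park Leitrim"},
    {"name": "Test Person B", "address": "3 Main Street Sligo"},
    {"name": "Test Person C", "address": "27 Castle Park Mayo"},
]

masked = scramble(sample)
for row in masked:
    print(row["name"], "|", row["address"])
```

    The structure of the data (how long the names are, how many words an address has, which fields are populated) survives, which is what the testers actually need.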

    So, was the production data needed in New York – Nope. Would it be needed for testing in a test event for User Acceptance testing? Nope.

    And who does the ‘User Acceptance testing’? Here’s a hint… what’s the first word? User Acceptance testing is done by representatives of the people who will be using the software. They usually follow test scripts to make sure that specific functionality is tested for, but importantly they can also highlight where things are just wrong.

    So, were there any IBTS ‘users’ (knowledge workers/clerical staff) in New York to support testing? We don’t know. But it sounds like the project was at the software development stage so it is unlikely. So why the heck was production data being used for development tasks?

    So… in conclusion
    The data was stolen in New York. It may or may not have been encrypted (the IBTS has assured the public that the data was encrypted on the laptop… perhaps I am cynical but someone who takes data from a client in another nation home for the weekend might possibly have decrypted the data to make life easier during development). We’re not clear (at this point) how the data got to New York – we’re assuming that an IBTS employee accompanied it to NY stored on physical media (the data, not the employee).

    However, there is no clear reason why PRODUCTION data needed to be in New York. Details of how the IBTS’s current data formats might map to the new system, details of requirements for changes to the NYBC’s current system to meet the needs of the IBTS, details of the data formats in the IBTS’s current data sets (both field structures and, ideally, a ‘profile’ of the structure of the data and any common errors that occur) and DUMMY data that might be required for design, development and developer testing are all understandable. Production data is not.

    There is no evidence, other than the existence of a contractual arrangement, that the NYBC had sufficient safeguards in place to ensure the safety of personal data from Ireland. The fact that an NYBC employee decided to take the data out of the office into an insecure environment (downtown New York) and bring it home with them would evidence that, perhaps, there is a cultural and procedural gap in NYBC’s processes that might have meant they either couldn’t comply or didn’t understand what the expectation of the clauses in those contracts actually meant.

    For testing, what is required is a model of production. A model. A fake. A facsimile, NOT PRODUCTION. The more accurate your fake is the better. But it doesn’t need to be a carbon copy of your production data with exactly the same ‘data DNA’… indeed it can be a bad idea to test with ‘live’ data. Just like it is often dangerous to play with ‘live’ grenades or grab a ‘live’ power line to see what will happen.

    The loss of our IBTS data in New York evidences a failure of governance, a ‘happy path’ approach to risk planning, and a lack of appreciation of the governance and control that software development projects need in order to protect live data.

    As this was a project for the development of a software solution there was no compelling reason that I can identify for production data to have been sent from Ireland to New York when dummy data and project documentation would have sufficed.

    The press release from the IBTS about this incident can be found here.

    [Update: Simon over at Tuppenceworth has noted my affiliation to the IAIDQ. Just to clarify, 99% of this post is about basic common sense. 1% is about Information Management/Information Quality Management. And as this post is appearing here and not on the IAIDQ’s website it goes without saying that my comments here may not match exactly the position of the IAIDQ on this issue. I’m also a member of the ICS, who offer a Data Protection certification course which I suspect will be quite heavily subscribed the next time it runs.]

    [Update 2: This evening RTE News interviewed Dr David Gray from DCU who is somewhat of an expert on IT security. The gist of Dr Gray’s comments was that software controls to encrypt data are all well and good, but you would have to question the wisdom of letting the information wander around a busy city and not having it under tight physical control… which is pretty much the gist of some of my comments below. No one has (as yet) asked why the hell production data rather than ‘dummy’ data was being used during the development phase of a project.]

  • Facebook & Data Protection

    The Younger McGarr (Simon that is) has a very detailed and well written post on the data protection issues that arise with (and are seemingly ignored by) Facebook. It can be found over at the McGarr Solicitors website. He has already picked up some complimentary comments, including one from Thomas Otter (who has written on these issues previously). (Surely a reply from Robert Scoble is only a mouse-click away?)

    I’ve been scratching away on some notes for a post on Facebook myself (never one to miss a rolling bandwagon me). Expect more on this soon. (ie as soon as I’ve written the buggering thing).

  • What the…? – Irish Political coverage ignores the Elephant in the room

    I’m frankly baffled. We are in the run up to a General Election here in Ireland. All the media pundits are quoting 24th May as the date of the (as yet unannounced) election. This would require our parliament to be dissolved at the latest next week.

    Ireland runs a Proportional Representation/Single Transferable Vote system. It is built into our Constitution. There is a large body of legal opinion around the thresholds at which the ratio of elected representatives to number of people in a constituency breaches the Constitution. We are, it seems, at that point in 10 constituencies out of a total of 43. This has resulted in a Constitutional challenge in the High Court by two Independent TDs (Members of Parliament) to the holding of any election until the balance of Proportional Representation is restored through changes to the make up of Constituencies.

    The fact that key demographics had changed and there was a risk that the Electoral Constituency boundaries or numbers of representatives in each constituency might need to be altered was identified in September 2006 when the preliminary figures from our Census were published. There is no legal obligation on the Government to act or react to these however. The final Census figures were published on the 29th of March. These should be acted on or else there is the risk of any election being declared unconstitutional.

    The risk is that if the Dáil (our parliament) is dissolved ahead of an election, and the running of that election is then declared unconstitutional until the parliament (the one that has been dissolved) addresses the issue of the Electoral constituencies, we could find ourselves with a bit of a governmental and Constitutional crisis.

    Yet the media continue to focus on the dog and pony show but ignore the Gorilla in the room. The Executive arm of Government continues to barrel down the path to an election without any apparent appreciation of the risk that exists, both to the simple fact of an election and to the essence of our Constitution. Why has the existence of this Constitutional challenge not been publicised more? Why are the media giving the politicians sound-bite time to puff their agendas ahead of an election being called but not asking the relevant politicians why we find ourselves at a juncture where the Constitutionality of our Electoral system is being challenged due to a disproportionality in representation?

    The chronic lack of leadership and accountability on the part of the Government Minister charged with monitoring and managing how our Electoral Register and our Electoral Processes operate is shocking. However at least it is consistent with his lack of leadership and lack of willingness to be accountable for anything other than a soundbite on the news (he was going to ‘bash some heads together’ over the Galway water crisis apparently).

    To tie this back to my theme of Information Quality Management, Deming called on management to adopt a “constancy of purpose” and to wholeheartedly take on “the new philosophy” while breaking down barriers between people/organisations and driving out the fear that prevents the delivery of quality.

    Why does the relevant Minister seem to act in a manner that could only serve to drive in fear and increase the barriers that might exist that would prevent a good job being done? What is our incumbent Government’s purpose that they are constant to? What is the philosophy that they are pursuing?

    I’m off to Paddy Powers to place a bet that the Election won’t be called this side of June. Congratulations to Catherine Murphy and Finian McGrath for taking a stand on this issue.

  • Propagation of information errors and the risks of using surrogate sources

    ….ye wha’?

    There has been a lot written in relation to the electoral register and other matters about using information from other sources to improve the quality of information that you have or to create a new set of information.

    This makes sense: other people may already have done much of the work for you and, effectively, all you need to do is to copy their work and edit it to meet your needs. In most cases it may be faster and cheaper to use such ‘surrogates’ for reality to meet your information needs than to go to the effort of going to the real-world things (people, stock-rooms, wherever) and actually starting from scratch to build exactly the information you need in the format you require to exactly your standards and formats.

    There is, however, a price to pay for having such surrogate sources available to you. You need to accept that

    1. The format and structure of the information may need to be changed to fit your systems or processes
    2. The information you are using may itself be inaccurate, incomplete or inconsistent.
    3. If you are combining it with other information, it will require investment in tools and skills to properly match and consolidate your information into a valid version of the truth.

    These risks apply to organisations buying marketing lists to integrate with their CRM systems but also could be applied to students relying on the Internet to present them with the content for their academic projects or journalists trawling for content for newspaper articles or reviews.

    Recurrence of common errors, phrases or inaccuracies in term papers is one way that academia has of identifying academic fraud. Similar techniques might be applied in other arenas to identify and track instances of copyright infringement.

    In businesses dealing with thousands of records, the cost/risk analysis is relatively straightforward. The recommendation I would make is that clear processes to manage suppliers and to measure the quality of the information they provide you, based on a defined standard for completeness, consistency, duplication, conformity etc., are essential. Random sampling of surrogate data sources for accuracy (not every 100th record but a truly random sample) is also strongly recommended.
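    To show why ‘every 100th record’ is not the same thing as a random sample, here is a small Python sketch. The supplier list is invented, and I have rigged it so that a quarter of the records are wrong in a pattern that happens to line up with the sampling interval. That is an artificial worst case, but patterns in how data was loaded or keyed can cause the same kind of blind spot in real life.

```python
import random


def random_sample(records, sample_size, seed=None):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, sample_size)


def accuracy_rate(sample, is_accurate):
    """Share of sampled records that pass the accuracy check."""
    return sum(1 for r in sample if is_accurate(r)) / len(sample)


# Hypothetical supplier list: one record in four is wrong, in a regular pattern.
supplier_records = [
    {"record_no": i, "verified": i % 4 != 1} for i in range(10_000)
]


def is_accurate(record):
    return record["verified"]


every_100th = supplier_records[::100]  # systematic: records 0, 100, 200, ...
truly_random = random_sample(supplier_records, sample_size=200, seed=42)

print("Every 100th record:", accuracy_rate(every_100th, is_accurate))
print("Random sample:     ", accuracy_rate(truly_random, is_accurate))
print("Whole list:        ", accuracy_rate(supplier_records, is_accurate))
```

    The systematic sample never lands on a bad record and reports a perfect supplier, while the random sample gives an estimate much closer to the true rate. In practice the accuracy check would mean verifying each sampled record against reality or another trusted source, which is where the real effort goes.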

    These are EXACTLY the same techniques that manufacturing industries use to ensure the quality of the raw material inputs to their processes. If it works for industries where low quality can kill (such as pharmaceuticals), why shouldn’t it work for you?

    For students, journalists and those of us hacking away in the blogosphere the recommendation is simple. Only rely on surrogate sources if you absolutely have to. If you use someone else’s work as your source, credit them. If you don’t want to credit them then make sure you verify the accuracy of their work either by actually verifying against reality or by checking with at least one other source.

    That way you avoid having the errors of your source become your errors also and you don’t run the risk of someone crying foul and either suing you for stealing their copyright (and copyright does apply to content posted on the internet and in blogs) or taking whatever other sanctions might apply (such as kicking you off your college course).

    In many cases the costs and effort involved in double checking (particularly for a once-off piece of writing) are negligibly different to the costs of actually starting from scratch and building your information up yourself. And, depending on the context, it may even be more enjoyable.

    The New York Times not so long ago had to relearn the lessons of checking stories with at least one other source for accuracy.

    Horatio Caine in CSI:Miami always tells his team to “trust, but verify”.

    When using surrogate sources for real-world information in any arena you must assess the risk of doing so and put in place the necessary controls so that you can trust that you have verified.

    (c) Daragh O Brien 2006 (just in case)