Why is it that people never learn? Only months after the debacle of HMRC sending millions of records of live confidential data whizzing around in the post on 2 CDs (or DVDs), the Irish Blood Transfusion Service (IBTS) has had 171,000 records of blood tests and blood donors stolen.
The data was on a laptop (bad enough from a security point of view). The data was (apparently) secured with 256bit AES encryption (happy days if true). The laptop was taken in a mugging (unfortunate). The mugging took place in New York (WTF!?!?)
Why was the data in New York?
It would seem that the IBTS had contracted with the New York Blood Centre (NYBC) for the customisation of some software that the NYBC had developed to better manage information on donors and blood test results. To that end the IBTS gave a copy of ‘live’ (or what we call in the trade ‘production’) data to the NYBC for them to use in developing the customisations.
So, personal data, which may contain ‘sensitive’ data relating to sexual activity, sexual behaviour, medicial conditions etc. was sent to the US. But it was encrypted, we are assured.
A quick look at the Safe Harbor list of the US Dept of Commerce reveals that the NYBC is not registered as being a ‘Safe Harbor’ for personal data from within the EU. Facebook is however (and we all know how compliant Facebook is with basic rules of data protection).
Apparently the IBTS relied on provisions of their contract with the NYBC to ensure and assure the security of the data relating to REAL people. As yet no information has come to light regarding whether any audits or checks were performed to ensure that those contractual terms were being complied with or were capable of being complied with.
How did the data get to New York?
From the IBTS press release it is clear that the data got to New York in a controlled manner.
An employee of NYBC took the disc back from Ireland and placed it in secure storage.
Which is a lot better than sticking two CDs in the post, like the UK Revenue services did not so long ago.
What about sending the data by email? Hmmm… nope, not secure enough and the file sizes might be to big. A direct point to point FTP between two servers? that would work as well, assuming that the FTP facilities were appropriately secured by firewalls and a healthy sense of paranoia.
Why was the data needed in New York?
According to the Irish Times
The records were in New York, the blood service said, “because we are upgrading the software that we use to analyse our data to provide a better service to donors, patients and the public service”.
Cool. So the data was needed in New York to let the developers make the necessary modifications to code.
Nice sound bite. Hangs together well. Sounds reasonable.
Unfortunately it is total nonsense.
For the developers to make modifications to an existing application, what was required in New York was
- A detailed specification of what the modifications needed to be to enable the software to function for Irish datasets and meet Irish requirements. Eg. if the name/address data capture screens needed to change they should have been specified in a document. If validation routines for zip cods/postcodes needed to be turned off, that should have been specified. If base data/reference data needed to be change - specify it in a document. Are we seeing a trend here?
- Definition of the data formats used in Ireland. by this I mean the definition of the formats of data such as “social security number”. We call it a PPSN and it has a format nnnnnnnA as opposed to the US format which has dashes in the middle. A definition of the data formats that would be used in Ireland and a mapping to/from the US formats would possibly be required… this is (wait for it) another document. NOT THE DATA ITSELF
- Some data for testing. Ok, so this is why all 171000+ records were on a laptop in New York. ehh… NO. What was required was a sample data set that replicates the formats and patterns of data found in the IBTS production data. This does not mean a cut of production data. What this means is that the IBTS should have created dummy data that was a replica of production data (warts and all – so if there are 10% of their records that have text values in fields where numbers would be expected, then 10% of the test data should reflect this). The test data should also be tied to specific test cases (experiments to prove or disprove functionality in the software).
At no time was production data needed for development or developer testing activities in New York. Clear project specification and requirements documentation, documents about data formatting and ‘meta-data’ (data about data), Use Cases (walk throughs of how the software would be used in a given process – like a movie script) and either a set of dummy sample data that looks and smells like you production data or a ‘recipe’ for how the developer can create that data.
But the production data would be needed for Acceptance testing by IBTS?
eh… nope. And even if it was it would not need to be sent to New York for the testing.
User Acceptance testing is a stage of testing in software development AFTER the developer swears blind that the software works as it should and BEFORE the knowledge workers in your organisation bitch loudly that the software is buggered up beyond all recognition.
As with all testing it does not require a the use of production data is not required, and indeed is often a VERY BAD IDEA (except in certain extreme circumstances such as the need for volume stress testing or testing of very complex software solutions that need data that is exactly like production to be tested effectively… eg. a complex parsing/matching/loading process on a multi-million record database – and even at that, key data not relevant to the specific process being tested ought to be ‘obscured’ to ensure data protection compliance ).
What is required is that your test environment is as close a copy to the reality you are testing for as possible. So, from a test data point of view, creating test data that looks like your production data is the ideal. One way is to do data profiling, develop an understanding of the ‘patterns’ and statistical trends in your data and then hand carve a set of test data that looks and smells like your production data but is totally fake and fraudulent and safe. Another approach is to take a copy of your production data and bugger around with it to mix names and addresses up, replace certain words in address data with different words (e.g. “Park” with “Grove” or “Leitrim” with “Carialmeg” or “@obriend.info” with “obriend.fakedatapeople” – whatever works). So long as the test data is representative of the structure and content of your production data set and can support the test scenarios you wish to perform then you are good to go.
So, was the production data needed in New York – Nope. Would it be needed for testing in a test event for User Acceptance testing? Nope.
And who does the ‘User Acceptance testing’? Here’s a hint… whats the first word? User Acceptance testing is done by representatives of the people who will be using the software. They usually follow test scripts to make sure that specific functionality is tested for, but importantly they can also highlight were things are just wrong.
So, were there any IBTS ‘users’ (knowledge workers/clerical staff) in New York to support testing? We don’t know. But it sounds like the project was at the software development stage so it is unlikely. So why the heck was production data being used for development tasks?
So… in conclusion
The data was stolen in New York. It may or may not have been encrypted (the IBTS has assured the public that the data was encrypted on the laptop… perhaps I am cynical but someone who takes data from a client in another nation home for the weekend might possibly have decrypted the data to make life easier during development). We’re not clear (at this point) how the data got to New York – we’re assuming that an IBTS employee accompanied it to NY stored on physical media (the data, not the employee).
However, there is no clear reason why PRODUCTION data needed to be in New York. Details of how the IBTS’s current data formats might map to the new system, details of requirements for changes to the NYBC’s current system to meet the needs of the IBTS, details of the data formats in the IBTS’s current data sets (both field structues and, ideally, a ‘profile’ of the structure of the data and any common errors that occur) and DUMMY data might be required for design, development and developer testing are all understandable. Production data is not.
There is no evidence, other than the existence of a contractual arrangement, that the NYBC had sufficient safeguards in place to ensure the safety of personal data from Ireland. The fact that an NYBC employee decided to take the data out of the office into an unsecure environment (down town New York) and bring it home with them would evidence that, perhaps, there is a cultural and procedural gap in NYBC’s processes that might have meant they either couldn’t comply or didnt’ understand what the expectation of the clauses in those contracts actually meant.
For testing, what is required is a model of production. A model. A fake. A facsimile NOT PRODUCTION. The more accurate your fake is the better. But it doesn’t need to be a carbon copy of your production data with exactly the same ‘data DNA’… indeed it can be a bad idea to test with ‘live’ data. Just like it is often dangerous to play with ‘live’ grenades or grab a ‘live’ power line to see what will happen.
The loss of our IBTS data in New York evidences a failure of governance and a ‘happy path’ approach to risk planning, and a lack of appreciation of the governance and control of software development projects to ensure the protection of live data.
As this was a project for the development of a software solution there was no compelling reason that I can identify for production data to have been sent from Ireland to New York when dummy data and project documentation would have sufficed.
The press release from the IBTS about this incident can be found here..
[Update - Simon over at Tuppenceworth has noted my affiliation to the IAIDQ. Just to clarify, 99% of this post is about basic common sense. 1% is about Information Management/Information Quality Management. And as this post is appearing here and not on the IAIDQ's website it goes without saying that my comments here may not match exactly the position of the IAIDQ on this issue. I'm also a member of the ICS, who offer a Data Protection certification course which I suspect will be quite heavily subscribed the next time it runs.]
[Update 2: This evening RTE News interviewed Dr David Gray from DCU who is somewhat of an expert on IT security. The gist of Dr Gray's comments were that software controls to encrypt data are all well and good, but you would have to question the wisdom of letting the information wander around a busy city and not having it under tight physical control... which is pretty much the gist of some of my comments below. No one has (as yet) asked why the hell production data rather than 'dummy' data was being used during the development phase of a project.]