Some days I wake up feeling like Lt. Columbo. I bound out of bed assured in myself that, throughout the day I’ll be niggled by, or rather niggle others with, ‘just one more question’.
Today was not one of those days. But you’d be surprised what can happen while going about the morning ablutions. “Over 171000 (174618 in total) records sent to New York. Sheesh. That’s a lot. Particularly for a sub-set of the database reflecting records that were updated between 2nd July 2007 and 11th October 2007. That’s a lot of people giving blood or having blood tests, particularly during a short period. The statistics for blood donation in Ireland must be phenomenal. I’m surprised we can drag our anaemic carcasses from the leaba and do anything; thank god for steak sandwiches, breakfast rolls and pints of Guinness!”, I hummed to myself as I scrubbed the dentation and hacked the night’s stubble off the otherwise babysoft and unblemished chin (apologies – read Twenty Major’s book from cover to cover yesterday and the rich prose rubbed off on me).
“I wonder where I’d get some stats for blood donation in Ireland. If only there was some form of Service or agency that managed these things. Oh.. hang on…, what’s that Internet? Silly me.”
So I took a look at the IBTS annual report for 2006 to see if there was any evidence of back slapping and awards for our doubtlessly Olympian donation efforts.
According to the the IBTS, “Only 4% of our population are regular donors” (source: Chairperson’s statement on page 3 of the report). Assuming the population in 2006 (pre census data publication) was around 4.5 million (including children), this would suggest a maximum regular donor pool of 180,000. If we take the CSO data breaking out population by age, and make a crude guess on the % of 15-24 year olds that are over 18 (we’ll assume 60%) then the pool shrinks further… to around 3.1 million, giving a regular donor pool of 124000 approx.
Hmm… that’s less than the number of records sent as test data to New York based on a sub-set of the database. But my estimations could be wrong.
The IBTS Annual Report for 2006 tells us (on page 13) that
The average age of the donors who gave blood
in 2006 was 38 years and 43,678 or 46% of our
donors were between the ages of 18 and 35
years.
OK. So let’s stop piddling around with assumptions based on the 4% of population hypothesis. Here’s a simpler sum to work out… If X = 46% of Y, calculate Y.
(43678/46)X100 = 94952 people giving blood in total in 2006. Oh. That’s even less than the other number. And that’s for a full year. Not a sample date range. That is <56% of the figure quoted by the IBTS. Of course, this may be the number of unique people donating rather than a count of individual instances of donation… if people donated more than once the figure could be higher.
The explanation may also lie with the fact that transaction data was included in the extract given to the NYBC (and record of a donation could be a transaction). As a result there may be more than one row of data for each person who had their data sent to New York (unless in 2007 there was a magical doubling of the numbers of people giving blood).
According to the IBTS press release:
The transaction files are generated when any modification is made to any record in Progesa and the relevant period was 2nd July 2007 to 11th October 2007 when 171,324 donor records and 3,294 patient blood group records were updated.
(the emphasis is mine).
The key element of that sentence is “any modification is made to any record”. Any change. At all. So, the question I would pose now is what modifications are made to records in Progresa? Are, for example, records of SMS messages sent to the donor pool kept associated with donor records? Are, for example, records of mailings sent to donors kept associated? Is an audit trail of changes to personal data kept? If so, why and for how long? (Data can only be kept for as long as it is needed). Who has access rights to modify records in the Progresa system? Does any access of personal data create a log record? I know that the act of donating blood is not the primary trigger here… apart from anything else, the numbers just don’t add up.
It would also suggest that the data was sent in a ‘flat file’ structure with personal data repeated in the file for each row of transaction data.
How many distinct person records were sent to NYBC in New York? Was it
- A defined subset of the donors on the Progresa system who have been ‘double counted in the headlines due to transaction records being included in the file? ….or
- All donors?
- Something in between?
If the IBTS can’t answer that, perhaps they might be able to provide information on the average number of transactions logged per unique identified person in their database during the period July to October 2007?
Of course, this brings the question arc back to the simplest question of all… while production transaction records might have been required, why were ‘live’ personal details required for this software development project and why was anonymised or ‘defused’ personal data not used?
To conclude…
Poor quality information may have leaked out of the IBTS as regards the total numbers of people affected by this data breach. The volume of records they claim to have sent cannot (at least by me) be reconciled with the statistics for blood donations. They are not even close.
The happy path news here is that the total number of people could be a lot less. If we assume ‘double dipping’ as a result of more than one modification of a donor record, then the worst case scenario is that almost their entire ‘active’ donor list has been lost. The best case scenario is that a subset of that list has gone walkies. It really does boil down to how many rows of transaction information were included alongside each personal record.
However, it is clear that, despite how it may have been spun in the media, the persons affected by this are NOT necessarily confined to the pool of people who may have donated blood or had blood tests peformed between July 2007 and October 2007. Any modification to data about you in the Progresa System would have created a transaction record. We have no information on what these modifications might entail or how many modifications might have occured, on average, per person during that period.
In that context the maximum pool of people potentially affected becomes anyone who has given blood or had blood tests and might have a record on the Progressa system.
That is the crappy path scenario.
Reality is probably somewhere in between.
But, in the final analysis, it should be clear that real personal data should never have been used and providing such data to NYBC was most likely in breach of the IBTS’s own data protection policy.