June 27, 2008

How not to handle a customer…

So, I’ve been having problems with my broadband. Problems significant enough that I would suggest that the Dept of Comms actually think through the potential reliance on Fixed Wireless solutions for Ireland’s broadband deficit. More on that another time.

What annoys me in the immediate sense is the level of customer service that people seem to think is OK. I had my FWA antenna removed from my house today. I found out about it when I looked out the window and saw the van from my provider in the driveway and the legs of a ladder going up the side of the house. I expected a binglybong on the doorbell to let me know what was happening, but nowt. I was working so I couldn’t rush out to talk to the man. By the time I’d finished the work stuff he’d vanned away again.

I’d complained to my provider in writing back in May about some issues. I got a nice email addressing part of my complaint and bugger all else. After this morning’s visitation I emailed them to find out what was going on.

Apparently they’ve tried to contact me “numberous” [sic] times over the past month to talk to me about the problems I was having.

Checked email… nowt.
Checked spam filter… nowt.
Checked missed calls on phone… nowt.
Checked the drawer in the kitchen where all the things that look like bills get hidden… nowt.

I know I had no voicemails from them on the phone as I would have remembered it (and I would have downloaded the voicemail from the webmail service provided by my mobile service provider -betcha didn’t know you could do that did you, unified messaging almost - and put it in the folder of documents/evidence I am compiling to go with my inevitable ComReg complaint).

Apparently the only contact information they have for me is my mobile number. Apart from the fact they’ve sent me emails to my email address and a man-in-a-van could find my house, where letters also go. And I included all of that information again on my complaint letter.

So the lack of a follow up email, or a letter responding to my complaint or a friendly binglybong on the doorbell from the man in the van to fill me in on things were all beyond them, because they didn’t have the information. Which they, errmmmm, had, for the reasons mentioned above.

So that thing about only having a mobile number to contact me is a… [mistake] [lie] [cop out] [failure of internal processes to properly manage customer information]… (select one or more options as appropriate).

It would seem it’s all my fault I didn’t know what was going on. I should have felt the disturbance in The Force, as if a small call centre of people suddenly cried out as one and then suddenly fell silent. Curse my failing and fading Jedi skills.

At least that’s how I’d feel if I wasn’t so peeved at the whole thing. I think that once I’ve updated ComReg with the nonsense I’m dealing with I’ll send my ex-provider a request for all personal information they hold about me (electronic and paper file, and IP and traffic logs etc.) under the terms of the Data Protection Act. ‘Coz I am fond of my regulatory frameworks and codes of practice etc.

Notice that I’ve not named the service provider or discussed the specific issues here. That would be unfair to my (it would seem former - at their initiative) Broadband Provider. However, they are exactly the type of organisation that DCENR seems to be pinning the Great Broadband Hope on.

The good news is that the Vodafone broadband dongle I have for using while commuting and which has been my main tool for getting on line at home recently - even though it is just 2G around these parts, picked up a 3 3G network last night. Couldn’t connect to it but knew it was there. So that’s got me thinking….

June 10, 2008

An IQ Trainwreck…

From Don Carlson, one of my IAIDQ cronies in the US comes this YouTube vid from Informatica (a data quality software tool vendor) that sums up a lot of why Information Quality matters.

Of course, I could get snooty and ask what gave them the idea to juxtapose Information Quality and Trainwrecks…. gosh, I’d swear I’ve seen that somewhere before

May 21, 2008

The Electoral Register Hokey-Cokey

When I was a small child, my grandmother used to entertain me and my siblings by getting us to sing and dance the hokey cokey, a playful little song and dance routine if ever there was one.

This dance was brought to mind yesterday when Fergal of the Tuppenceworth bloggers emailed me to let me know that he appears to have been taken off the Electoral Register in his home county. Again.

You put your right to self-determination and election of a government by proportional representation as mandated by the constitution of the Irish Republic in.
You put your right to self-determination and election of a government by proportional representation as mandated by the constitution of the Irish Republic out.
In. Out. In. Out.
And you shake it all about.

It would seem that Fergal had been taken off the Register during the Great Clean up of 2006. He then had his ballot reinstated. The other day, in a fit of electoral existentialism he decided to try and find himself on the Electoral Register website www.checktheregister.ie

Zen-like, he found himself encountering the concept of nothing as a search for his name at his address revealed nothing. Oh Hokey Cokey Cokey indeed.

So what may have gone wrong here?

  • Is Fergal’s name transposed on the Register (surname first, firstname last)?
  • Is the address registered against Fergal on the Register different to his address?
  • Does the search function on the Electoral Register require an exact character match on names/addresses? Is “Fergal” interpreted as a different name to “Fearghal” (both Fergals in my book)?
  • If Fergal has indeed been deleted from the Register (again), what triggered the Hokey Cokey here? Was an old copy of the Register loaded to the website?
  • Is the version you search on-line up to date with the version you might find in your library or Garda Station? Might Fergal be on the Register, but just not on the Register that is searched? Might it work in the contrary… Might people be listed as ‘on the register’ in an on-line search but be off the Register in the ‘paper’ world (ie the version that counts on polling day)?
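
To make the name-matching questions above a bit more concrete, here is a rough sketch in Python of what a more forgiving search might look like. To be clear, this is my own illustration and not how checktheregister.ie works: the function names, the 0.8 similarity threshold and the surname "Murphy" are all assumptions made up for the example. It tolerates spelling variants (Fergal/Fearghal) and transposed name order (surname first).

```python
from difflib import SequenceMatcher


def name_key(name):
    """Lower-case a name part and keep only letters, so case and punctuation don't matter."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def similar(a, b, threshold=0.8):
    """True if two name parts are close enough (threshold is an assumed value)."""
    return SequenceMatcher(None, name_key(a), name_key(b)).ratio() >= threshold


def names_match(query, record, threshold=0.8):
    """Compare two full names part by part, in the given order and reversed."""
    q_parts = query.split()
    r_parts = record.split()
    if len(q_parts) != len(r_parts):
        return False
    same_order = all(similar(q, r, threshold) for q, r in zip(q_parts, r_parts))
    swapped = all(similar(q, r, threshold) for q, r in zip(q_parts, reversed(r_parts)))
    return same_order or swapped


# Hypothetical register entry: surname first, Irish spelling of the given name.
print(names_match("Fergal Murphy", "Murphy Fearghal"))
```

An exact character match would miss that record entirely. The price of fuzziness is false positives, so a real search would want to combine this with the address before declaring a match.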

The list of potential root causes is (especially as I am speculating a bit) quite long. However this is further evidence that the processes for the management of the Electoral Register are a bit knackered. This has been accepted by the Government and the Oireachtas Committee on the Electoral Register recently published a series of recommendations which eerily echoed comments and recommendations made on this blog over 2 years ago.

However, while there is an urgent need to have as accurate an electoral register as possible (1 Referendum in our immediate future and Local Elections in the not too distant future), care must be taken to ensure we solve the problems of tomorrow as well as the problems of today.

But in the words of Tom Jones - “I think I’m gonna dance now”…

“Oh, hokey cokey cokey…. Oh hokey cokey cokey…..”

May 15, 2008

Telephone numbers and Information Quality - the risk of assumption

There is an old saying that the word “Assume” makes an “Ass” out of “You” and “Me”.

Yet we see (and make) assumptions every day when it comes to assessing the quality (or otherwise) of information. Anglo-Saxon biased peoples (US, English speaking Europe etc) often assume that names are structured Firstname Surname. “Daragh” = First Name, “O Brien” = Surname. The cultural bias here is well documented by people like Graham Rhind (who advises the use of “Given Name/Family Name” constructs on web forms etc. to improve cross-cultural usability).

But what if you see “George Michael” written down (without the context of labels for each name part) with a reference to “singer”? Would this relate to the pop singer George Michael, or the bass baritone singer Michael George?

One of the common ‘rules of thumb’ with telephone numbers is that, when you are trying to create the full ‘internationalised’ version of a telephone number (+[international access code] [local area code] [local number]) you take the number as written ‘locally’ and drop the leading zero. Of course, like most conventional wisdom a little scrutiny causes this rule of thumb to fall apart.

For example, in the Czech Republic there is no ‘leading zero’ as it is actually part of the international access code (which actually makes more sense to me…). One might assume that Europe, with the standardisation ethos of the European Union would all have plumped for “0” as a leading digit on local area codes. Not so, as Portugal doesn’t use any leading digit on their area codes. Some countries that used to be part of the USSR (like Russia, Belarus and Azerbaijan) use 8 instead of 0.

You might not be safe in assuming that you just need to consider the first digit of the local area code. Hungary has a 2-digit prefix (06), so you would need to parse in 2 characters in the string to remove the correct digits. Just stripping the leading zero will result in a totally embuggered piece of information.

Also, everyone assumes that a telephone number will consist only of numbers. However, there are a few instances where the code required to dial out from a country (the International Direct Dial code) is not purely numeric, in that it contains either the * (star) or # (hash key/pound key). Our buddies in Belarus are an example of this, where to dial out from Belarus you need to dial “8**10” (which even more confusingly is often written “8~10”).

So what does this mean for people who are assessing or seeking to improve the quality of telephone number data in their systems?

Well, first off it means you need to have some context to understand the correct business rules to apply. For example, the rules I would apply to assessing the quality (and likely defects) in a telephone number from Ireland would be different to what I’d need to apply to telephone numbers relating to Belarus. In an Irish telephone number it would be correct to strip out instances of “**” and then validate the rest of the string based on its length (if stripping the ** made it too short to be a telephone number then we would need to tag it as duff data and remove it). With data relating to Belarus it might simply be that the person filling in the form (the source of the data) got confused about what codes to use.
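
As a minimal sketch of that Irish rule (strip any “**”, then check the length), something like the following Python would do. The function name and the minimum length of 7 digits are my own assumptions for illustration, not an official rule from any numbering plan.

```python
def clean_irish_number(raw, min_digits=7):
    """Strip '**' from an Irish number; return None if what is left is too short.

    min_digits is an assumed threshold for illustration only.
    """
    stripped = raw.replace("**", "")
    digits = "".join(ch for ch in stripped if ch.isdigit())
    if len(digits) < min_digits:
        return None  # tag as duff data
    return digits


print(clean_irish_number("01**2345678"))
print(clean_irish_number("**123"))
```

The same function applied blindly to a number relating to Belarus could throw away something meaningful, which is the whole point about needing the country context first.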

Secondly, it means you need to put some thought into the design of information capture processes to reduce the chances of errors occurring. Defining a structure with separate fields, linking the international access code to a country drop down (and a library of business rules for how to interpret and ‘standardize’ subsequent inputs) would not be too difficult - it would just require investment of effort in researching the rules and maintaining them once deployed. Here’s a link to a useful resource I’ve found (note that I can’t vouch for the frequency of updates to this site, but I’ve found it a fun way to figure out what the rules might be for various countries). Also, Wikipedia has a good piece on Telephone number plans. Graham Rhind also has some good links to references for telephone number format rules.

Looking at the data of a telephone number in isolation will most likely result in you screwing up some of the data (if you have international telephone numbers). Having the country information for that data (is the number in France or Belarus) allows you to construct appropriate rules and make your assumptions in the appropriate context to reduce your risks of error.
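
To illustrate, here is a small Python sketch of a country-keyed rule library. It is a sketch under stated assumptions rather than a finished tool: the `RULES` table and the `to_international` function are names I have made up, the table only covers the handful of countries mentioned in this post, and the leading-digit rules in it should be checked against a maintained reference before anyone relies on them.

```python
import re

# Illustrative rule table only: country calling code plus the leading digits
# used for national dialling, as described in the post. Verify before use.
RULES = {
    "IE": {"country_code": "353", "trunk_prefix": "0"},
    "HU": {"country_code": "36", "trunk_prefix": "06"},
    "CZ": {"country_code": "420", "trunk_prefix": ""},
    "PT": {"country_code": "351", "trunk_prefix": ""},
    "RU": {"country_code": "7", "trunk_prefix": "8"},
}


def to_international(local_number, country):
    """Convert a locally written number to +<country code><number> using per-country rules."""
    rule = RULES[country]
    digits = re.sub(r"\D", "", local_number)  # keep digits only
    prefix = rule["trunk_prefix"]
    if prefix and digits.startswith(prefix):
        digits = digits[len(prefix):]  # drop the national prefix, whatever its length
    return "+" + rule["country_code"] + digits


print(to_international("01 234 5678", "IE"))
print(to_international("06 1 234 5678", "HU"))
print(to_international("21 234 5678", "PT"))
```

Note that the Hungarian number loses two characters, the Portuguese one loses none, and a blanket "strip the leading zero" rule would have mangled the Hungarian number. Even this sketch still makes an assumption: if the prefix was never written down in the first place, a number that genuinely starts with the same digits would be wrongly trimmed.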

Ultimately, blundering in with a crude rule of thumb and simply stripping any leading zeros you find because that is the assumption you’ve made will result in you making an ass out of you and your data.

Which raises an interesting question…

Imagine you have been given a spreadsheet of telephone numbers that you have been told are international numbers in the ‘local’ formats for the respective countries. You open the spreadsheet and there are no leading zeros (because Excel -and most other spreadsheets- assumes that numbers don’t begin with zero and strips it out). What do you do to get the data back to a format that you can actually use?

Answers on a post card (or in the comments) please.

April 30, 2008

Cripes, the blog has been name-checked by my publisher…

TwentyMajor isn’t the only blogger in the pay of a publisher (I’m conveniently ignoring Grandad and the others as Irish bloggers are too darned fond of publishing these days. If you want to know who all the Irish bloggers with publishers are then Damien Mulley probably has a list)!

I recently wrote an industry report for a UK publisher on Information Quality strategy. The publisher then swapped all my references to Information Quality to references to Data Quality as that was their ‘brand’ on the publication. I prefer the term Information Quality for a variety of reasons.

As this runs to over 100 pages of A4 it has a lot of words in it. My fingers were tired after typing it. Unlike Twenty’s book, I’ve got pictures in mine (not those kind of pictures, unfortunately, but nice diagrams of concepts related to strategy and Information Quality. If you want the other kind of pictures, you’ll need to go here.)

In the marketing blurb and bumph that I put together for the publisher I mentioned this blog and the IQTrainwrecks.com blog. Imagine my surprise when I opened a sales email from the publisher today (yes, they included me on the sales mailing list… the irony is not lost on me… information quality, author, not likely to buy my own report when I’ve got the four drafts of it on the lappytop here).

So, for the next few weeks I’ll have to look all serious and proper in a ‘knowing what I’m talking about’ kind of way to encourage people to buy my report. (I had toyed with some variation on booky-wook but it just doesn’t work - reporty-wort… no thanks, I don’t want warts).

So things I’ll have to refrain from doing include:

  1. Engaging in pointless satirical attacks on the government or businesses just for a laugh, unless I can find an Information Quality angle
  2. Talking too loudly about politics
  3. Giving out about rural/urban digital divides in Ireland
  4. Parsing and reformatting the arguments of leading Irish opinion writers to expose the absence of logic or argument therein.
  5. Engaging in socio-economic analysis of the fate of highstreet purveyors of dirty water parading as coffee.
  6. Swearing

That last one is a f***ing pain in the a**.

If any of you are interested in buying my ’umble little report, it is available for sale from Ark Group via this link. This link will make them think you got the email they sent to me, and you can get a discount, getting the yoke for £202.50 including postage and packing (normally £345 + £7.50 p&p). (Or click here to avoid the email campaign software…)

And if any of you would like to see the content that I’d have preferred the link in the salesperson’s email to send you to (coz it highlights the need for good quality management of your information quality) then just click away here to go to IQTrainwrecks.com.

Thanks to Larry, Tom, Danette and the wifey for their support while I was writing the report, and to Stephanie and Vanessa at Ark Group for their encouragement to get it finished by the deadline.

March 31, 2008

The Electoral Register (Here we go again)

The Irish Times today carries a story on page five which details a number of proposed changes to the management of the Electoral Register arising from the kerfuffle of the past two years about how totally buggered it is. For those of you who don’t know, I’ve written a little bit about this in the past (earning an Obsessive Blogger badge in the process donchaknow). It was just under two years ago that I opened this blog with a post on this very topic…

A number of points raised in the article interest me, if for no other reason than they sound very familiar - more on that anon. Others interest me because they still run somewhat counter to the approach that is needed to finally resolve the issue.

I’ll start with the bits that run counter to the approach required. The Oireachtas Committee has been pretty much consistent in its application of the boot to Local Authorities as regards the priority they give to the management of the Electoral Register. According to the Irish Times article, the TDs and Senators found that:

“Running elections is not a core function of local authorities. Indeed, it is not a function that appears to demand attention every year. It can, therefore, be questioned if it gets the priority it warrants under the array of authorities”

I must humbly agree and disagree with this statement. By appearing to blame Local Authorities for the problem and for failing to prioritise the management of the Electoral Register, the Committee effectively absolves successive Ministers for the Environment and other elected officials from failing to ensure that this ‘information asset’ was properly maintained. Ultimately, all Local Authorities fall under the remit of the Minister for Environment, Heritage and Local Government. As the ‘supreme being’ in that particular food chain, the Minister (and their department) is in a position to set policy, establish priorities and mandate adequate resourcing of any Local Authority function, from Water Services to Electoral Franchise.

The key issue is that the Franchise section was not seen as important by anyone. A key information asset was not managed, and no continual plans were put in place for the acquisition or maintenance of information. Only when there were problems applying the information did anyone give a darn. This, unfortunately, is a problem that is not confined to Local Government and Electoral data - a large number of companies worldwide have felt the pain of failing to manage the quality of their information assets in recent times.

Failing to acknowledge that the lack of management priority was systemic and endemic within the entire hierarchy of Central and Local Government means that a group of people who probably tried to do their best with the resources assigned to them are probably going to feel very aggrieved. “The Register is buggered. It’s your fault. We’re taking it away from you” is the current message. Rather it should be “The system we were operating is broken. Collectively there was a failure to prioritise the management of this resource. The people tried to make it work, but best efforts were never enough. It needs to be replaced.”

W. Edwards Deming advised people seeking to improve quality to ‘drive out fear’. A corollary of that is that one should not engage in blame when a system is broken unless you are willing to blame all actors in the system equally.

However, I’m equally guilty as I raised this issue (albeit not in as ‘blaming’ a tone) back in… oh, 2006:

Does the current structure of Local Authorities managing Electoral Register data without a clear central authority with control/co-ordination functions (such as to build the national ‘master’ file) have any contribution to the overstatement of the Register?

Moving on to other points that sound very familiar…

  1. Errors are due to a “wide variety of practices” within Local Authorities. Yup, I recall writing about that as a possible root cause back in 2006. Here and here and here and here and here in fact.
  2. The use of other data sources to supplement the information available to maintain the Register is one suggestion. Hmmm… does this sound like it covers the issue?
  3. Could the Electoral Register process make use of a data source of people who are moving house (such as An Post’s mail redirection service or newaddress.ie)? How can that be utilised in an enhanced process to manage & maintain the electoral register? These are technically surrogate sources of reality rather than being ‘reality’ itself, but they might be useful.

    That’s from a post I wrote here on the 24th April 2006.

    And then there’s this report, which was sent to Eamon Gilmore on my behalf and which ultimately found its way to Dick Roche’s desk while he was still the Minister in the DOELG. Pages 3 to 5 make interesting reading in light of the current proposals. Please note the negatives that I identified with the use of data from 3rd party organisations that would need to be overcome for the solution to be entirely practicable. These can be worked around with sound governance and planning, but bumbling into a solution without understanding the potential problems that would need to be addressed will lead to a less than successful implementation.

  4. The big proposal is the creation of a ‘central authority’ to manage the Electoral Register. This is not new. It is simply a variation on a theme put forward by Eamon Gilmore in a Private Member’s Bill which was debated back in 2006 and defeated at the Second Stage (The Electoral Registration Commissioner Bill, 2005). This is a proposal that I also critiqued in the report that wound its way to Dick Roche… see pages 3 to 5 again. I also raise issues of management and management culture at page 11.
  5. The use of PPS numbers is being considered but there are implications around Data Protection. Hmm… let’s see… I mentioned those issues in this post and in this post.
  6. And it further assumes that the PPS Identity is always accurate (it may not be, particularly if someone is moving house or has moved house. I know of one case where someone was receiving their Tax Certs at the address they lived in in Dublin but when they went to claim something, all the paperwork was sent to their family’s home address down the country where they hadn’t lived for nearly 15 years.)

    In my report in 2006 (linked to above, and on this blog) I also discussed the PPS Number and the potential for fraud if it is not linked to some form of photographic ID, given the nature of the documents that a PPS number can be printed on. This exact point was referenced by Senator Camillus Glynn at a meeting of the Committee last week:

    “I would not have a difficulty with using the PPS card. It is logical, makes sense and is consistent with what obtains in the North. The PPS card should also include photographic evidence. I could get hold of Deputy Scanlon’s card. Who is to say that I am not the Deputy if his photograph is not on the card? Whatever we do must be as foolproof as possible.”

    This comment was supported by a number of other committee members.

So, where does that leave us? Just under two years since I started obsessively blogging about this issue, we’ve moved not much further than when I started. There is a lot of familiarity about the sound-bites coming out at present - to put it another way, there is little on the table at the moment (it seems) that was not contained in the report I prepared or on this blog back in 2006.

What is new? Well, for a start they aren’t going to make Voter Registration compulsory. Back in 2006 I debated this briefly with Damien Blake… as I recall Damien had proposed automatic registration based on PPS number and date of birth. I questioned whether that would be possible without legislative changes or if it was even desirable. However, the clarification that mandatory registration is now off the table is new.

The proposal for a centralised governance agency and the removal of responsibility for Franchise/Electoral Register information from the Local Authorities sounds new. But it’s not. It’s a variation on a theme that simply addresses the criticism I had of the original Labour Party proposal. By creating a single agency the issues of Accountability/Responsibility and Governance are greatly simplified, as are issues of standardisation of forms and processes and information systems.

One new thing is the notion that people should be able to update their details year round, not just in a narrow window in November. This is a small but significant change in process and protocol that addresses a likely root cause.

What is also new - to an extent - is the clear proposal that this National Electoral Office should be managed by a single head (one leader), answerable to the Dail and outside the normal Civil Service structures (enabling them to hire their own staff to meet their needs). This is important as it sets out a clear governance and accountability structure (which I’d emphasised was needed - Labour’s initial proposal was for a Quango to work in tandem with Local Authorities… a recipe for ‘too many cooks’ if ever I’d heard one). That this head should have the same tenure as a judge to “promote independence from government” is also important, not just because of the independence and allegiance issues it gets around, but also because it sends a very clear message.

The Electoral Register is an important Information Asset and needs to be managed as such. It is not a ‘clerical’ function that can be left to the side when other tasks need to be performed. It is serious work for serious people with serious consequences when it goes wrong.

Putting its management on a totally independent footing with clear accountability to the Oireachtas and the Electorate rather than in an under-resourced and undervalued section within one of 34 Local Authorities assures an adequate consistency of Governance and a Constancy of Purpose. The risk is that unless this agency is properly funded and resourced it will become a ‘quality department’ function that is all talk and no trousers and will fail to achieve its objectives.

As many of the proposals seem to be based on (or eerily parallel) analysis and recommendations I was formulating back in 2006, I humbly put myself forward for the position of Head of the National Electoral Office ;-)

February 27, 2008

Final post and update on IBTS issues

OK. This is (hopefully) my final post on the IBTS issues. I may post their response to my queries about why I received a letter and why my data was in New York. I may not. So here we go..

First off, courtesy of a source who enquired about the investigation, the Data Protection Commissioner has finished their investigation and the IBTS seems to have done everything as correctly as they could, in the eyes of the DPC, with regard to managing risk and tending to the security of the data. The issue of why the data was not anonymised seems to be dealt with on the grounds that the fields with personal data could not be isolated in the log files. The DPC finding was that the data provided was not excessive in the circumstances.

[Update: Here's a link to the Data Protection Commissioner's report. ]

This suggests to me that the log files effectively amounted to long strings of text which would have needed to be parsed to extract given name/family name/telephone number/address details, or else the fields in the log tables are named strangely and unintuitively (not as uncommon as you might think) and the IBTS does not have a mapping of the fields to the data that they contain.

In either case, parsing software is not that expensive (in the grand scheme of things) and a wide array of data quality tools provide very powerful parsing capabilities at moderate costs. I think of Informatica’s Data Quality Workbench (a product originally developed in Ireland), Trillium Software’s offerings or the nice tools from Datanomic.

Many of these tools (or others from similar vendors) can also help identify the type of data in fields so that organisations can identify what information they have where in their systems. “Ah, field x_system_operator_label actually has names in it!… now what?”.

If the log files effectively contained totally unintelligible data, one would need to ask what the value of it for testing would be, unless the project involved the parsing of this data in some way to make it ‘useable’? As such, one must assume that there was some inherent structure/pattern to the data that information quality tools would be able to interpret.

Given that according to the DPC the NYBC were selected after a public tender process to provide a data extraction tool this would suggest that there was some structure to the data that could be interpreted. It also (for me) raises the question as to whether any data had been extracted in a structured format from the log files?

Also the “the data is secure because we couldn’t figure out where it was in the file so no-one else will” defence is not the strongest plank to stand on. Using any of the tools described above (or similar ones that exist in the open source space, or can be assembled from tools such as Python or TCL/TK or put together in Java) it would be possible to parse out key data from a string of text without a lot of ‘technical’ expertise (OK, if you are ‘home rolling’ a solution using TCL or Python you’d need to be up to speed on techie things, but not that much). Some context data might be needed (such as a list of possible firstnames and a list of lastnames), but that type of data is relatively easy to put together. Of course, it would need to be considered worth the effort and the laptop itself was probably worth more than Irish data would be to a NYC criminal.
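
To show how little ‘home rolling’ is involved, here is a minimal Python sketch. Everything in it is hypothetical: I have not seen the IBTS log files, so the sample line, the tiny name lists and the crude phone number pattern are all invented for illustration. Real context lists would run to thousands of names and the pattern would need proper tuning.

```python
import re

# Hypothetical context data; a real list would be far longer.
FIRST_NAMES = {"mary", "john", "sean"}
LAST_NAMES = {"murphy", "kelly", "byrne"}

# Crude, assumed pattern: a 0, one or two more digits, an optional
# space or hyphen, then five to seven digits.
PHONE_RE = re.compile(r"\b0\d{1,2}[ -]?\d{5,7}\b")


def find_personal_data(line):
    """Pull likely names and phone numbers out of an unstructured line of text."""
    tokens = re.findall(r"[A-Za-z']+", line)
    names = []
    # Look at each adjacent pair of words for a firstname followed by a lastname.
    for first, last in zip(tokens, tokens[1:]):
        if first.lower() in FIRST_NAMES and last.lower() in LAST_NAMES:
            names.append(first + " " + last)
    return {"names": names, "phones": PHONE_RE.findall(line)}


# Invented sample line, not real log data.
print(find_personal_data("donor upd: Mary Murphy tel 087 1234567 grp O+"))
```

That is about twenty lines of code and no special expertise, which is why obscurity of layout should not be mistaken for security of data.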

The response from the DPC that I’ve seen doesn’t address the question of whether NYBC failed to act in a manner consistent with their duty of care by letting the data out of a controlled environment (it looks like there was a near blind reliance on the security of the encryption). However, that is more a fault of the NYBC than the IBTS… I suspect more attention will be paid to physical control of data issues in future. While the EU model contract arrangements regarding encryption are all well and good, sometimes it serves to exceed the minimum standards set.

The other part of this post relates to the letter template that Fitz kindly offered to put together for visitors here. Fitz lives over at http://tugofwar.spaces.live.com if anyone is interested. I’ve gussied up the text he posted elsewhere on this site into a word doc for download ==> Template Letter.

Fitz invites people to take this letter as a starting point and edit it as they see fit. My suggestion is to edit it to reflect an accurate statement of your situation. For example… if you haven’t received a letter from the IBTS then just jump to the end and request a copy of your personal data from the IBTS (it will cost you a few quid to get it), if you haven’t phoned their help-line don’t mention it in the letter etc…. keep it real to you rather than looking like a totally formulaic letter.

On a lighter note, a friend of mine has received multiple letters from the Road Safety Authority telling him he’s missed his driving test and will now forfeit his fee. Thing is, he passed his test three years ago. Which begs the question (apart from the question of why they are sending him letters now)… why does the RSA still have his application details, given that data should only be retained for as long as it is required for the stated purpose for which it was collected? And why has the RSA failed to maintain the information accurately (it is wrong in at least one significant way)?

IBTS… returning to the scene of the crime

Some days I wake up feeling like Lt. Columbo. I bound out of bed assured in myself that, throughout the day I’ll be niggled by, or rather niggle others with, ‘just one more question’.

Today was not one of those days. But you’d be surprised what can happen while going about the morning ablutions. “Over 171000 (174618 in total) records sent to New York. Sheesh. That’s a lot. Particularly for a sub-set of the database reflecting records that were updated between 2nd July 2007 and 11th October 2007. That’s a lot of people giving blood or having blood tests, particularly during a short period. The statistics for blood donation in Ireland must be phenomenal. I’m surprised we can drag our anaemic carcasses from the leaba and do anything; thank god for steak sandwiches, breakfast rolls and pints of Guinness!”, I hummed to myself as I scrubbed the dentation and hacked the night’s stubble off the otherwise babysoft and unblemished chin (apologies - read Twenty Major’s book from cover to cover yesterday and the rich prose rubbed off on me).

“I wonder where I’d get some stats for blood donation in Ireland. If only there was some form of Service or agency that managed these things. Oh.. hang on…, what’s that Internet? Silly me.”

So I took a look at the IBTS annual report for 2006 to see if there was any evidence of back slapping and awards for our doubtlessly Olympian donation efforts.

According to the IBTS, “Only 4% of our population are regular donors” (source: Chairperson’s statement on page 3 of the report). Assuming the population in 2006 (pre census data publication) was around 4.5 million (including children), this would suggest a maximum regular donor pool of 180,000. If we take the CSO data breaking out population by age, and make a crude guess on the % of 15-24 year olds that are over 18 (we’ll assume 60%) then the pool shrinks further… to around 3.1 million, giving a regular donor pool of approximately 124,000.
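For anyone who wants to check my sums, here is the back-of-envelope estimate as a few lines of Python. It is only a sketch built on the rough figures quoted above (4.5 million people, roughly 3.1 million adults, 4% regular donors, 174,618 records sent); the variable names are my own and none of the inputs are official statistics.

```python
# Rough donor-pool estimate using the approximate figures quoted in this post.
# None of these inputs are official statistics.
REGULAR_DONOR_RATE = 0.04        # "Only 4% of our population are regular donors"
TOTAL_POPULATION = 4_500_000     # assumed 2006 population, children included
ADULT_POPULATION = 3_100_000     # crude estimate of those aged 18 and over
RECORDS_SENT = 174_618           # total records reported as sent to New York

max_pool = round(TOTAL_POPULATION * REGULAR_DONOR_RATE)
adult_pool = round(ADULT_POPULATION * REGULAR_DONOR_RATE)
excess_over_adult_pool = RECORDS_SENT - adult_pool

print(f"Upper bound on regular donors: {max_pool:,}")
print(f"Adults-only estimate:          {adult_pool:,}")
print(f"Records sent exceed the adults-only estimate by {excess_over_adult_pool:,}")
```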

Hmm… that’s less than the number of records sent as test data to New York based on a sub-set of the database. But my estimations could be wrong.

The IBTS Annual Report for 2006 tells us (on page 13) that

The average age of the donors who gave blood
in 2006 was 38 years and 43,678 or 46% of our
donors were between the ages of 18 and 35
years.

OK. So let’s stop piddling around with assumptions based on the 4% of population hypothesis. Here’s a simpler sum to work out… If X = 46% of Y, calculate Y.

(43678/46) × 100 = 94952 people (to the nearest person) giving blood in total in 2006. Oh. That’s even less than the other number. And that’s for a full year, not a sample date range. That is less than 56% of the figure quoted by the IBTS. Of course, this may be the number of unique people donating rather than a count of individual instances of donation… if people donated more than once the figure could be higher.
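The same sum as a short Python sketch, in case anyone wants to play with the inputs. The 46% is a rounded figure from the annual report, so the total is approximate; the record counts are the ones the IBTS itself has quoted (171,324 donor records, 174,618 records in total), and the variable names are mine.

```python
# Back out the total number of donors (Y) from "43,678 or 46% of our donors".
# The 46% is rounded in the annual report, so the result is approximate.
DONORS_18_TO_35 = 43_678
SHARE_18_TO_35 = 0.46
DONOR_RECORDS_SENT = 171_324   # donor records, per the IBTS press release
ALL_RECORDS_SENT = 174_618     # donor plus patient blood group records

total_donors_2006 = round(DONORS_18_TO_35 / SHARE_18_TO_35)
ratio_to_donor_records = total_donors_2006 / DONOR_RECORDS_SENT
ratio_to_all_records = total_donors_2006 / ALL_RECORDS_SENT

print(f"Estimated donors in 2006: {total_donors_2006:,}")
print(f"As a share of donor records sent: {ratio_to_donor_records:.1%}")
print(f"As a share of all records sent:   {ratio_to_all_records:.1%}")
```

Either way you slice it, the full-year donor count comes in under 56% of the number of records sent.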

The explanation may also lie with the fact that transaction data was included in the extract given to the NYBC (and record of a donation could be a transaction). As a result there may be more than one row of data for each person who had their data sent to New York (unless in 2007 there was a magical doubling of the numbers of people giving blood).

According to the IBTS press release:

The transaction files are generated when any modification is made to any record in Progesa and the relevant period was 2nd July 2007 to 11th October 2007 when 171,324 donor records and 3,294 patient blood group records were updated.

(the emphasis is mine).

The key element of that sentence is “any modification is made to any record”. Any change. At all. So, the question I would pose now is what modifications are made to records in Progesa? Are, for example, records of SMS messages sent to the donor pool kept associated with donor records? Are, for example, records of mailings sent to donors kept associated? Is an audit trail of changes to personal data kept? If so, why and for how long? (Data can only be kept for as long as it is needed). Who has access rights to modify records in the Progesa system? Does any access of personal data create a log record? I know that the act of donating blood is not the primary trigger here… apart from anything else, the numbers just don’t add up.

It would also suggest that the data was sent in a ‘flat file’ structure with personal data repeated in the file for each row of transaction data.

How many distinct person records were sent to NYBC in New York? Was it

  • A defined subset of the donors on the Progesa system who have been ‘double counted’ in the headlines due to transaction records being included in the file? …or
  • All donors?
  • Something in between?

If the IBTS can’t answer that, perhaps they might be able to provide information on the average number of transactions logged per unique identified person in their database during the period July to October 2007?
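To show what I mean by ‘double counting’, here is a minimal Python sketch of a flat file where the personal details are repeated on every transaction row. The sample data, column names and transaction types are entirely made up (I have no idea what the real extract looked like); the point is only that the number of rows and the number of distinct people are two very different counts.

```python
import csv
import io
from collections import Counter

# Entirely made-up sample data: the layout of the real extract is not public.
# Each row is one transaction, with the donor's details repeated on every row.
SAMPLE_EXTRACT = """donor_id,name,transaction_type
D001,Alice Example,donation
D001,Alice Example,sms_sent
D001,Alice Example,address_change
D002,Bob Example,donation
D003,Carol Example,mailing_sent
D003,Carol Example,donation
"""


def summarise(flat_file_text):
    """Return (row_count, distinct_people, average_rows_per_person)."""
    rows = list(csv.DictReader(io.StringIO(flat_file_text)))
    rows_per_person = Counter(row["donor_id"] for row in rows)
    distinct_people = len(rows_per_person)
    average = len(rows) / distinct_people if distinct_people else 0.0
    return len(rows), distinct_people, average


row_count, people, average = summarise(SAMPLE_EXTRACT)
print(f"{row_count} rows, {people} distinct people, {average:.1f} rows per person")
```

A headline figure based on the row count would overstate the number of people affected by whatever that average turns out to be.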

Of course, this brings the question arc back to the simplest question of all… while production transaction records might have been required, why were ‘live’ personal details required for this software development project and why was anonymised or ‘defused’ personal data not used?

To conclude…
Poor quality information may have leaked out of the IBTS as regards the total numbers of people affected by this data breach. The volume of records they claim to have sent cannot (at least by me) be reconciled with the statistics for blood donations. They are not even close.

The happy path news here is that the total number of people could be a lot less. If we assume ‘double dipping’ as a result of more than one modification of a donor record, then the worst case scenario is that almost their entire ‘active’ donor list has been lost. The best case scenario is that a subset of that list has gone walkies. It really does boil down to how many rows of transaction information were included alongside each personal record.

However, it is clear that, despite how it may have been spun in the media, the persons affected by this are NOT necessarily confined to the pool of people who may have donated blood or had blood tests performed between July 2007 and October 2007. Any modification to data about you in the Progesa System would have created a transaction record. We have no information on what these modifications might entail or how many modifications might have occurred, on average, per person during that period.

In that context the maximum pool of people potentially affected becomes anyone who has given blood or had blood tests and might have a record on the Progesa system.

That is the crappy path scenario.

Reality is probably somewhere in between.

But, in the final analysis, it should be clear that real personal data should never have been used and providing such data to NYBC was most likely in breach of the IBTS’s own data protection policy.

February 21, 2008

More thoughts on the IBTS data breach

One of the joys of having occasional bouts of insomnia is that you can spend hours in the dead of night pondering what might have happened in a particular scenario based on your experience and the experience of others.

For example, the IBTS has rushed to assure us that the data that was sent to New York was encrypted to 256bit-AES standard. To a non-technical person that sounds impressive. To a technical person, that sounds slightly impressive.

However, a file containing 171000+ records could be somewhat large, depending on how many fields of data it contained and whether that data contained long ‘free text’ fields etc. When data is extracted from a database it is usually dumped to a text file format which has delimiters to identify the fields such as commas or tab characters or defined field widths etc.

When a file is particularly large, it is often compressed before being put on a disc for transfer - a bit like how we all try to compress our clothes in our suitcase when trying to get just one bag on Aer Lingus or Ryanair flights. One of the most common software tools used (in the Microsoft Windows environment) is called WinZip. It compresses files but can also encrypt the archive file so that a password is required to open it. When the file needs to be used, it can be extracted from the archive, so long as you have the password for the compressed file.

So, it would not be entirely untrue for the IBTS to say that they had encrypted the data before sending it and it was in an encrypted state on the laptop if all they had done was compress the file using WinZip and tick the boxes to apply encryption. And as long as the password wasn’t something obvious or easily guessed (like “secret” or “passw0rd” or “bloodbank”) the data in the compressed file would be relatively secure behind the encryption.

However, for the data to be used for anything it would need to be uncompressed and would sit, naked and unsecured, on the laptop to be prodded and poked by the application developers as they went about their business. Were this to be the case then, much like the fabled emperor, the IBTS’s story has no clothes. Unencrypted data would have been on the laptop when it was stolen. Your unencrypted, non-anonymised data could have been on the laptop when it was stolen.

The other scenario is that the actual file itself was encrypted using appropriate software. There are many tools in the market to do this, some free, some not so free. In this scenario, the actual file is encrypted and is not necessarily compressed. To access the file one would need the appropriate ‘key’, either a password or a keycode saved to a memory stick or similar that would let the encryption software know you were the right person to open the file.

However, once you have the key you can unencrypt the file and save an unencrypted copy. If the file was being worked on for development purposes it is possible that an unencrypted copy might have been made. This may have happened contrary to policies and agreements because, sometimes, people try to take shortcuts to get to a goal and do silly things. In that scenario, personal data relating to Irish Blood donors could have wound up in an unencrypted state on a laptop that was stolen in New York.

[Update**] Having discussed this over the course of the morning with a knowledgeable academic who used to run his own software development company, it seems pretty much inevitable that the data was actually in an unencrypted state on the laptop, unless there was an unusual level of diligence on the part of the New York Blood Center regarding the handling of data by developers when not in the office.

The programmer takes data home of an evening/weekend to work on some code without distractions or to beat a deadline. To use the file he/she would need to have unencrypted it (unless the software they were testing could access encrypted files… in which case does the development version have ‘hardened’ security itself?). If the file was unencrypted to be worked on at home, it is not beyond possibility that the file was left unencrypted on the laptop at the time it was stolen.

All of which brings me back to a point I made yesterday….

Why was un-anonymised production data being used for a development/testing activity in contravention of the IBTS’s stated Data Protection policy, Privacy statement and Donor Charter and in breach of section 2 of the Data Protection Act?

If the data had been fake, the issue of encryption or non-encryption would not be an issue. Fake is fake, and while the theft would be embarrassing it would not have constituted a breach of the Data Protection Act. I notice from Tuppenceworth.ie that the IBTSB were not quick to respond to Simon’s innocent enquiry about why dummy data wasn’t used.

February 20, 2008

Fair use/Specified purpose and the IBTS

I am a blood donor. I am proud of it. I have provided quite a lot of sensitive personal data to the IBTS over the years that I’ve been donating.

The specific purposes for which I believed I was providing the information were to allow the IBTS to administer communications with me as a donor (so I know when clinics are on so I can donate), to allow the IBTS to identify me and track my donation patterns, and to alert IBTS staff to any reasons why I cannot donate on a given occasion (donated too recently in the past, I’ve had an illness etc.). I accepted as implied purposes the use of my information for internal reporting and statistical purposes.

I did not provide the information for the purposes of testing software developed by a 3rd party, particularly when that party is in a foreign country.

The IBTS’s website (www.ibts.ie) has a privacy policy which relates to data captured through their website. It tells me that

The IBTS does not collect any personal data about you on this website apart from information which you volunteer (for example by emailing us or by using our on line contact forms). Any information which you provide in this way is not made available to any third parties, and is used by the IBTS only for the purpose for which you provided it.

So, if any information relating to my donor record was captured via the website, the IBTS is in breach of their own privacy policy. So if you register to be a donor… using this link… http://www.ibts.ie/register.cfm?mID=2&sID=77 then that information is covered by their Privacy policy and you would not be unreasonable in assuming that your data wouldn’t wind up on a laptop in a crackhouse in New York.

In the IBTS’s Donor Charter, they assure potential Donors that:

The IBTS guarantees that all personal information about donors is kept in the strictest confidence

Hmm… so no provision here for production data to be used in testing. Quite the contrary.

However, it gets even better… in the Donor Information Leaflet on the IBTS’s website, in the Data Protection section (scroll down… it’s right at the bottom), the IBTS tells current and potential donors that (emphasis is mine throughout):

The IBTS holds donor details, donation details and test results on a secure computerised database. This database is used by the IBTS to communicate with donors and to record their donation details, including all blood sample test results. It is also used for the proper and necessary administration of the IBTS. All the information held is treated with the strictest confidence.

This information may also be used for research in order to improve our knowledge about the blood donor population, and for clinical audit, to assess and improve the quality of our service. Wherever possible, all such information will be anonymised.

Right.. so from their policy and their statement of fair use and specified purposes we learn that:

  1. They can use it for communication with donors and for tracking donation details and results of tests (as expected)
  2. They can use it for necessary administration. Which covers internal reporting but, I would argue, not giving it to other organisations to lose on their behalf.
  3. They can use it for research about the blood donor population, auditing clinical practices. This is OK… and expected.
  4. They are also permitted to use the data to “improve the quality of [their] service”. That might cover the use of the data for testing…

Until you read that last bit… the data would be anonymised whenever possible. That basically means the creation of dummy data as described towards the end of my last post on this topic.

So, the IBTS did not specify at any time that they would use the information I had provided to them for the purposes of software development by 3rd parties. It did specify a purpose for using the information for the improvement of service quality. But only if it was anonymised.

Section 2 of the Data Protection Act says that data can only be used by a Data Controller for the specific purposes for which it has been gathered. As the use of un-anonymised personal data for the purposes of software development by agencies based outside of the EU (or in the EU for that matter) was not a specified use, the IBTS is, at this point, in breach of the Data Protection Act. If the data had been anonymised (ie if ‘fictional’ test data had been used or if the identifying elements of the personal data had been muddled up before being transferred) there would likely be no issue.

  • Firstly, the data would have been provided in a manner consistent with the specified use of the data
  • Secondly, there would have been no risk to personal data security as the data on the stolen laptop would not have related to an identifiable person in the real world.
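For the technically curious, the ‘muddling up’ I have in mind might look something like the Python sketch below. Everything in it is hypothetical: the field names, the fake names and the `anonymise` function are mine, and I have no knowledge of the real record layout. Strictly speaking a salted hash is pseudonymisation rather than full anonymisation (whoever holds the salt could re-link the records, and other fields might still identify someone), so treat it as an illustration of the idea rather than a recipe.

```python
import hashlib
import random

# Hypothetical record layout and placeholder names, for illustration only.
FAKE_FIRST_NAMES = ["Alex", "Sam", "Pat", "Chris"]
FAKE_SURNAMES = ["Test", "Sample", "Dummy", "Placeholder"]


def anonymise(records, secret_salt, seed=0):
    """Return copies of the records with identifying fields replaced.

    donor_id becomes a salted hash, so rows for the same person still link
    together; name and phone are replaced with obviously fake values.
    The non-identifying field (transaction_type) is kept as it is.
    """
    rng = random.Random(seed)
    fake_identity = {}
    anonymised = []
    for record in records:
        salted = (secret_salt + record["donor_id"]).encode("utf-8")
        pseudo_id = hashlib.sha256(salted).hexdigest()[:12]
        if pseudo_id not in fake_identity:
            fake_identity[pseudo_id] = {
                "name": f"{rng.choice(FAKE_FIRST_NAMES)} {rng.choice(FAKE_SURNAMES)}",
                "phone": f"000-{rng.randrange(10**7):07d}",
            }
        anonymised.append({
            "donor_id": pseudo_id,
            **fake_identity[pseudo_id],
            "transaction_type": record["transaction_type"],
        })
    return anonymised


# Made-up 'live' rows standing in for production data.
live_rows = [
    {"donor_id": "D001", "name": "Alice Example", "phone": "087-1234567",
     "transaction_type": "donation"},
    {"donor_id": "D001", "name": "Alice Example", "phone": "087-1234567",
     "transaction_type": "sms_sent"},
    {"donor_id": "D002", "name": "Bob Example", "phone": "086-7654321",
     "transaction_type": "donation"},
]

test_rows = anonymise(live_rows, secret_salt="keep-this-out-of-the-test-environment")
for row in test_rows:
    print(row)
```

The developers still get realistic volumes and realistic transaction patterns to test against, but a stolen laptop would hold nothing that points at a real person.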

Of course, that would have cost a few euros to do so it was probably de-scoped from the project.

If I get a letter and my data was not anonymised I’ll be raising a specific complaint under Section 2 of the Data Protection Act. If the data was not anonymised (regardless of the security precautions applied) then the IBTS is in breach of their specified purposes for the collection of the data and are in breach of the Data Protection Act.

Billy Hawkes, if you are reading this I’ve just saved your team 3 weeks work.