Tag: Information Quality

  • More thoughts on the IBTS data breach

    One of the joys of having occasional bouts of insomnia is that you can spend hours in the dead of night pondering what might have happened in a particular scenario based on your experience and the experience of others.

    For example, the IBTS has rushed to assure us that the data that was sent to New York was encrypted to the 256-bit AES standard. To a non-technical person that sounds impressive. To a technical person, that sounds slightly impressive.

    However, a file containing 171,000+ records could be somewhat large, depending on how many fields of data it contained and whether that data contained long ‘free text’ fields etc. When data is extracted from a database it is usually dumped to a text file format which uses delimiters such as commas or tab characters (or defined field widths) to identify the fields.

    When a file is particularly large, it is often compressed before being put on a disc for transfer – a bit like how we all try to compress our clothes in our suitcase when trying to get just one bag on Aer Lingus or Ryanair flights. One of the most common software tools used (in the Microsoft Windows environment) is called WinZip. It compresses files but can also encrypt the archive file so that a password is required to open it. When the file needs to be used, it can be extracted from the archive, so long as you have the password for the compressed file. [Screenshot: WinZip encryption options]
    So, it would not be entirely untrue for the IBTS to say that they had encrypted the data before sending it and that it was in an encrypted state on the laptop if all they had done was compress the file using WinZip and tick the boxes to apply encryption. And as long as the password wasn’t something obvious or easily guessed (like “secret” or “passw0rd” or “bloodbank”) the data in the compressed file would be relatively secure behind the encryption.
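
    To illustrate how little ceremony this scenario involves, here is a minimal sketch in Python using the third-party pyzipper library (which supports WinZip-style AES encryption). The file names and password are made up for illustration:

    ```python
    # A minimal sketch of the "compress and password-protect" scenario
    # described above. File names and the password are hypothetical.
    import pyzipper

    password = b"not-passw0rd-or-bloodbank"  # in practice: long, random, shared out-of-band

    # Create a WinZip-compatible AES-encrypted archive
    with pyzipper.AESZipFile("donor_extract.zip", "w",
                             compression=pyzipper.ZIP_DEFLATED,
                             encryption=pyzipper.WZ_AES) as zf:
        zf.setpassword(password)
        zf.write("donor_extract.csv")  # the delimited dump described earlier

    # The protection travels with the archive, not with anything extracted from it.
    ```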

    However, for the data to be used for anything it would need to be uncompressed and would sit, naked and unsecured, on the laptop to be prodded and poked by the application developers as they went about their business. Were this to be the case then, much like the fabled emperor, the IBTS’s story has no clothes. Unencrypted data would have been on the laptop when it was stolen. Your unencrypted, non-anonymised data could have been on the laptop when it was stolen.

    The other scenario is that the actual file itself was encrypted using appropriate software. There are many tools in the market to do this, some free, some not so free. In this scenario, the actual file is encrypted and is not necessarily compressed. To access the file one would need the appropriate ‘key’, either a password or a keycode saved to a memory stick or similar that would let the encryption software know you were the right person to open the file.
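
    For illustration, whole-file encryption of this kind might look like the following sketch, using AES-256-GCM from the Python ‘cryptography’ package. Key management (the part that actually matters) is deliberately waved away here, and the file names are hypothetical:

    ```python
    # A minimal sketch of the second scenario: encrypting the file itself
    # with AES-256. Key handling and file names are illustrative only.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)   # the 'key' a developer would need
    nonce = os.urandom(12)                      # must be unique per encryption

    with open("donor_extract.csv", "rb") as f:
        plaintext = f.read()

    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)

    with open("donor_extract.csv.enc", "wb") as f:
        f.write(nonce + ciphertext)

    # Decryption simply reverses this - and the catch described in the next
    # paragraph applies: anyone holding the key can write plaintext back to disk.
    ```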

    However, once you have the key you can decrypt the file and save an unencrypted copy. If the file was being worked on for development purposes it is possible that an unencrypted copy might have been made. This may have happened contrary to policies and agreements because, sometimes, people try to take shortcuts to get to a goal and do silly things. In that scenario, personal data relating to Irish blood donors could have wound up in an unencrypted state on a laptop that was stolen in New York.

    [Update] Having discussed this over the course of the morning with a knowledgeable academic who used to run his own software development company, it seems pretty much inevitable that the data was actually in an unencrypted state on the laptop, unless there was an unusual level of diligence on the part of the New York Blood Centre regarding the handling of data by developers when not in the office.

    The programmer takes data home of an evening/weekend to work on some code without distractions or to beat a deadline. To use the file he/she would need to have decrypted it (unless the software they were testing could access encrypted files… in which case does the development version have ‘hardened’ security itself?). If the file was decrypted to be worked on at home, it is not beyond possibility that the file was left unencrypted on the laptop at the time it was stolen.

    All of which brings me back to a point I made yesterday….

    Why was un-anonymised production data being used for a development/testing activity in contravention of the IBTS’s stated Data Protection policy, Privacy statement and Donor Charter and in breach of section 2 of the Data Protection Act?

    If the data had been fake, the question of encryption or non-encryption would be moot. Fake is fake, and while the theft would be embarrassing it would not have constituted a breach of the Data Protection Act. I notice from Tuppenceworth.ie that the IBTS were not quick to respond to Simon’s innocent enquiry about why dummy data wasn’t used.

  • Fair use/Specified purpose and the IBTS

    I am a blood donor. I am proud of it. I have provided quite a lot of sensitive personal data to the IBTS over the years that I’ve been donating.

    The specific purposes for which I believed I was providing the information were to allow the IBTS to administer communications with me as a donor (so I know when clinics are on so I can donate), to allow the IBTS to identify me and track my donation patterns, and to alert IBTS staff to any reasons why I cannot donate on a given occasion (donated too recently in the past, I’ve had an illness etc.). I accepted as implied purposes the use of my information for internal reporting and statistical purposes.

    I did not provide the information for the purposes of testing software developed by a 3rd party, particularly when that party is in a foreign country.

    The IBTS’s website (www.ibts.ie) has a privacy policy which relates to data captured through their website. It tells me that

    The IBTS does not collect any personal data about you on this website apart from information which you volunteer (for example by emailing us or by using our on line contact forms). Any information which you provide in this way is not made available to any third parties, and is used by the IBTS only for the purpose for which you provided it.

    So, if any information relating to my donor record was captured via the website, the IBTS is in breach of their own privacy policy. So if you register to be a donor… using this link… http://www.ibts.ie/register.cfm?mID=2&sID=77 then that information is covered by their Privacy policy and you would not be unreasonable in assuming that your data wouldn’t wind up on a laptop in a crackhouse in New York.

    In the IBTS’s Donor Charter, they assure potential Donors that:

    The IBTS guarantees that all personal information about donors is kept in the strictest confidence

    Hmm… so no provision here for production data to be used in testing. Quite the contrary.

    However, it gets even better… in the Donor Information Leaflet on the IBTS’s website, in the Data Protection section (scroll down… it’s right at the bottom), the IBTS tells current and potential donors that (emphasis is mine throughout):

    The IBTS holds donor details, donation details and test results on a secure computerised database. This database is used by the IBTS to communicate with donors and to record their donation details, including all blood sample test results. It is also used for the proper and necessary administration of the IBTS. All the information held is treated with the strictest confidence.

    This information may also be used for research in order to improve our knowledge about the blood donor population, and for clinical audit, to assess and improve the quality of our service. Wherever possible, all such information will be anonymised.

    Right.. so from their policy and their statement of fair use and specified purposes we learn that:

    1. They can use it for communication with donors and for tracking donation details and results of tests (as expected)
    2. They can use it for necessary administration. Which covers internal reporting but, I would argue, not giving it to other organisations to lose on their behalf.
    3. They can use it for research about the blood donor population, auditing clinical practices. This is OK… and expected.
    4. They are also permitted to use the data to “improve the quality of [their] service”. That might cover the use of the data for testing…

    Until you read that last bit… the data would be anonymised whenever possible. That basically means the creation of dummy data as described towards the end of my last post on this topic.

    So, the IBTS did not specify at any time that they would use the information I had provided to them for the purposes of software development by 3rd parties. It did specify a purpose for using the information for the improvement of service quality. But only if it was anonymised.

    Section 2 of the Data Protection Act says that data can only be used by a Data Controller for the specific purposes for which it has been gathered. As the use of un-anonymised personal data for the purposes of software development by agencies based outside of the EU (or in the EU for that matter) was not a specified use, the IBTS is, at this point, in breach of the Data Protection Act. If the data had been anonymised (ie if ‘fictional’ test data had been used or if the identifying elements of the personal data had been muddled up before being transferred) there would likely be no issue.

    • Firstly, the data would have been provided in a manner consistent with the specified use of the data
    • Secondly, there would have been no risk to personal data security as the data on the stolen laptop would not have related to an identifiable person in the real world.

    Of course, that would have cost a few euros to do so it was probably de-scoped from the project.

    If I get a letter and my data was not anonymised I’ll be raising a specific complaint under Section 2 of the Data Protection Act. If the data was not anonymised (regardless of the security precautions applied) then the IBTS is in breach of their specified purposes for the collection of the data and in breach of the Data Protection Act.

    Billy Hawkes, if you are reading this I’ve just saved your team 3 weeks work.

  • Irish Blood Transfusion Service loses data..

    Why is it that people never learn? Only months after the debacle of HMRC sending millions of records of live confidential data whizzing around in the post on 2 CDs (or DVDs), the Irish Blood Transfusion Service (IBTS) has had 171,000 records of blood tests and blood donors stolen.

    The data was on a laptop (bad enough from a security point of view). The data was (apparently) secured with 256bit AES encryption (happy days if true). The laptop was taken in a mugging (unfortunate). The mugging took place in New York (WTF!?!?)

    Why was the data in New York?
    It would seem that the IBTS had contracted with the New York Blood Centre (NYBC) for the customisation of some software that the NYBC had developed to better manage information on donors and blood test results. To that end the IBTS gave a copy of ‘live’ (or what we call in the trade ‘production’) data to the NYBC for them to use in developing the customisations.

    So, personal data, which may contain ‘sensitive’ data relating to sexual activity, sexual behaviour, medical conditions etc. was sent to the US. But it was encrypted, we are assured.

    A quick look at the Safe Harbor list of the US Dept of Commerce reveals that the NYBC is not registered as being a ‘Safe Harbor’ for personal data from within the EU. Facebook is however (and we all know how compliant Facebook is with basic rules of data protection).

    Apparently the IBTS relied on provisions of their contract with the NYBC to ensure and assure the security of the data relating to REAL people. As yet no information has come to light regarding whether any audits or checks were performed to ensure that those contractual terms were being complied with or were capable of being complied with.

    How did the data get to New York?
    From the IBTS press release it is clear that the data got to New York in a controlled manner.
    An employee of NYBC took the disc back from Ireland and placed it in secure storage.

    Which is a lot better than sticking two CDs in the post, like the UK Revenue services did not so long ago.

    What about sending the data by email? Hmmm… nope, not secure enough and the file sizes might be too big. A direct point-to-point FTP between two servers? That would work as well, assuming that the FTP facilities were appropriately secured by firewalls and a healthy sense of paranoia.
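
    By way of illustration, a controlled point-to-point transfer might look something like this sketch using SFTP (file transfer over SSH) with the paramiko library, rather than plain FTP. The host name, account and paths are entirely hypothetical:

    ```python
    # A sketch of a point-to-point transfer over SSH; all names are made up.
    import paramiko

    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()  # verify the server's identity against known hosts
    ssh.connect("sftp.nybc.example", username="ibts",
                key_filename="/home/ibts/.ssh/id_ed25519")

    sftp = ssh.open_sftp()
    # Send the already-encrypted extract, not the plaintext
    sftp.put("donor_extract.csv.enc", "/incoming/donor_extract.csv.enc")
    sftp.close()
    ssh.close()
    ```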

    Why was the data needed in New York?
    According to the Irish Times

    The records were in New York, the blood service said, “because we are upgrading the software that we use to analyse our data to provide a better service to donors, patients and the public service”.

    Cool. So the data was needed in New York to let the developers make the necessary modifications to code.

    Nice sound bite. Hangs together well. Sounds reasonable.

    Unfortunately it is total nonsense.

    For the developers to make modifications to an existing application, what was required in New York was

    • A detailed specification of what the modifications needed to be to enable the software to function for Irish datasets and meet Irish requirements. E.g. if the name/address data capture screens needed to change they should have been specified in a document. If validation routines for zip codes/postcodes needed to be turned off, that should have been specified. If base data/reference data needed to be changed – specify it in a document. Are we seeing a trend here?
    • Definition of the data formats used in Ireland. By this I mean the definition of the formats of data such as “social security number”. We call it a PPSN and it has a format nnnnnnnA as opposed to the US format which has dashes in the middle. A definition of the data formats that would be used in Ireland and a mapping to/from the US formats would possibly be required… this is (wait for it) another document. NOT THE DATA ITSELF
    • Some data for testing. Ok, so this is why all 171,000+ records were on a laptop in New York. Ehh… NO. What was required was a sample data set that replicates the formats and patterns of data found in the IBTS production data. This does not mean a cut of production data. What this means is that the IBTS should have created dummy data that was a replica of production data (warts and all – so if 10% of their records have text values in fields where numbers would be expected, then 10% of the test data should reflect this). The test data should also be tied to specific test cases (experiments to prove or disprove functionality in the software). A sketch of what creating such dummy data might look like follows this list.
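
    Here, as flagged in the last bullet, is a minimal sketch of what ‘dummy data that replicates production patterns’ could mean in practice: fabricated records matching the documented Irish formats (e.g. a PPSN of the form nnnnnnnA), with none of them describing a real donor. The field names and proportions are illustrative assumptions, not the IBTS’s actual schema:

    ```python
    # Fabricate test records that match documented formats and known 'warts'.
    import random
    import string

    def fake_ppsn() -> str:
        """Seven digits followed by a letter - the nnnnnnnA format noted above."""
        return "".join(random.choices(string.digits, k=7)) + random.choice(string.ascii_uppercase)

    def fake_donor(i: int) -> dict:
        record = {
            "donor_id": i,
            "ppsn": fake_ppsn(),
            "name": random.choice(["Pat Murphy", "Mary Byrne", "Sean Walsh"]),
            "blood_group": random.choice(["O+", "O-", "A+", "A-", "B+", "AB+"]),
        }
        # Reproduce production's warts: if ~10% of real records have text where
        # numbers are expected, ~10% of the test data should too.
        if random.random() < 0.10:
            record["donor_id"] = f"ID-{i}"
        return record

    test_set = [fake_donor(i) for i in range(1000)]
    ```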

    At no time was production data needed for development or developer testing activities in New York. What was needed was clear project specification and requirements documentation, documents about data formatting and ‘meta-data’ (data about data), Use Cases (walk-throughs of how the software would be used in a given process – like a movie script) and either a set of dummy sample data that looks and smells like your production data or a ‘recipe’ for how the developer can create that data.

    But the production data would be needed for Acceptance testing by IBTS?
    eh… nope. And even if it was it would not need to be sent to New York for the testing.

    User Acceptance testing is a stage of testing in software development AFTER the developer swears blind that the software works as it should and BEFORE the knowledge workers in your organisation bitch loudly that the software is buggered up beyond all recognition.

    As with all testing, the use of production data is not required, and is indeed often a VERY BAD IDEA (except in certain extreme circumstances such as the need for volume stress testing or testing of very complex software solutions that need data that is exactly like production to be tested effectively… e.g. a complex parsing/matching/loading process on a multi-million record database – and even at that, key data not relevant to the specific process being tested ought to be ‘obscured’ to ensure data protection compliance).

    What is required is that your test environment is as close a copy to the reality you are testing for as possible. So, from a test data point of view, creating test data that looks like your production data is the ideal. One way is to do data profiling, develop an understanding of the ‘patterns’ and statistical trends in your data and then hand carve a set of test data that looks and smells like your production data but is totally fake and fraudulent and safe. Another approach is to take a copy of your production data and bugger around with it to mix names and addresses up, replace certain words in address data with different words (e.g. “Park” with “Grove” or “Leitrim” with “Carialmeg” or “@obriend.info” with “obriend.fakedatapeople” – whatever works). So long as the test data is representative of the structure and content of your production data set and can support the test scenarios you wish to perform then you are good to go.
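
    A minimal sketch of that second, ‘scrambling’ approach: shuffle the identifying columns out of alignment with each other and substitute key words, so that no row describes a real person. The column names and substitutions are illustrative only:

    ```python
    # Take a copy of production and scramble it so it keeps the structure and
    # 'data DNA' of the real thing while relating to no identifiable person.
    import random

    def scramble(rows: list[dict]) -> list[dict]:
        names = [r["name"] for r in rows]
        addresses = [r["address"] for r in rows]
        random.shuffle(names)       # detach names from their real records
        random.shuffle(addresses)   # ...and addresses from both

        swaps = {"Park": "Grove", "Leitrim": "Carialmeg"}
        out = []
        for row, name, addr in zip(rows, names, addresses):
            for old, new in swaps.items():
                addr = addr.replace(old, new)
            out.append({**row, "name": name, "address": addr})
        return out
    ```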

    So, was the production data needed in New York – Nope. Would it be needed for testing in a test event for User Acceptance testing? Nope.

    And who does the ‘User Acceptance testing’? Here’s a hint… what’s the first word? User Acceptance testing is done by representatives of the people who will be using the software. They usually follow test scripts to make sure that specific functionality is tested for, but importantly they can also highlight where things are just wrong.

    So, were there any IBTS ‘users’ (knowledge workers/clerical staff) in New York to support testing? We don’t know. But it sounds like the project was at the software development stage so it is unlikely. So why the heck was production data being used for development tasks?

    So… in conclusion
    The data was stolen in New York. It may or may not have been encrypted (the IBTS has assured the public that the data was encrypted on the laptop… perhaps I am cynical but someone who takes data from a client in another nation home for the weekend might possibly have decrypted the data to make life easier during development). We’re not clear (at this point) how the data got to New York – we’re assuming that an IBTS employee accompanied it to NY stored on physical media (the data, not the employee).

    However, there is no clear reason why PRODUCTION data needed to be in New York. Details of how the IBTS’s current data formats might map to the new system, details of requirements for changes to the NYBC’s current system to meet the needs of the IBTS, details of the data formats in the IBTS’s current data sets (both field structures and, ideally, a ‘profile’ of the structure of the data and any common errors that occur) and DUMMY data that might be required for design, development and developer testing are all understandable. Production data is not.

    There is no evidence, other than the existence of a contractual arrangement, that the NYBC had sufficient safeguards in place to ensure the safety of personal data from Ireland. The fact that an NYBC employee decided to take the data out of the office into an unsecure environment (downtown New York) and bring it home with them would evidence that, perhaps, there is a cultural and procedural gap in NYBC’s processes that might have meant they either couldn’t comply with, or didn’t understand, what the expectation of the clauses in those contracts actually meant.

    For testing, what is required is a model of production. A model. A fake. A facsimile NOT PRODUCTION. The more accurate your fake is the better. But it doesn’t need to be a carbon copy of your production data with exactly the same ‘data DNA’… indeed it can be a bad idea to test with ‘live’ data. Just like it is often dangerous to play with ‘live’ grenades or grab a ‘live’ power line to see what will happen.

    The loss of our IBTS data in New York evidences a failure of governance, a ‘happy path’ approach to risk planning, and a lack of appreciation of the governance and control of software development projects needed to ensure the protection of live data.

    As this was a project for the development of a software solution there was no compelling reason that I can identify for production data to have been sent from Ireland to New York when dummy data and project documentation would have sufficed.

    The press release from the IBTS about this incident can be found here..

    [Update: Simon over at Tuppenceworth has noted my affiliation to the IAIDQ. Just to clarify, 99% of this post is about basic common sense. 1% is about Information Management/Information Quality Management. And as this post is appearing here and not on the IAIDQ’s website it goes without saying that my comments here may not match exactly the position of the IAIDQ on this issue. I’m also a member of the ICS, who offer a Data Protection certification course which I suspect will be quite heavily subscribed the next time it runs.]

    [Update 2: This evening RTE News interviewed Dr David Gray from DCU who is somewhat of an expert on IT security. The gist of Dr Gray’s comments were that software controls to encrypt data are all well and good, but you would have to question the wisdom of letting the information wander around a busy city and not having it under tight physical control… which is pretty much the gist of some of my comments below. No one has (as yet) asked why the hell production data rather than ‘dummy’ data was being used during the development phase of a project.]

  • Getting back to my Information Quality agenda

    One or two of the comments (and emails) I received after the previous post here were enquiring about some stuff I’d written previously (2006 into 2007) about the state of the Irish Electoral Register.

    It is timely that some people visited those posts as our Local Elections are coming up in less than 18 months (June 2009) and frankly, unless there is some immense effort going on behind the scenes that I haven’t heard of, the Register is still in a poor state.

    The issue isn’t the Register per se but the processes that surround it, and the apparent lack of a culture where the leadership take the quality of this information seriously enough to make the necessary changes to address the cultural, political and process problems that have resulted in it being buggered.

    There are a few consolidating posts knocking around on this blog as I’ve pulled things together before. However a quick search for “Electoral Register” will pull all the posts I’ve done on this together. (If you’ve clicked the link all the articles are presented below).

    I’ve also got a presentation on the subject over at the IQNetwork website, and I did a report (which did go to John Gormley’s predecessor) which can be found here, and I wrote a Scrap and Rework article that I submitted to various Irish newspapers at the time to no avail but which has been published internationally (in print and on-line).

    At this stage, I sense that as it doesn’t involve mercury filled CFLs or Carbon taxes, the state of the electoral register and the legislative framework that surrounds it (a lot of the process issues require legislative changes to address them) has slipped down the list of priorities our Minister has.

    However, with Local Elections looming it is important that this issue be addressed.

  • Information Quality in 2008…

    So yet another year draws to a close. Usually around this time of year I try to take a few hours to review how things went, what worked and what still needs to be worked on in the coming year. In most cases that is a very personal appraisal of whether I had a ‘quality’ year – did I meet or exceed my own expectations of myself (and I’m a bugger for trying to achieve too much too quickly).

    Vincent McBurney’s Blog Carnival of Data Quality has invited submissions on the theme “Happy New Year”, so I thought I’d take a look back over 2007 and see what emerging trends or movements might lead to a Happy New Year for Information Quality people in 2008.

    Hitting Mainstream
    In 2007 Information Quality issues began to hit the mainstream. It isn’t quite there yet but 2007 saw the introduction of taught Master’s degree programmes in Information Quality in the University of Arkansas at Little Rock and there have been similar developments mooted in at least one European University. If educators think they can run viable courses that will make money then we are moving out of the niche towards being seen as a mainstream discipline of importance to business.

    The IAIDQ’s IDQ Conference in Las Vegas was a significant success, with numbers up on 2006 and a wider mix of attendees. I did an unofficial straw poll of people at that conference and the consensus from the delegates and other speakers was that there were more ‘Business’ people at the conference than at previous Information Quality conferences they’d attended, a trend that has been growing in recent years. The same was true at the European Data Management and Information Quality Conference(s) in London in November. Numbers were up on previous years. There were more ‘Business’ people in the mix, up even on last year – this of course is all based on my unofficial straw poll and could be wrong.

    The fact that news stories abounded in 2007 about poor quality information, and that the initial short sharp shock of Compliance and SOx etc. has started to give rise to questions of how to make Compliance a value-adding function (hint – it’s the INFORMATION, people), may help, but the influence of bloggers such as Vincent, and the adoption of blogs as communications tools by vendors and by Professional Associations such as the IAIDQ, is probably as big an influence if not bigger, IMHO.

    Also, and I’m not sure if this is a valid benchmark, I’ve started turning down offers to present at conferences and write articles for people on IQ issues, because a) I’m too busy with my day job and with the IAIDQ (oh yeah… and with my family) and b) there are more opportunities arising than I’d ever have time to take on.

    Unfortunately, much of the ‘mainstream’ coverage of Information Quality issues either views it as a ‘technology issue’ (most of my articles in Irish trade magazines are stuck in the ‘Technology’ section) or fails to engage with the Information Quality aspects of the story fully. The objective of IQTrainwrecks.com is to try to highlight the Information Quality aspects of things that get into the media.

    What would make 2008 a Happy Year for me would be to have more people contributing to IQ Trainwrecks but also to have some happy path stories to tell and also for there to be better analysis of these issues in the media.

    Community Building
    There is a strong sense of ‘community’ building amongst many of the IQ practitioners I speak with. That has been one of the key goals of the IAIDQ in 2007 – to try and get that sense of Community triggered, to link like-minded people and help them learn from each other. This has started to come together. However it isn’t happening as quickly as I’d like, because I have a shopping list of things I want yesterday!

    What would make 2008 a happy new year for me would be for us to maintain the momentum we’ve developed in connecting the Community of Information/Data Quality professionals and researchers. Within the IAIDQ I’d like us to get better at building those connections (we’ve become good… we need to keep improving).

    I’d like to see more people making contact via blogs like Vincent’s or mine or through other social networking facilities so we can build the Community of Like Minded people all focussing on the importance of Information Quality and sharing skills, tips, tools, tricks and know-how about how to make it better. I’d be really happy if, at the end of 2008, a few more people had made the transition from thinking they are the ‘lonely voice’ in their organisation to realising they are part of a very large choir that is singing an important tune.

    Role Models for Success
    2007 saw a few role models for success in Information Quality execution emerging. All of these had similar stories and similar elements that made up their winning plan. It made a change from previous years when people seemed afraid to share – perhaps because it is so sensitive a subject (for example, admitting you have an IQ problem could amount to self-incrimination in some industries). In the absence of these sorts of ‘role models’ it is difficult to sell the message of data quality as it can come across as theoretical.

    I’d be very happy at the end of 2008 if we had a few more role models of successful application of principles and tools – not presented by vendors (no offence to vendors) but emerging from within the organisations themselves. I’d be very happy if we had some of these success stories analysed to highlight the common Key Success Factors that they share.

    Break down barriers
    2007 saw a lot of bridges being built within the Information Quality Community. 2006 ended with a veritable bloodbath of mergers and acquisitions amongst software vendors. 2007 saw the development of networks and mutual support between the IAIDQ (as the leading professional organisation for IQ/DQ professionals) and MIT’s IQ Programme. In many businesses the barriers that have prevented the IQ agenda from being pursued are also being overcome, for a variety of reasons.

    2008 should be the year to capitalise on this as we near a significant tipping point. I’d like to see 2008 being the year where organisations realise that they need to push past the politics of Information Quality to actually tackle the root causes. Tom Redman is right – the politics of this stuff can be brutal because to solve the problems you need to change thinking and remould governance, all of which is a dangerous threat to traditional power bases. The traditional divide between “Business” and “IT” is increasingly anachronistic, particularly when we are dealing with information/data within systems. If we can make that conceptual leap in 2008 to the point where everyone is inside the same tent peeing out… that would be a good year.

    Respect
    For most of my professional life I’ve been the crazy man in the corner telling everyone there was an elephant in the room that no-one else seemed able to see. It was a challenge to get the issues taken seriously. Even now I have one or two managers I deal with who still don’t get it. However most others I deal with do get it. They just need to be told what they have. 2007 seems to be the year that the lights started to go on about the importance of the Information Asset. Up to now, people spoke about it but didn’t ‘feel’ it… but now I don’t have trouble getting my Dept Head to think in terms of root causes, information flows etc.

    2008 is the year of Respect for the IQ Practitioner…. A Happy New Year for me would be to finish 2008 with appropriate credibility and respect for the profession. Having role models to point to will help, but also having certification and accreditation so people can define their skillsets as ‘Information Quality’ skill sets (and so chancers and snake-oil peddlers can be weeded out).

    Conclusion
    2007 saw discussion of Information Quality start to hit the mainstream and the level of interest in the field is growing significantly. For 2008 to be a Happy New Year we need to build on this, develop our Community of practitioners and researchers and then work to break down barriers within our organisations that are preventing the resolution of problems with information quality. If, as a community of Information/Data Quality people we can achieve that (and the IAIDQ is dedicated to that mission) and in doing so raise our standards and achieve serious credibility as a key management function in organisations and as a professional discipline then 2008 will have been a very Happy New Year.

    2008 already has its first Information Quality problem though…. looks like we’ve got a bit of work to do to make it a Happy New Year.

  • Things that peeve me on the web (a revisit)

    Vodafone have launched a Christmas e-card site with a difference called Bosco is back. On this site you can put together a custom video e-card featuring Bosco, a perennial kids TV favourite in Ireland.

    Why does this site peeve me? Well, due to the way the video is put together (pre-recorded video clips that are assembled in real-time) a lot of the process is driven by drop down menus to select names etc. This is where the problem starts.

    As people who have come to my conference presentations know, a lot of my interest in Information Quality stems from the fact that my name (Daragh) has approx 12 alternate spellings and can be either male or female. These simple facts have motivated me over the years to be a bit pedantic about my name (1 ‘R’, a ‘GH’ at the end – silent, Male). So I was a bit dismayed when I flagged my gender as ‘Male’ on the “Bosco is Back” site and looked for my name, only to find…

    [Screenshot: the Bosco e-card name menu]

    That’s annoying. To cater for the alternate spellings (such as Daragh, Darach, Dara, Daire) it would have been easy enough just to link them to the same video insert (see the sketch below). However, it is not as bad as if I were a woman. According to Vodafone, “Darragh” (and apparently all the phonetic variants thereof) is only a guy’s name.

    [Screenshot: the Bosco e-card name menu (second example)]
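
    As referenced above, the fix is a small data structure rather than more video: a minimal sketch, with made-up clip identifiers, of mapping every variant spelling to the same pre-recorded insert:

    ```python
    # Map all variant spellings to one clip instead of one entry per spelling.
    NAME_TO_CLIP = {variant: "clip_daragh_042"
                    for variant in ("Daragh", "Darragh", "Darach", "Dara", "Daire")}

    def clip_for(name: str) -> str:
        # Fall back to a generic greeting rather than dropping the name entirely
        return NAME_TO_CLIP.get(name, "clip_generic_greeting")
    ```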

    Also, some of the inserts give unexpected outcomes. I was going to send my wife an e-card describing her as a “Dreamer”. Thankfully there is a preview mode which showed me what she’d see. Given that the squeaky voiced puppet would have demanded that she “stop thinking about that girl” I decided it might require more explaining at home than I could possibly manage.

    Yes, the whole thing is a bit of fun and I’m probably being overly pedantic. However it does highlight the risk of having ‘non-quality’ outcomes when you rely on drop down menus and defined lists to operate a business process. What if, instead of producing a cheezee e-card, I had been applying for phone service from Vodafone?

    When I get a chance I’ll post up the slides I use about “why I got into Information Quality”… research this morning has identified another 3 variant spellings of my name at least….

  • Amazon-inania again…

    So, Christmas is coming, the Goose is getting fat. I thought I’d put some euros in Jeff Bezos’ hat…

    So I decided to try to order some Xbox games as part of my Christmas shopping. I fully expected to get big “DANGER WILL ROBINSON” warnings for all the purchases given Amazon’s decision NOT to sell software or electrical goods into the Irish market for no apparent reason (which I’ve written about before here and here and here and which featured on other blogs last year… here…. and which I brought to the attention of the relevant Government Minister here). I haven’t actually received a response on this yet, over a year later. Shame on me for not chasing it up.

    Imagine my fricking surprise when I got this… [Screenshot: Amazon shipping-restriction message]

    Apparently the XBox game Ratatouille is not the same class of thing as the XBox games “Cars Mater-national” or “the Simpsons”. Now, this puts Amazon across two of my pet bugbears…

    1. Nonsensical and unexplained restrictions on shipping of goods within the EU (which, in the absence of a REALLY good explanation is probably a breach of EU law)
    2. Buggered up information quality

    If all of the game titles had been restricted I’d have simply shrugged my shoulders and moved on. But they weren’t. This suggests that either:

    • The information which Amazon use to classify their games and software is inaccurate or incomplete and allows exceptions through the net (boo hiss)
    • OR (worryingly) The restriction on shipping electrical goods, games and software has less to do with the WEEE regulations in Ireland (Amazon’s nonsense excuse) and more to do with producers seeking to create and maintain artificial market segregation. In the context of a web site selling into Ireland, that could raise issues of EU law and, if it is the case that a number of different manufacturers have made similar requests to Amazon to restrict the Irish market, then that could be viewed as a cartel-like operation, which is apparently a bad thing.

    Not that Amazon would pander to that kind of thing. Gosh no. This has to relate to the Waste Electrical and Electronic Equipment regulations because they define Electrical and Electronic Equipment as:

    “electrical and electronic equipment” means equipment which is dependent on electric currents or electromagnetic fields in order to work properly and equipment for the generation, transfer and measurement of such currents and fields falling under the categories set out in Annex IA of European Parliament and Council Directive 2002/96/EC on waste electrical and electronic equipment and designed for use with a voltage rating not exceeding 1,000 volt for alternating current and 1,500 volt for direct current;

    Yes. That definitely includes inert plastic with encrypted digital information on it (aka a dvd or cd with MS Office or Halo3 on it – take yer pick). Although, if you were particularly pedantic an Xbox game does rely on “electric currents or electromagnetic fields in order to work properly”. But only if you are being RIDICULOUSLY pedantic. I am pedantic. I’m renowned for it. Even I wouldn’t stretch things that far…

    Either way it is an avoidable and undesirable process outcome, and as it is happening inconsistently it is embarrassing. It is particularly irksome given that Amazon are basing a Customer Service Call Centre in Cork and have a Service and Operations centre in Dublin and have been applauded by our Government for their investments. Amazon’s relocation from Slough to Ireland was caught by the BEEB.

    I’ve posted on this previously and these posts can be found under the Amazon-Inania category on this blog.

  • Things that peeve me on the web

    A few things peeve me on the web. One of them is website form validators that do not recognise TLDs other than .com, .org or a country TLD. These validators seem oblivious to the fact that since 2000 ICANN has been rolling out ‘new’ TLDs to take the ‘pressure’ off the .com and .org domains, and .info has been active as a TLD since 2001.

    I chose .info for my domain name partly because my old obriend.com domain was hijacked and partly because that problem manifested an opportunity for me to rebrand myself on-line with a domain name that related to me and my interests. Obriend.info is a website dedicated to information about OBrienD (me) and where OBrienD can discuss topics relating to Information Quality and Information Management (Info).

    However I find myself having to fall back on other email addresses such as my gmail or IAIDQ email address when filling out web forms, as many validators (often on very reputable and high-profile sites) reject .info as part of an email address, in blissful ignorance of the fact that by March 2007 there were 4 million .info domains registered, with 1.6 million .info websites active (this being one of them).

    This is a small but significant information quality problem. The ‘master data’ that is being used to support the validation processes on these sites is incomplete, out of date and inaccurate. Web developers should take the time to verify if the snippets of code they are using to validate email addresses contain all valid TLDs and if not they should update their code. Not doing so results in lost traffic to your site, and in the case of registration forms for e-commerce sites it costs you a sale (or three).
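
    To make the failure mode concrete, here is a hedged sketch: the first pattern hard-codes a TLD whitelist of the kind these validators appear to use and so rejects a perfectly valid .info address; the second validates the shape of the address and leaves the TLD alone (or would check it against an up-to-date list such as IANA’s, rather than a stale snapshot):

    ```python
    # The 'bad' pattern bakes in a TLD whitelist; the better one does not.
    import re

    BAD = re.compile(r"^[^@\s]+@[^@\s]+\.(com|org|net|[a-z]{2})$")
    BETTER = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE)

    print(bool(BAD.match("daragh@obriend.info")))     # False - a lost registration
    print(bool(BETTER.match("daragh@obriend.info")))  # True
    ```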

    Another thing that peeves me is the use (or not) of apostrophes in email addresses. Names like O’Donnell and the usual spelling of O’Brien have apostrophes. Some organisations allow them as part of their email addresses (joe.o’connor@thisisnotarealdomain.lie). For some reason however, many CMS platforms, website validators etc. don’t handle this construct particularly well. Indeed I’ve seen some chat forums where ‘experts’ advise people to leave out the apostrophe to avoid problems, even though the apostrophe is perfectly permissible under the relevant RFC standards.

    I’ve experienced the problem with Joomla and Community Builder on the IQ Network website, which required me to manually work around the issue as I am not a good enough PHP developer to hack either application to fix the problem in a way that doesn’t cause other problems (such as the apostrophe being displayed back with an escaping backslash – ” \’ ”).

    On the web you are in a global community. Just because your country/culture doesn’t use apostrophes or accenting characters doesn’t mean that they are not valid. Your code should be built to handle these occurrences and to avoid corrupting data. Joe O’Connor’s name (to return to our fictional example) is not Joe O\’Connor. He should not see his name displayed as such on a form. Furthermore it should not be exported as such from a database into other processes.

    Likewise, if Joe.O’Connor@fictionaldomain.info decides he wants to register at your site you should make sure you can correctly identify his tld as valid and get his name right.
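
    For illustration, the stray backslash usually comes from hand-escaping strings on their way into a database or page. A minimal sketch of the alternative – parameterised queries, which store and return the apostrophe intact (the table and data here are made up):

    ```python
    # Let the database driver handle the apostrophe - no addslashes(), no \'
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

    # The '?' placeholders pass the values through untouched
    conn.execute("INSERT INTO users VALUES (?, ?)",
                 ("Joe O'Connor", "Joe.O'Connor@fictionaldomain.info"))

    print(conn.execute("SELECT name FROM users").fetchone()[0])  # Joe O'Connor
    ```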

  • The evolution of Information Quality

    I was googling today (or doing some googlage) for blogs that deal with Information and Data Quality topics. Needless to say yours truly did appear reasonably highly in the search results. One post that I came across that really made me think a bit was this one from Andrew Brooks, currently a Senior Consultant with Cap Gemini in the UK.

    In his post he asks if we are at a ‘tipping point’ for Information Quality where

    organisations are starting to move from ‘unconscious incompetence’ to ’conscious incompetence’ and see the need to spend money in this area (hence the growing number of vendors and consultancies) which are feeding off the back of this.

    He mentions that he gets calls from recruiters looking for Data Quality Management roles to be filled and wonders when we will reach the stage of ‘Conscious Competence’.

    My personal feeling is that we are at a very large tipping point. Those organisations that truly make the leap will gain significant advantage over those that don’t. Those that make the leap half-heartedly by putting a few job titles and tools in the mix with no commitment or plan will limp along, but the pressure of competing with lean and efficient opposition (those who jump in wholeheartedly) will squeeze on these organisations. Those that don’t leap at all will fall foul of Darwinian evolution in the business context.

    The danger that we face at this juncture is that when the ship is sinking any bandwagon looks like a lifeboat. The risk that we face is that we will not have learned the lessons of the CRM adoption age when organisations bought ‘CRM’ (ie software) but didn’t realise the nature of the process and culture changes that were required to successfully improve the management of Customer Relationships. Tools and job titles do not a success make.

    The same was true of Quality management in manufacturing. As Joseph Juran said:

    “They thought they could make the right speeches, establish broad goals, and leave everything else to subordinates… They didn’t realize that fixing quality meant fixing whole companies, a task that cannot be delegated.”

    So, what can be done?

    The International Association for Information and Data Quality was founded in 2004 by Tom Redman and Larry English (both referenced in Mr Brooks’ article) to promote and develop best practices and professionalism in the field of Information and Data Quality.

    As a vendor neutral organisation part of the Association’s mission is to cut through the hype and sales pitches to nail down, clarify and refine the core fundamental principles of Information Quality Management and to support Information/Data Quality professionals (I use the terms interchangeably, some people don’t…) in developing and certifying their skills so that (for example) the recruiter looking for a skilled Data Quality Manager has some form of indicator as to the quality of the resource being evaluated.

    The emergence of such an organisation and the work that is being done to develop formal vendor-independent certification and accreditation evidences the emergence of the ‘early adopters’ of the ‘Conscious Commitment’ that Mr. Brooks writes about. As an Information Quality professional I am conscious that there is a lot of snake-oil swilling around the market, but also a lot of gems of wisdom. I am committed to developing my profession and developing the professional standards of my profession (vocation might be another word!).

    Having a rallying point where interested parties can share and develop sound practices and techniques will possibly accelerate the mainstreaming of the Conscious Commitment… IQ/DQ professionals (and researchers… mustn’t forget our colleagues in academia) need no longer be isolated or reinvent the wheel on their own.

    Let me know what you think….

  • Dell hell comes to an end…

    My Dell Hell has come to an end. The outcome is not entirely what I had hoped for, but at least the issue has been resolved and I understand what has been going on.

    Thanks to John who took the time to follow through and look at the information that I had posted on this blog about the graphics card that was installed in my laptop. I had ‘spoken with data’ by presenting a screen shot of the diagnostics utility for the graphics card. John took this information and responded in kind – he provided information to me that explained that what I was seeing in the graphics card diagnostics confirmed that the graphics card that is installed in my laptop now is the graphics card that I ordered.

    5 months of frustration on my part, half a dozen graphics cards sent to me by Dell, and the root cause of the problem was a failure of the information provided about the graphics card to properly meet – or perhaps more accurately to properly set – my expectations as to the performance and capability of the graphics card.

    5 months of costs that could easily have been avoided if the information provided about the graphics card had been complete and timely.

    It transpires that the hypermemory technology used in the ATI graphics cards means that the card ships with 128MB of dedicated video RAM but ‘borrows’ from the system memory as required, up to a maximum of 256MB. Unfortunately there is nothing in the laptop that shows this, leading to confusion. The BIOS registers 128MB, and the graphics card’s own diagnostics display 128MB with no mention of the ‘reserve tank’ that can be dipped into. There is no indication that the card has a greater capability in reserve.

    John found only one specific reference to this in the on-line documentation for the model of laptop. This was in a footnote. This is important information… it should perhaps have been put in a more prominent position in the documentation?

    In my email discussions with John on this topic we discussed various options that might be explored to improve the presentation of information about these types of graphics card technologies. He assured me he would bring them forward as suggestions to improve the customer experience for Dell customers. I hope he does so and some changes are implemented. The business case for doing this is simple.. it avoids support costs and increases customer satisfaction.

    My suggestions to John included:

    1. Information about how the cards work should be presented at point of sale. In particular information about what customers should expect to see in any diagnostics tools should be provided.
    2. The information about how ‘hypermemory’ type graphics technologies work should be promoted from a footnote to a more prominent position in on-line and print documentation.
    3. Dell should request (or even require) the manufacturers of these graphics cards to modify their diagnostic tools to display the on-board video RAM and the maximum capacity of the ‘reserve tank’ in system memory that can be utilised. I’ll discuss this last suggestion in a bit more detail in a moment.

    My suggestion regarding the change to the manufacturer’s own utilities would more accurately reflect the capabilities of the card and align what the utilities show and what the manufacturer (and by extension Dell) advertise the capacity of the card to be. This information could be displayed as follows:

    Dedicated Video Ram = 128MB
    Maximum Available System RAM = 128MB
    Maximum Graphics Memory Available= 256MB

    The maximum available system RAM value could be a hard-coded value based on the model of the card. This would allow a single software fix to address all models of graphics cards. The amended diagnostic control panels could be pushed to Dell customers as a software update. This is not a difficult fix and would quickly address the root cause of the issues at hand. If the diagnostic utility currently installed had shown a ‘memory audit’ like the one above I wouldn’t have raised the support issue in the first place and my blog would have been a quieter place for the last few months.
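
    To show how small this fix is, here is a minimal sketch of the suggested ‘memory audit’: the dedicated figure comes from the hardware, the shareable maximum is a hard-coded value per card model, and the total is derived, so the diagnostic and the advertised capacity would finally agree. The model name and table are hypothetical:

    ```python
    # A sketch of the proposed 'memory audit' readout; values per the example above.
    CARD_MODELS = {"ATI Hypermemory 128": {"dedicated_mb": 128, "max_shared_mb": 128}}

    def memory_audit(model: str) -> str:
        card = CARD_MODELS[model]
        total = card["dedicated_mb"] + card["max_shared_mb"]  # derived, not guessed
        return (f"Dedicated Video RAM = {card['dedicated_mb']}MB\n"
                f"Maximum Available System RAM = {card['max_shared_mb']}MB\n"
                f"Maximum Graphics Memory Available = {total}MB")

    print(memory_audit("ATI Hypermemory 128"))
    ```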

    By increasing the completeness of the information, its accuracy improves, and the risk of consumers such as myself raising support cases and pursuing issues is reduced – issues which, ultimately, are a result of poor quality information leading to a failure to communicate clearly what the capability of the card is and what the purchaser’s expectation should be.

    Personally, I feel that this technology is a fudge and the way the information about the capability of the cards is presented by the manufacturers is misleading. I hope that Dell take this opportunity to implement simple changes to improve the quality of information.

    The business case for these changes can be determined easily by Dell based on the number of support cases raised, the length of time/amount of resources expended on investigating and dealing with these cases and the costs of any replacement cards shipped to customers. This is the cost of non-quality.

    The benefit to Dell of reducing the risk of confusion is the savings that would result through a reduction in these types of support calls. The return on investment would be straightforward to calculate from there; however, based on my experience in information quality management, I would suggest that the costs to Dell of the three remediation actions I have suggested would be far less than the costs of service issues arising simply from poor quality information.

    The Information Quality lessons that I would suggest people take from this saga:

    1. Poor Information Quality can impact all processes
    2. The actions that can be taken to prevent Information Quality problems are often simple, straightforward and easy to implement. The key factor is to focus on the customer and determine what steps need to be taken to ensure your processes and information are meeting or exceeding their expectations
    3. Speak with Data – when I posted the screen shot from the graphics card utility I provided information to Dell (and to the world) about what I was seeing and the basis on which I felt there was a problem. This then allowed John to validate what I was saying, and he responded in kind with detailed information (including links to Wikipedia and the footnote in the on-line Dell documentation). This enabled clear, accurate and effective communication based on the facts, not anecdote or hearsay, and led to me being happy to close the issue.

    I promised John I would eat some humble pie. I was wrong in my belief that the graphics card that was installed in my laptop was not the spec that was ordered. I am grateful to John and those in Dell who tried to resolve the issue.

    However the fact that the issue arose in the first place has at its root the quality of information about the graphics card and its capability. The fact that the issue dragged on for 5 months is, in part, due to a lack of information within some areas of Dell about what the capability of the card was and what the situation actually was, and a failure to effectively communicate this.

    And John’s explanation doesn’t address why the first replacement card that was shipped to me for my laptop was a graphics card for a desktop…

    ….that still makes me chuckle in bemusement.