Category: The Business of IQ

A category to collect and collate posts on the business aspects of information quality. It will be used to create a pre-defined view of pages in the menu bar.

  • Imitation is the sincerest form of flattery

    I noticed that Informatica have launched a new website, www.doyoutrustyourdata.com, to highlight poor-quality information issues reported in the media.

    My personal opinion on the site is that it isn’t very nice looking (but then I’m not a big fan of black on green). However, I’m biased, as I moderate the IQTrainwrecks.com blog for the IAIDQ, which has been doing this, in an occasionally tongue-in-cheek manner, for over 2 years now. IQTrainwrecks.com gets reasonably good search rankings on Google (and we’re looking at ways to improve that further).

    I’m flattered that Informatica have stumbled upon the same idea that the IAIDQ had back in 2006. I hope that we can figure out a way to have both sites working together for the benefit of information consumers everywhere. For example, the IAIDQ would love to reward members for submitting stories to IQTrainwrecks.com but our resources aren’t extensive enough to fund that (yet).

    [Update] As Vincent McBurney correctly points out, the IAIDQ wasn’t the first to try to create a resource like this. IQTrainwrecks is a spiritual descendant of www.dataquality.com and of the list of issues that Tom Redman has been tracking over on www.navesink.com. [/update]

  • How not to handle a customer…

    So, I’ve been having problems with my broadband. Problems significant enough that I would suggest the Dept of Comms actually think through the potential reliance on Fixed Wireless solutions to address Ireland’s broadband deficit. More on that another time.

    What annoys me in the immediate sense is the level of customer service that people seem to think is OK. I had my FWA antenna removed from my house today. I found out about it when I looked out the window and saw the van from my provider in the driveway and the legs of a ladder going up the side of the house. I expected a binglybong on the doorbell to let me know what was happening, but nowt. I was working so I couldn’t rush out to talk to the man. By the time I’d finished the work stuff he’d vanned away again.

    I’d complained to my provider in writing back in May about some issues. I got a nice email addressing part of my complaint and bugger all else. After this morning’s visitation I emailed them to find out what was going on.

    Apparently they’ve tried to contact me “numberous” [sic] times over the past month to talk to me about the problems I was having.

    Checked email… nowt.
    Checked spam filter… nowt.
    Checked missed calls on phone… nowt.
    Checked the drawer in the kitchen where all the things that look like bills get hidden… nowt.

    I know I had no voicemails from them on the phone, as I would have remembered (and I would have downloaded the voicemail from the webmail service provided by my mobile service provider – betcha didn’t know you could do that, did you? Unified messaging, almost – and put it in the folder of documents/evidence I am compiling to go with my inevitable ComReg complaint).

    Apparently the only contact information they have for me is my mobile number. That’s despite the fact that they’ve sent emails to my email address, and that a man-in-a-van could find my house, which is also where letters go. And I included all of that information again in my complaint letter.

    So the lack of a follow up email, or a letter responding to my complaint or a friendly binglybong on the doorbell from the man in the van to fill me in on things were all beyond them, because they didn’t have the information. Which they, errmmmm, had, for the reasons mentioned above.

    So that thing about only having a mobile number to contact me is a… [mistake] [lie] [cop out] [failure of internal processes to properly manage customer information]… (select one or more options as appropriate).

    It would seem it’s all my fault I didn’t know what was going on. I should have felt the disturbance in The Force, as if a small call centre of people suddenly cried out as one and then suddenly fell silent. Curse my failing and fading Jedi skills.

    At least that’s how I’d feel if I wasn’t so peeved at the whole thing. I think that once I’ve updated ComReg with the nonsense I’m dealing with, I’ll send my ex-provider a request for all personal information they hold about me (electronic and paper files, IP and traffic logs, etc.) under the terms of the Data Protection Act. ‘Coz I am fond of my regulatory frameworks and codes of practice etc.

    Notice that I’ve not named the service provider or discussed the specific issues here. That would be unfair to my (it would seem former – at their initiative) Broadband Provider. However, they are exactly the type of organisation that DCENR seems to be pinning the Great Broadband Hope on.

    The good news is that the Vodafone broadband dongle I use while commuting, which has recently been my main tool for getting online at home even though it is just 2G around these parts, picked up a 3G network from 3 last night. I couldn’t connect to it, but I knew it was there. So that’s got me thinking…

  • An IQ Trainwreck…

    From Don Carlson, one of my IAIDQ cronies in the US, comes this YouTube vid from Informatica (a data quality software tool vendor) that sums up a lot of why Information Quality matters.

    Of course, I could get snooty and ask what gave them the idea to juxtapose Information Quality and Trainwrecks… gosh, I’d swear I’ve seen that somewhere before.

  • The Electoral Register Hokey-Cokey

    When I was a small child, my grandmother used to entertain me and my siblings by getting us to sing and dance the hokey cokey, a playful little song and dance routine if ever there was one.

    This dance was brought to mind yesterday when Fergal of the Tuppenceworth bloggers emailed me to let me know that he appears to have been taken off the Electoral Register in his home county. Again.

    You put your right to self-determination and election of a government by proportional representation as mandated by the constitution of the Irish Republic in.
    You put your right to self-determination and election of a government by proportional representation as mandated by the constitution of the Irish Republic out.
    In. Out. In. Out.
    And you shake it all about.

    It would seem that Fergal had been taken off the Register during the Great Clean-up of 2006. He then had his registration reinstated. The other day, in a fit of electoral existentialism, he decided to try to find himself on the Electoral Register website, www.checktheregister.ie.

    Zen-like, he found himself encountering the concept of nothing, as a search for his name at his address returned nothing. Oh Hokey Cokey Cokey indeed.

    So what may have gone wrong here?

    • Is Fergal’s name transposed on the Register (surname first, firstname last)?
    • Is the address registered against Fergal on the Register different to his address?
    • Does the search function on the Electoral Register require an exact character match on names/addresses? Is “Fergal” interpreted as a different name to “Fearghal” (both Fergals in my book)?
    • If Fergal has indeed been deleted from the Register (again), what triggered the Hokey Cokey here? Was an old copy of the Register loaded to the website?
    • Is the version you search online up to date with the version you might find in your library or Garda Station? Might Fergal be on the Register, but just not on the Register that is searched? Might it work the other way round? Might people be listed as ‘on the register’ in an online search but be off the Register in the ‘paper’ world (i.e. the version that counts on polling day)?
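    None of us outside the Register’s back office knows how the search on www.checktheregister.ie actually matches names. The following is only a sketch of what a more forgiving search could do with the first three questions above: ignore case and fadas, tolerate a transposed first name and surname, and map known spelling variants. The variant table and the surname “Murphy” in the usage line are hypothetical, made up for illustration.

```python
import unicodedata

# Hypothetical variant table: the post only suggests that "Fergal" and
# "Fearghal" should be treated as the same name.
NAME_VARIANTS = {"fearghal": "fergal"}


def normalise(token: str) -> str:
    """Case-fold, strip accents (fadas), drop apostrophes/spaces, map variants."""
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    plain = plain.replace("'", "").replace(" ", "")
    return NAME_VARIANTS.get(plain, plain)


def name_matches(query_first: str, query_last: str,
                 reg_first: str, reg_last: str) -> bool:
    """True if a register entry matches the query, in either name order."""
    q = (normalise(query_first), normalise(query_last))
    r = (normalise(reg_first), normalise(reg_last))
    return q == r or q == (r[1], r[0])
```

    For example, `name_matches("Fergal", "Murphy", "Murphy", "Fearghal")` returns True. A real variant table would need careful curation, since treating two names as equivalent can also produce false matches.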

    The list of potential root causes is (especially as I am speculating a bit) quite long. However, this is further evidence that the processes for the management of the Electoral Register are a bit knackered. This has been accepted by the Government, and the Oireachtas Committee on the Electoral Register recently published a series of recommendations which eerily echoed comments and recommendations made on this blog over 2 years ago.

    However, while there is an urgent need to have as accurate an electoral register as possible (one referendum in our immediate future and local elections in the not-too-distant future), care must be taken to ensure we solve the problems of tomorrow as well as the problems of today.

    But in the words of Tom Jones – “I think I’m gonna dance now”…

    “Oh, hokey cokey cokey…. Oh hokey cokey cokey…..”

  • Telephone numbers and Information Quality – the risk of assumption

    There is an old saying that the word “Assume” makes an “Ass” out of “You” and “Me”.

    Yet we see (and make) assumptions every day when it comes to assessing the quality (or otherwise) of information. Anglo-Saxon-biased peoples (US, English-speaking Europe, etc.) often assume that names are structured Firstname Surname: “Daragh” = First Name, “O Brien” = Surname. The cultural bias here is well documented by people like Graham Rhind (who advises the use of “Given Name/Family Name” constructs on web forms etc. to improve cross-cultural usability).

    But what if you see “George Michael” written down (without the context of labels for each name part) with a reference to “singer”? Would this relate to the pop singer George Michael, or the bass baritone singer Michael George?

    One of the common ‘rules of thumb’ with telephone numbers is that, when you are trying to create the full ‘internationalised’ version of a telephone number (+[country code] [local area code] [local number]), you take the number as written ‘locally’ and drop the leading zero. Of course, like most conventional wisdom, this rule of thumb falls apart under a little scrutiny.

    For example, in the Czech Republic there is no ‘leading zero’, as it is actually part of the international access code (which actually makes more sense to me…). One might assume that European countries, given the standardisation ethos of the European Union, would all have plumped for “0” as a leading digit on local area codes. Not so: Portugal doesn’t use any leading digit on its area codes, and some countries that used to be part of the USSR (like Russia, Belarus and Azerbaijan) use 8 instead of 0.

    Nor is it safe to assume that you only need to consider the first digit of the local area code. Hungary has a 2-digit prefix (06), so you would need to parse 2 characters into the string to remove the correct digits. Just stripping the leading zero will result in a totally embuggered piece of information.
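    Here is a minimal Python sketch of a country-aware conversion to international format. The table holds only the examples discussed in this post, along with their country calling codes. The function name is hypothetical. Numbering plans change over time, so treat the rules as assumptions to be checked against a maintained source, not as reference data. The sample numbers are made up.

```python
# Illustrative rules only, taken from the examples in this post.
# ISO country code -> (country calling code, trunk/leading prefix)
COUNTRY_RULES = {
    "IE": ("353", "0"),
    "CZ": ("420", ""),   # no leading prefix
    "PT": ("351", ""),   # no leading prefix
    "RU": ("7", "8"),
    "BY": ("375", "8"),
    "HU": ("36", "06"),  # two-digit prefix
}


def to_international(local_number: str, country: str) -> str:
    """Convert a locally written number to +<country code><national number>."""
    calling_code, prefix = COUNTRY_RULES[country]
    # Keep digits only (drops spaces, brackets, hyphens).
    digits = "".join(ch for ch in local_number if ch.isdigit())
    # Strip the whole prefix for this country, not just "a leading zero".
    if prefix and digits.startswith(prefix):
        digits = digits[len(prefix):]
    return f"+{calling_code}{digits}"
```

    With these rules, `to_international("06 1 234 5678", "HU")` gives `+3612345678`. Blindly dropping a single leading zero would have left a stray 6 in the national number.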

    Also, everyone assumes that a telephone number will consist only of digits. However, there are a few instances where the code required to dial out from a country (the International Direct Dial code) is not purely numeric, in that it contains either the * (star) or # (hash key/pound key). Our buddies in Belarus are an example of this: to dial out from Belarus you need to dial “8**10” (which, even more confusingly, is often written “8~10”).

    So what does this mean for people who are assessing or seeking to improve the quality of telephone number data in their systems?

    Well, first off it means you need to have some context to understand the correct business rules to apply. For example, the rules I would apply to assessing the quality (and likely defects) in a telephone number from Ireland would be different to what I’d need to apply to telephone numbers relating to Belarus. In an Irish telephone number it would be correct to strip out instances of “**” and then validate the rest of the string based on its length (if stripping the ** made it too short to be a telephone number then we would need to tag it as duff data and remove it). With data relating to Belarus it might simply be that the person filling in the form (the source of the data) got confused about what codes to use.
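    Here is a rough sketch of what ‘context first, then rules’ might look like for the Irish and Belarusian cases above. The minimum lengths are placeholders made up for illustration, since real limits depend on the country and the number type. The verdict strings are also hypothetical. The point is only that the same “**” gets stripped for an Irish number but routed for review for a Belarusian one.

```python
from typing import Optional, Tuple

# Placeholder minimum lengths (hypothetical); real limits vary by
# country and number type and belong in a maintained rule library.
MIN_DIGITS = {"IE": 7, "BY": 9}
BELARUS_DIAL_OUT = ("8**10", "8~10")  # dial-out code as described above


def assess_number(raw: str, country: str) -> Tuple[Optional[str], str]:
    """Apply country-specific rules; return (value or None, verdict)."""
    compact = raw.replace(" ", "").replace("-", "")
    if country == "IE":
        # In an Irish number "**" is noise: strip it, then check what remains.
        candidate = compact.replace("**", "")
        if candidate.isdigit() and len(candidate) >= MIN_DIGITS["IE"]:
            return candidate, "ok"
        return None, "defect"  # too short or still non-numeric: duff data
    if country == "BY":
        # Here "**" may be a dial-out code typed in by a confused form-filler,
        # so keep the value and flag it for review rather than delete it.
        if compact.startswith(BELARUS_DIAL_OUT):
            return compact, "review"
        if compact.isdigit() and len(compact) >= MIN_DIGITS["BY"]:
            return compact, "ok"
        return None, "defect"
    return None, "no rules for country"
```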

    Secondly, it means you need to put some thought into the design of information capture processes to reduce the chances of errors occurring. Defining a structure with separate fields, linking the country calling code to a country drop-down (and a library of business rules for how to interpret and ‘standardise’ subsequent inputs) would not be too difficult – it would just require investment of effort in researching the rules and maintaining them once deployed. Here’s a link to a useful resource I’ve found (note that I can’t vouch for the frequency of updates to this site, but I’ve found it a fun way to figure out what the rules might be for various countries). Also, Wikipedia has a good piece on Telephone number plans. Graham Rhind also has some good links to references for telephone number format rules.

    Looking at the data of a telephone number in isolation will most likely result in you screwing up some of the data (if you have international telephone numbers). Having the country information for that data (is the number in France or Belarus?) allows you to construct appropriate rules and make your assumptions in the appropriate context to reduce your risk of error.

    Ultimately, blundering in with a crude rule of thumb and simply stripping any leading zeros you find because that is the assumption you’ve made will result in you making an ass out of you and your data.

    Which raises an interesting question…

    Imagine you have been given a spreadsheet of telephone numbers that you have been told are international numbers in the ‘local’ formats for the respective countries. You open the spreadsheet and there are no leading zeros (because Excel, like most other spreadsheets, assumes that numbers don’t begin with zero and strips it out). What do you do to get the data back to a format that you can actually use?

    Answers on a post card (or in the comments) please.

  • Cripes, the blog has been name-checked by my publisher…

    TwentyMajor isn’t the only blogger in the pay of a publisher! (I’m conveniently ignoring Grandad and the others, as Irish bloggers are too darned fond of publishing these days. If you want to know who all the Irish bloggers with publishers are, then Damien Mulley probably has a list.)

    I recently wrote an industry report for a UK publisher on Information Quality strategy. The publisher then swapped all my references to Information Quality to references to Data Quality as that was their ‘brand’ on the publication. I prefer the term Information Quality for a variety of reasons.

    As this runs to over 100 pages of A4, it has a lot of words in it. My fingers were tired after typing it. Unlike Twenty’s book, I’ve got pictures in mine (not that kind of picture, unfortunately, but nice diagrams of concepts related to strategy and Information Quality. If you want the other kind of pictures, you’ll need to go here.)

    In the marketing blurb and bumph that I put together for the publisher I mentioned this blog and the IQTrainwrecks.com blog. Imagine my surprise when I opened a sales email from the publisher today and found this blog name-checked in it (yes, they included me on the sales mailing list… the irony is not lost on me… information quality, author, not likely to buy my own report when I’ve got the four drafts of it on the lappytop here).

    So, for the next few weeks I’ll have to look all serious and proper in a ‘knowing what I’m talking about’ kind of way to encourage people to buy my report. (I had toyed with some variation on booky-wook but it just doesn’t work – reporty-wort… no thanks, I don’t want warts.)

    So things I’ll have to refrain from doing include:

    1. Engaging in pointless satirical attacks on the government or businesses just for a laugh, unless I can find an Information Quality angle
    2. Talking too loudly about politics
    3. Giving out about rural/urban digital divides in Ireland
    4. Parsing and reformatting the arguments of leading Irish opinion writers to expose the absence of logic or argument therein.
    5. Engaging in socio-economic analysis of the fate of high-street purveyors of dirty water parading as coffee.
    6. Swearing

    That last one is a f***ing pain in the a**.

    If any of you are interested in buying my ‘umble little report, it is available for sale from Ark Group via this link. This link will make them think you got the email they sent to me, and you can get a discount, getting the yoke for £202.50 including postage and packing (normally £345 + £7.50 p&p). (Or click here to avoid the email campaign software…)

    And if any of you would like to see the content that I’d have preferred the link in the salesperson’s email to send you to (coz it highlights the need for good quality management of your information quality), then just click away here to go to IQTrainwrecks.com.

    Thanks to Larry, Tom, Danette and the wifey for their support while I was writing the report, and to Stephanie and Vanessa at Ark Group for their encouragement to get it finished by the deadline.

  • More thoughts on the IBTS data breach

    One of the joys of having occasional bouts of insomnia is that you can spend hours in the dead of night pondering what might have happened in a particular scenario based on your experience and the experience of others.

    For example, the IBTS has rushed to assure us that the data that was sent to New York was encrypted to the 256-bit AES standard. To a non-technical person that sounds impressive. To a technical person, that sounds slightly impressive.

    However, a file containing 171,000+ records could be somewhat large, depending on how many fields of data it contained and whether it included long ‘free text’ fields etc. When data is extracted from a database it is usually dumped to a text file format that uses delimiters such as commas or tab characters, or defined field widths, to identify the fields.

    When a file is particularly large, it is often compressed before being put on a disc for transfer – a bit like how we all try to compress our clothes in our suitcase when trying to get just one bag on Aer Lingus or Ryanair flights. One of the most common software tools used for this (in the Microsoft Windows environment) is called WinZip. It compresses files but can also encrypt the archive file so that a password is required to open it. When the file needs to be used, it can be extracted from the archive, so long as you have the password for the compressed file.

    So, it would not be entirely untrue for the IBTS to say that they had encrypted the data before sending it and it was in an encrypted state on the laptop if all they had done was compressed the file using Winzip and ticked the boxes to apply encryption. And as long as the password wasn’t something obvious or easily guessed (like “secret” or “passw0rd” or “bloodbank”) the data in the compressed file would be relatively secure behind the encryption.
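    Compression and encryption are separate questions, and this Python sketch shows why that matters. It dumps some entirely fictional records to an in-memory comma-delimited file and compresses them with `zlib`. `zlib` uses the DEFLATE algorithm commonly found inside .zip archives, but it is not WinZip itself. The Python standard library has no AES, so the encryption step is simply left out here. Whatever protects the archive, the copy a developer unpacks to work on is plain, readable text again.

```python
import csv
import io
import zlib


def make_dump(rows):
    """Dump rows to comma-delimited text, as a database export might."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["donor_id", "name", "blood_group", "notes"])
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Entirely fictional records, generated for illustration only.
rows = [(i, f"Donor {i}", "O+", "no issues recorded") for i in range(1000)]
dump = make_dump(rows)
packed = zlib.compress(dump, 9)  # compression only: this is NOT encryption

print(len(dump), "bytes raw ->", len(packed), "bytes compressed")

# To work with the data it must be unpacked, and the working copy is
# plain text again, wherever it ends up being saved.
working_copy = zlib.decompress(packed)
print(working_copy.decode("utf-8").splitlines()[1])
```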

    However, for the data to be used for anything it would need to be uncompressed, and would then sit, naked and unsecured, on the laptop to be prodded and poked by the application developers as they went about their business. Were this the case then, much like the fabled emperor, the IBTS’s story would have no clothes. Unencrypted data would have been on the laptop when it was stolen. Your unencrypted, non-anonymised data could have been on the laptop when it was stolen.

    The other scenario is that the actual file itself was encrypted using appropriate software. There are many tools in the market to do this, some free, some not so free. In this scenario, the actual file is encrypted and is not necessarily compressed. To access the file one would need the appropriate ‘key’, either a password or a keycode saved to a memory stick or similar that would let the encryption software know you were the right person to open the file.

    However, once you have the key you can decrypt the file and save an unencrypted copy. If the file was being worked on for development purposes it is possible that an unencrypted copy might have been made. This may have happened contrary to policies and agreements because, sometimes, people try to take shortcuts to get to a goal and do silly things. In that scenario, personal data relating to Irish blood donors could have wound up in an unencrypted state on a laptop that was stolen in New York.

    [Update**] Having discussed this over the course of the morning with a knowledgeable academic who used to run his own software development company, it seems pretty much inevitable that the data was actually in an unencrypted state on the laptop, unless there was an unusual level of diligence on the part of the New York Blood Clinic regarding the handling of data by developers when not in the office.

    Picture a programmer who takes data home of an evening or weekend to work on some code without distractions or to beat a deadline. To use the file he or she would need to have decrypted it (unless the software being tested could access encrypted files… in which case, does the development version have ‘hardened’ security itself?). If the file was decrypted to be worked on at home, it is not beyond possibility that it was left unencrypted on the laptop at the time it was stolen.

    All of which brings me back to a point I made yesterday….

    Why was un-anonymised production data being used for a development/testing activity in contravention of the IBTS’s stated Data Protection policy, Privacy statement and Donor Charter, and in breach of section 2 of the Data Protection Act?

    If the data had been fake, encryption would not be an issue. Fake is fake, and while the theft would be embarrassing it would not have constituted a breach of the Data Protection Act. I notice from Tuppenceworth.ie that the IBTS was not quick to respond to Simon’s innocent enquiry about why dummy data wasn’t used.

  • Fair use/Specified purpose and the IBTS

    I am a blood donor. I am proud of it. I have provided quite a lot of sensitive personal data to the IBTS over the years that I’ve been donating.

    The specific purposes for which I believed I was providing the information were to allow the IBTS to administer communications with me as a donor (so I know when clinics are on so I can donate), to allow the IBTS to identify me and track my donation patterns, and to alert IBTS staff to any reasons why I cannot donate on a given occasion (donated too recently in the past, I’ve had an illness etc.). I accepted as implied purposes the use of my information for internal reporting and statistical purposes.

    I did not provide the information for the purposes of testing software developed by a 3rd party, particularly when that party is in a foreign country.

    The IBTS’s website (www.ibts.ie) has a privacy policy which relates to data captured through their website. It tells me that

    The IBTS does not collect any personal data about you on this website apart from information which you volunteer (for example by emailing us or by using our on line contact forms). Any information which you provide in this way is not made available to any third parties, and is used by the IBTS only for the purpose for which you provided it.

    So, if any information relating to my donor record was captured via the website, the IBTS is in breach of their own privacy policy. So if you register to be a donor… using this link… http://www.ibts.ie/register.cfm?mID=2&sID=77 then that information is covered by their Privacy policy and you would not be unreasonable in assuming that your data wouldn’t wind up on a laptop in a crackhouse in New York.

    In the IBTS’s Donor Charter, they assure potential Donors that:

    The IBTS guarantees that all personal information about donors is kept in the strictest confidence

    Hmm… so no provision here for production data to be used in testing. Quite the contrary.

    However, it gets even better… in the Donor Information Leaflet on the IBTS’s website, in the Data Protection section (scroll down… it’s right at the bottom), the IBTS tells current and potential donors that (emphasis is mine throughout):

    The IBTS holds donor details, donation details and test results on a secure computerised database. This database is used by the IBTS to communicate with donors and to record their donation details, including all blood sample test results. It is also used for the proper and necessary administration of the IBTS. All the information held is treated with the strictest confidence.

    This information may also be used for research in order to improve our knowledge about the blood donor population, and for clinical audit, to assess and improve the quality of our service. Wherever possible, all such information will be anonymised.

    Right… so from their policy and their statement of fair use and specified purposes we learn that:

    1. They can use it for communication with donors and for tracking donation details and results of tests (as expected)
    2. They can use it for necessary administration. Which covers internal reporting but, I would argue, not giving it to other organisations to lose on their behalf.
    3. They can use it for research about the blood donor population, auditing clinical practices. This is OK… and expected.
    4. They are also permitted to use the data to “improve the quality of [their] service”. That might cover the use of the data for testing…

    Until you read that last bit… the data would be anonymised whenever possible. That basically means the creation of dummy data as described towards the end of my last post on this topic.

    So, the IBTS did not specify at any time that they would use the information I had provided to them for the purposes of software development by 3rd parties. It did specify a purpose for using the information for the improvement of service quality. But only if it was anonymised.

    Section 2 of the Data Protection Act says that data can only be used by a Data Controller for the specific purposes for which it has been gathered. As the use of un-anonymised personal data for the purposes of software development by agencies based outside of the EU (or in the EU for that matter) was not a specified use, the IBTS is, at this point, in breach of the Data Protection Act. If the data had been anonymised (i.e. if ‘fictional’ test data had been used or if the identifying elements of the personal data had been muddled up before being transferred) there would likely be no issue.

    • Firstly, the data would have been provided in a manner consistent with the specified use of the data
    • Secondly, there would have been no risk to personal data security as the data on the stolen laptop would not have related to an identifiable person in the real world.

    Of course, that would have cost a few euros to do, so it was probably de-scoped from the project.

    If I get a letter and my data was not anonymised I’ll be raising a specific complaint under Section 2 of the Data Protection Act. If the data was not anonymised (regardless of the security precautions applied) then the IBTS is in breach of their specified purposes for the collection of the data and are in breach of the Data Protection Act.

    Billy Hawkes, if you are reading this, I’ve just saved your team 3 weeks’ work.

  • Irish Blood Transfusion Service loses data…

    Why is it that people never learn? Only months after the debacle of HMRC sending millions of records of live confidential data whizzing around in the post on 2 CDs (or DVDs), the Irish Blood Transfusion Service (IBTS) has had 171,000 records of blood tests and blood donors stolen.

    The data was on a laptop (bad enough from a security point of view). The data was (apparently) secured with 256bit AES encryption (happy days if true). The laptop was taken in a mugging (unfortunate). The mugging took place in New York (WTF!?!?)

    Why was the data in New York?
    It would seem that the IBTS had contracted with the New York Blood Centre (NYBC) for the customisation of some software that the NYBC had developed to better manage information on donors and blood test results. To that end the IBTS gave a copy of ‘live’ (or what we call in the trade ‘production’) data to the NYBC for them to use in developing the customisations.

    So, personal data, which may contain ‘sensitive’ data relating to sexual activity, sexual behaviour, medical conditions etc., was sent to the US. But it was encrypted, we are assured.

    A quick look at the Safe Harbor list of the US Dept of Commerce reveals that the NYBC is not registered as being a ‘Safe Harbor’ for personal data from within the EU. Facebook is however (and we all know how compliant Facebook is with basic rules of data protection).

    Apparently the IBTS relied on provisions of their contract with the NYBC to ensure and assure the security of the data relating to REAL people. As yet no information has come to light regarding whether any audits or checks were performed to ensure that those contractual terms were being complied with or were capable of being complied with.

    How did the data get to New York?
    From the IBTS press release it is clear that the data got to New York in a controlled manner.
    An employee of NYBC took the disc back from Ireland and placed it in secure storage.

    Which is a lot better than sticking two CDs in the post, like the UK Revenue services did not so long ago.

    What about sending the data by email? Hmmm… nope, not secure enough, and the file sizes might be too big. A direct point-to-point FTP between two servers? That would work as well, assuming that the FTP facilities were appropriately secured by firewalls and a healthy sense of paranoia.

    Why was the data needed in New York?
    According to the Irish Times:

    The records were in New York, the blood service said, “because we are upgrading the software that we use to analyse our data to provide a better service to donors, patients and the public service”.

    Cool. So the data was needed in New York to let the developers make the necessary modifications to code.

    Nice sound bite. Hangs together well. Sounds reasonable.

    Unfortunately it is total nonsense.

    For the developers to make modifications to an existing application, what was required in New York was

    • A detailed specification of the modifications needed to enable the software to function for Irish datasets and meet Irish requirements. E.g. if the name/address data capture screens needed to change, they should have been specified in a document. If validation routines for zip codes/postcodes needed to be turned off, that should have been specified. If base data/reference data needed to be changed – specify it in a document. Are we seeing a trend here?
    • Definition of the data formats used in Ireland. By this I mean the definition of the formats of data such as “social security number”. We call it a PPSN and it has the format nnnnnnnA, as opposed to the US format, which has dashes in the middle. A definition of the data formats that would be used in Ireland and a mapping to/from the US formats would possibly be required… this is (wait for it) another document. NOT THE DATA ITSELF
    • Some data for testing. OK, so this is why all 171,000+ records were on a laptop in New York? Ehh… NO. What was required was a sample data set that replicates the formats and patterns of data found in the IBTS production data. This does not mean a cut of production data. What this means is that the IBTS should have created dummy data that was a replica of production data (warts and all – so if 10% of their records have text values in fields where numbers would be expected, then 10% of the test data should reflect this). The test data should also be tied to specific test cases (experiments to prove or disprove functionality in the software).
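    Here is a minimal sketch of generating dummy data along those lines. It assumes a simple donor layout with made-up field names. Identifiers follow the nnnnnnnA shape described above, but the final letter is random rather than a calculated check character. Every name, age and test-case ID is invented. A fixed proportion of records (10% by default, matching the example) carries text where a number is expected, so the defects found in production are represented in the test set.

```python
import random
import string


def fake_ppsn(rng: random.Random) -> str:
    """A value shaped like nnnnnnnA; the letter is random, not a check character."""
    digits = "".join(rng.choice(string.digits) for _ in range(7))
    return digits + rng.choice(string.ascii_uppercase)


def make_test_donors(count: int, defect_rate: float = 0.10, seed: int = 42):
    """Generate fictional donor rows that mimic production data, warts and all."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    first = ["Aoife", "Ciaran", "Niamh", "Sean", "Orla"]  # invented samples
    last = ["Byrne", "Walsh", "Kelly", "Ryan", "Doyle"]
    n_defects = round(count * defect_rate)
    rows = []
    for i in range(count):
        # Deliberate defect: text in a field where a number is expected.
        age = "unknown" if i < n_defects else str(rng.randint(18, 65))
        rows.append({
            "test_case": f"TC-{i % 3 + 1:03d}",  # hypothetical test-case IDs
            "ppsn": fake_ppsn(rng),
            "name": f"{rng.choice(first)} {rng.choice(last)}",
            "age": age,
        })
    rng.shuffle(rows)  # spread the defective rows through the data set
    return rows
```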

    At no time was production data needed for development or developer testing activities in New York. What was needed was clear project specification and requirements documentation, documents about data formatting and ‘meta-data’ (data about data), Use Cases (walk-throughs of how the software would be used in a given process – like a movie script) and either a set of dummy sample data that looks and smells like your production data or a ‘recipe’ for how the developer can create that data.

    But the production data would be needed for Acceptance testing by IBTS?
    eh… nope. And even if it was it would not need to be sent to New York for the testing.

    User Acceptance testing is a stage of testing in software development AFTER the developer swears blind that the software works as it should and BEFORE the knowledge workers in your organisation bitch loudly that the software is buggered up beyond all recognition.

    As with all testing, it does not require the use of production data, and indeed using production data is often a VERY BAD IDEA (except in certain extreme circumstances such as the need for volume stress testing, or testing of very complex software solutions that need data exactly like production to be tested effectively… e.g. a complex parsing/matching/loading process on a multi-million record database – and even then, key data not relevant to the specific process being tested ought to be ‘obscured’ to ensure data protection compliance).

    What is required is that your test environment is as close a copy of the reality you are testing for as possible. So, from a test data point of view, creating test data that looks like your production data is the ideal. One way is to do data profiling, develop an understanding of the ‘patterns’ and statistical trends in your data, and then hand-carve a set of test data that looks and smells like your production data but is totally fake and fraudulent and safe. Another approach is to take a copy of your production data and bugger around with it to mix names and addresses up, replace certain words in address data with different words (e.g. “Park” with “Grove” or “Leitrim” with “Carialmeg” or “@obriend.info” with “obriend.fakedatapeople” – whatever works). So long as the test data is representative of the structure and content of your production data set and can support the test scenarios you wish to perform, then you are good to go.
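The second approach (scrambling a copy) might look something like this minimal Python sketch. The substitution table reuses the examples from the paragraph above, and the field names (`name`, `address`, `email`) are hypothetical. Names are shuffled across records so that a name is no longer reliably attached to its own address, and the original records are left untouched.

```python
import random
import re

# Substitutions taken from the examples in the text; extend to suit your data.
WORD_SUBSTITUTIONS = {"Park": "Grove", "Leitrim": "Carialmeg"}
EMAIL_DOMAIN_SUBSTITUTIONS = {"@obriend.info": "@obriend.fakedatapeople"}


def substitute_words(text: str) -> str:
    """Replace whole words only, so 'Park' changes but 'Parkview' does not."""
    for old, new in WORD_SUBSTITUTIONS.items():
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    return text


def mask_records(records: list[dict], seed: int = 7) -> list[dict]:
    """Return masked copies of records; the input list is not modified.

    Field names ('name', 'address', 'email') are hypothetical.
    """
    rng = random.Random(seed)
    names = [r["name"] for r in records]
    rng.shuffle(names)  # detach names from the addresses they came with
    masked = []
    for rec, name in zip(records, names):
        email = rec["email"]
        for old, new in EMAIL_DOMAIN_SUBSTITUTIONS.items():
            email = email.replace(old, new)
        masked.append({
            "name": name,
            "address": substitute_words(rec["address"]),
            "email": email,
        })
    return masked
```

This is only as strong as its substitution table: the local part of each email address passes through unchanged here, so a real masking pass would need to rewrite that too.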

    So, was the production data needed in New York? Nope. Would it be needed for testing in a test event for User Acceptance testing? Nope.

    And who does the ‘User Acceptance testing’? Here’s a hint… what’s the first word? User Acceptance testing is done by representatives of the people who will be using the software. They usually follow test scripts to make sure that specific functionality is tested for, but importantly they can also highlight where things are just wrong.

    So, were there any IBTS ‘users’ (knowledge workers/clerical staff) in New York to support testing? We don’t know. But it sounds like the project was at the software development stage so it is unlikely. So why the heck was production data being used for development tasks?

    So… in conclusion
    The data was stolen in New York. It may or may not have been encrypted (the IBTS has assured the public that the data was encrypted on the laptop… perhaps I am cynical but someone who takes data from a client in another nation home for the weekend might possibly have decrypted the data to make life easier during development). We’re not clear (at this point) how the data got to New York – we’re assuming that an IBTS employee accompanied it to NY stored on physical media (the data, not the employee).

    However, there is no clear reason why PRODUCTION data needed to be in New York. Details of how the IBTS’s current data formats might map to the new system, details of requirements for changes to the NYBC’s current system to meet the needs of the IBTS, details of the data formats in the IBTS’s current data sets (both field structures and, ideally, a ‘profile’ of the structure of the data and any common errors that occur) and DUMMY data might all be required for design, development and developer testing. That is all understandable. Production data is not.
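The ‘profile’ mentioned above can be produced without shipping the data anywhere. One common profiling technique is pattern analysis: reduce each value to its shape and count how often each shape occurs. A minimal Python sketch, with hypothetical function names:

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits -> '9', letters -> 'A', rest kept."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_field(values: list[str]) -> list[tuple[str, float]]:
    """Return (pattern, share of values) pairs, most common pattern first."""
    counts = Counter(value_pattern(v) for v in values)
    total = len(values)
    return [(pattern, n / total) for pattern, n in counts.most_common()]


print(profile_field(["1234567A", "7654321B", "UNKNOWN", "1234567A"]))
# [('9999999A', 0.75), ('AAAAAAA', 0.25)]
```

Here the output says that three values in four are PPSN-shaped and one in four is text where a number was expected. A summary like this, rather than the records themselves, is the kind of thing that could be handed to a developer.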

    There is no evidence, other than the existence of a contractual arrangement, that the NYBC had sufficient safeguards in place to ensure the safety of personal data from Ireland. The fact that an NYBC employee decided to take the data out of the office into an insecure environment (downtown New York) and bring it home with them suggests that, perhaps, there is a cultural and procedural gap in NYBC’s processes that might have meant they either couldn’t comply with the clauses in those contracts or didn’t understand what those clauses actually required.

    For testing, what is required is a model of production. A model. A fake. A facsimile, NOT PRODUCTION. The more accurate your fake is, the better. But it doesn’t need to be a carbon copy of your production data with exactly the same ‘data DNA’… indeed it can be a bad idea to test with ‘live’ data. Just like it is often dangerous to play with ‘live’ grenades or grab a ‘live’ power line to see what will happen.

    The loss of our IBTS data in New York evidences a failure of governance, a ‘happy path’ approach to risk planning, and a lack of appreciation of how software development projects must be governed and controlled to ensure the protection of live data.

    As this was a project for the development of a software solution there was no compelling reason that I can identify for production data to have been sent from Ireland to New York when dummy data and project documentation would have sufficed.

    The press release from the IBTS about this incident can be found here.

    [Update: Simon over at Tuppenceworth has noted my affiliation to the IAIDQ. Just to clarify, 99% of this post is about basic common sense. 1% is about Information Management/Information Quality Management. And as this post is appearing here and not on the IAIDQ’s website, it goes without saying that my comments here may not match exactly the position of the IAIDQ on this issue. I’m also a member of the ICS, who offer a Data Protection certification course which I suspect will be quite heavily subscribed the next time it runs.]

    [Update 2: This evening RTE News interviewed Dr David Gray from DCU, who is something of an expert on IT security. The gist of Dr Gray’s comments was that software controls to encrypt data are all well and good, but you would have to question the wisdom of letting the information wander around a busy city and not having it under tight physical control… which is pretty much the gist of some of my comments below. No one has (as yet) asked why the hell production data rather than ‘dummy’ data was being used during the development phase of a project.]

  • Getting back to my Information Quality agenda

    One or two of the comments (and emails) I received after the previous post here enquired about some stuff I’d written previously (2006 into 2007) about the state of the Irish Electoral Register.

    It is timely that some people visited those posts as our Local Elections are coming up in less than 18 months (June 2009) and frankly, unless there is some immense effort going on behind the scenes that I haven’t heard of, the Register is still in a poor state.

    The issue isn’t the Register per se but the processes that surround it, and the apparent lack of a culture where the leadership take the quality of this information seriously enough to make the necessary changes to address the cultural, political and process problems that have resulted in it being buggered.

    There are a few consolidating posts knocking around on this blog as I’ve pulled things together before. However, a quick search for “Electoral Register” will pull all the posts I’ve done on this together. (If you’ve clicked the link, all the articles are presented below.)

    I’ve also got a presentation on the subject over at the IQNetwork website, and I did a report (which did go to John Gormley’s predecessor) which can be found here, and I wrote a Scrap and Rework article that I submitted to various Irish newspapers at the time to no avail but which has been published internationally (in print and on-line).

    At this stage, I sense that as it doesn’t involve mercury-filled CFLs or carbon taxes, the state of the electoral register and the legislative framework that surrounds it (a lot of the process issues require legislative changes to address them) has slipped down our Minister’s list of priorities.

    However, with Local Elections looming it is important that this issue be addressed.