Data Protection Rake: WHACK!!

[Image: Sideshow Bob walking on rakes]

So, the Minister for Education is fighting a rear-guard action to justify the method of execution of the Primary Online Database. Get ready for the rakes.

Correctly, she is stressing the need for a means to track education outcomes as children move from primary to secondary education, where there is a drop-out rate which is rightly concerning. It’s been concerning since 2006 when Barnardo’s highlighted the mystery of what was happening to the 1000 children a year who didn’t progress from primary to secondary education.

She has stated that the Data Protection Commissioner has been consulted “and that office is satisfied with what we are doing”. The Data Protection Commissioner has commented that the Department has presented “a legitimate and proportionate purpose for requesting to be provided with the data it is seeking”. Now… that’s not the same thing as being “satisfied with what we are doing” as the Minister has said. It also depends very much on what purpose was communicated to the Office of the Data Protection Commissioner in 2013.

Even in an ideal world scope creep occurs, particularly when the objective for processing the data seems to be a bit confused. Is it for purely statistical purposes (which is implicit in the statements that the data would only be accessed by a small number of people in the statistics unit of the Department of Education), or is it for more day-to-day operational decision making purposes (which is implicit in comments made by the Minister that school funding could be at risk if data was not returned)? Those are two different categories of purpose.

[Whack]

But what about the DPC’s position?

The Data Protection Commissioner’s statement to the Irish Times actually limits its comment to the legitimacy and proportionality of the purpose that the Department may have for seeking to process this data. Ensuring children move from Primary to Secondary education is a legitimate purpose. So is ensuring that the State has data available to help identify any trends in drop-out rates, to deploy limited resources as efficiently as possible to ensure equality of access to education (here’s a link to some more stuff from Barnardo’s on that), and to support children in getting the best education outcomes possible.

Legitimacy and proportionality are linked to the purpose for which the data is being obtained. And the need to ensure that data is “obtained fairly and processed for a specified and lawful purpose” covers just the first two of eight Data Protection principles. So what is the purpose the DPC was told about? Are there new purposes?

So, when the Minister comments on the retention of data about primary school children until they are 30 years old, and says that

“I did say I would examine it but it looks to me that up to the 30th birthday is probably appropriate and it satisfies the Data Commissioner as well which is obviously very important,”

it is really important to ask: What is the purpose for which this long a retention period is required?

[Whack]

It’s actually more than that: it’s essential that the Minister is able to say categorically what the purpose is for this retention and why a 25 to 26 year retention period for personal and sensitive personal data is required. “Probably appropriate” is not the test; “retention for no longer than is necessary for the purpose for which the data is being processed” is the test under the Data Protection Acts. It is also important to assess whether the purpose and requirement can be met by less personally identifying data: would anonymised or pseudonymised data support the objective? If yes, then it ceases to be necessary to hold the raw data, so it is no longer “probably appropriate”.
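To make the pseudonymisation question concrete, here is a minimal sketch of how a statistics extract could carry a stable pupil reference instead of the raw identifier. Everything in it is an assumption for illustration: the field names, the sample record, the made-up PPSN, and the key handling are hypothetical and are not taken from the Primary Online Database specification.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be held away from the people
# who work with the statistical extract.
SECRET_KEY = b"example-key-held-outside-the-statistics-unit"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash: the same input always gives the same reference,
    but the raw identifier cannot be read back from it without the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical raw record as a school might return it.
record = {"ppsn": "1234567T", "school_year": 6, "progressed_to_secondary": True}

# The extract used for trend analysis keeps the outcome data
# but swaps the PPSN for the pseudonymous reference.
stats_record = {
    "pupil_ref": pseudonymise(record["ppsn"]),
    "school_year": record["school_year"],
    "progressed_to_secondary": record["progressed_to_secondary"],
}
```

A reference like this still lets the same child be followed from primary to secondary in the statistics. It reduces identifiability rather than removing it, so it narrows the retention question without making it go away.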

[Whack]

So… what is the specific purpose for which a retention period of “until 30th birthday” is required? State it. Assess it. Compare against other alternative methods. And then make a clear decision based on the Privacy impact and the necessity and proportionality of the processing. “Probably appropriate” is not a form of words that fills me with confidence. “Assessed to be necessary and proportionate against other options, which were rejected because of X, Y, Z reasons” would be more illustrative and evidential of a proper Privacy Impact Assessment and Privacy by Design thinking at work.

[Whack]

For other purposes it might not be appropriate to allow access to the identifiable data even 90 seconds after it is recorded. Those purposes need to be identified and appropriate governance and controls defined and put in place to ensure only appropriate data is disclosed that is adequate, relevant, and not excessive to the purpose for which it is being processed. And that purpose needs to be consistent with, and not incompatible with, the purpose for which the data was originally obtained. The Data Protection Commissioner doesn’t appear to have actually commented on that. So the standard protocol of clear statutory basis and an appropriate system of Governance still needs to be considered and put in place for any sharing of data or subsequent use of data to be compliant with the Data Protection Acts (and, just in case we forget, Article 8 of the EU Charter of Fundamental Rights).

[Whack]

Disturbingly, the Minister seems to imply that it is irrelevant if parents provide their PPSN to the Department or not as they will be able to obtain that data from the Department of Social Protection. It is true that name, address, date of birth and mother’s maiden name can be used to validate a PPSN. However, I would question the basis under which the passing of that data to obtain the PPSN would be valid, given that the Dept of Education’s registration with Client Identity Services in the DSP seems to presume the Department has the PPSN it needs. The rent has been paid up on the battlefield, it appears, and there is no going back.

[Whack. Whack]

(Name, address, date of birth, and mother’s maiden name could form a composite key to identify a child uniquely on the database where no PPSN is available. In which case, what is the purpose for the PPSN?)
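As a rough illustration of the composite key idea, the sketch below builds a matching key from those four attributes. The normalisation rules (case folding, stripping accents, collapsing whitespace) and the sample values are my own assumptions, not anything specified for the database, and real matching would need more careful handling of address variations.

```python
import unicodedata


def normalise(value: str) -> str:
    """Fold case, strip accents, and collapse whitespace so trivial
    differences in data entry do not produce different keys."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def composite_key(name, address, date_of_birth, mothers_maiden_name):
    """Combine the four attributes into one tuple usable as a unique key."""
    parts = (name, address, date_of_birth, mothers_maiden_name)
    return tuple(normalise(part) for part in parts)


# Two hypothetical entries for the same child, keyed in slightly differently.
first = composite_key("Seán  Ó Briain", "1 Main Street, Cork", "2008-03-14", "Murphy")
second = composite_key("sean ó briain", "1 main street,  cork", "2008-03-14", "MURPHY")
same_child = first == second
```

If a key like this identifies the child well enough for the stated purpose, the case for also collecting the PPSN has to rest on some other, separately specified purpose.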

[Whack]

What does the Minister’s statement mean?

In my opinion, the Minister’s statement means that the Department is misunderstanding the role of the Data Protection Commissioner and what it means for the DPC to give an opinion on the appropriateness of processing. The DPC will determine if there is risk of non-compliance with a proposed purpose for processing and will give guidance and feedback based on the information that is provided to them.

If that information is incomplete, or doesn’t match the final implementation of a system, then the DPC can (and does) change their position. It’s also not the role of the DPC to correct the homework of a Government Department, and the new Commissioner Helen Dixon has made that exceptionally clear to Public sector representatives in at least two forums since November. Her role is to enforce the legislation and support the protection of fundamental data privacy rights of individuals and to be independent of Government (that’s a Treaty obligation by the way since 2009… and towards the end of his term Billy Hawkes the former Commissioner exercised that independence by, for example, prosecuting the Minister for Justice).

It also means that the Minister is at risk of having to dig herself out of an entrenched position. The road to heck is paved with good intentions. This scheme (and all the other education outcome tracking databases that the Department has) is valid and valuable as part of a coherent information strategy for the design and implementation of education services and delivery of education outcomes in Ireland. But the design and execution of the systems of processing (not just the technology systems but the wider scheme of stakeholder engagement, controls, governance, and impact assessments) leaves a lot to be desired.

It means, unfortunately, that rather than display their homework around Privacy Impact Assessment, Governance controls, and Privacy by Design, the Minister and her Department are reacting exactly as I described in yesterday’s blog post:

Data Protection Expert: I think this raises significant issues and may be illegal

Government Representative: It’s too late. I’ve already paid a month’s rent on the PR agency project.

So far the report card reads:

  • Intention: 10/10
  • Effort: 4/10
  • Execution: 2/10 (and negative marking applies here).

“Trust us, we’re the Government” doesn’t work any more because the Government has failed spectacularly to build and engender trust on previous data gathering and data sharing initiatives. So, laudable as the goals are, there was already a mountain to climb to put this data gathering inside the “circle of trust”.

My €0.02

Having reviewed a range of documentation around the Primary Online Database (including the specifications for the drop down fields in the database), these are my observations:

  1. The project has mis-identified as “non-sensitive” data a range of questions which are capturing sensitive personal data about medical or psychological assessments.
  2. The system has a notes field which can currently be accessed by users of the system in the Department. It is proposed that access will be restricted to just schools, but in reality the data would still be stored on a system designed and controlled by the Department, and would be accessible by anyone with administrator access to the underlying database.
  3. The communication of purpose for processing, and the explanation of the retention period, is bordering on the unintelligible to me. And I read and write those kinds of things for a living. I teach this stuff to lawyers. The defence that “it’s based on the Department circular” is not a defence. The requirement under the Data Protection Acts is that data be fairly obtained for a specified purpose. That requires that the statement of purpose be comprehensible (I advise clients to apply adult literacy standards to their text and aim for a reading age of 12 to 15). If the circular is incomprehensible, write a ‘friendly version’ or get the Circular redone.
  4. The project has gone to the wrong source for the data. The schools do not have a lot of this data, and even then they have obtained it for a different specified purpose. Schools guessing at ethnicity or religion or other aspects of the data being gathered makes little sense and creates an admin burden for the schools. The 50% response rate in the pilot project should have been a warning that the execution method was not appropriate.
  5. The use of “local” versions of the questionnaire by schools (where schools have modified the Department’s form and sent it out to parents) means that the Department (as Data Controller) has lost control of the statement of and explanation of purposes and processing. That means that no assumptions can be made now about what parents understood they were agreeing to because the ‘official’ form of communication may not have been used.
  6. There is no clear justification for a retention period of raw, identifiable, data until a child’s 30th year.
  7. The stance adopted by the Minister is not good. In the face of valid criticism she has adopted an entrenched position, clutching to the DPC as a shield rather than a fig leaf. Given the narrative arc in the Irish Water debacle that is, as Sir Humphrey Appleby would say, “Courageous Minister, very courageous”. (Data relating to children, “all cleared by the DPC”, challenge in public by knowledgeable experts, public disquiet, “DPC said it was OK”, immediate reverse ferret after a reshuffle… [we are at stage 3 now].)

Pausing, assessing, and defining an appropriate strategy for strategic use of data in education for statistical planning and centralisation of operational data, combined with an appropriate Privacy Impact Assessment that takes into account recent rulings on necessity and proportionality by the CJEU, would be advisable at this time.

Anything else is simply courageous, Minister.

The sound of one bell clapping

Twitter is great. I found myself this evening discussing the psychology of alarms with Rob Karel of Informatica. He had tweeted that a car alarm outside his office had been going off for an hour but his brain had filtered it out. This is not an uncommon reaction to bells and alarms and is the reason why I have a monitored alarm system in my home, a fact I will return to later.

Our neuropsychological response to alarms is pretty much the same as our response to any alert to risk. It is influenced by the same basic flaws in information processing in the limbic system of the brain, our “lizard brain”. If the danger is not one we are familiar with and it is not immediate we discount it to the point of ignoring it.

An alarm going off is an alert that something is happening somewhere else to someone or something else. Without a hook to make it personal it is just noise and it fades into the background. In the absence of a direct effect on us we tune out the distraction so our lizard brain can focus on other immediate risks – to us. An alarm = someone else’s stuff at risk.

This is why a measure of data quality needs to have an action plan associated with it, so that the people in the organisation can tie the metric to a real effect and put a clear response plan into action. It is just as when a fire alarm goes off we know to go to the nearest exit and leave belongings behind, or when an oxygen mask drops in front of us on a plane we know to tug hard and take care of our own mask first.

There is an alarm stimulus. There is a planned response that makes it personal to us. Alarm, something must be done, this is a thing, let’s do it.

But often Information Quality scorecards are left hanging. The measure of success is the success of measurement. Just as the measure of home security is often whether you have a house alarm. But a ringing alarm that has no action to be called to serves no purpose.

My home has a monitored alarm. If one sensor is triggered I get a phone call to check on me and alert me. If a perimeter sensor and an internal sensor are triggered together I get a call to let me know that there are police en route. Each time the alarm is responded to by a stranger with a planned response. My role is to cry halt at any time, gather data about the incident (was there someone calling to the house who forgot the alarm code? Is there a key holder on the way?), and generally coordinate the plan’s roll out.

What can we learn from this for how we design DG and IQ strategies? What is your planned response to an alarm bell ringing in your data?
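One way to picture an answer is a scorecard where every metric carries its own planned response, much as the monitored alarm does. The sketch below is purely illustrative: the metric name, threshold, owner, and action are hypothetical examples, not a recommended set.

```python
from dataclasses import dataclass


@dataclass
class ResponsePlan:
    threshold: float  # the alarm rings when the metric falls below this value
    owner: str        # who picks up the phone
    action: str       # what they are expected to do


# Hypothetical scorecard: each metric is tied to a named owner and action.
PLANS = {
    "customer_email_completeness": ResponsePlan(
        threshold=0.95,
        owner="CRM data steward",
        action="review the last capture batch and correct the source form",
    ),
}


def respond(metric, value, plans=PLANS):
    """Return the planned response for a breached metric, or None if all is well."""
    plan = plans.get(metric)
    if plan is None:
        # A measure with no plan behind it is the ringing car alarm.
        return f"{metric}: no planned response defined, so this alarm is just noise"
    if value >= plan.threshold:
        return None
    return (
        f"{metric} at {value:.0%} (below {plan.threshold:.0%}): "
        f"{plan.owner} to {plan.action}"
    )
```

Calling `respond("customer_email_completeness", 0.90)` returns an instruction aimed at a named person, while a metric with no entry in `PLANS` gets flagged as noise.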

Daisy (chain) cutters needed

Brian Honan (@brianhonan on Twitter) has been keeping me (and the omniverse) updated via Twitter about the trials and tribulations of Wired.com columnist Matt Honan, who was the subject of a Social Engineering attack on his Amazon, Apple, Gmail, and ultimately Twitter accounts. The attack resulted in every photograph he had of his young daughter being deleted, along with a whole host of other problems.

Matt writes about his experience in Wired.com today.

Apart from the salutary lesson about Cloud-based back-up services (putting your eggs in their basket leaves you at the mercy of their ability to recover your data if something goes wrong), Matt’s story also raises some key points about Information Quality and Data Governance and the need to consider Privacy as a Quality Characteristic of data.

Part of the success of the attack on Matt’s accounts hinged on the use of his Credit Card number for identity verification:

…the very four digits that Amazon considers unimportant enough to display in the clear on the web are precisely the same ones that Apple considers secure enough to perform identity verification. The disconnect exposes flaws in data management policies endemic to the entire technology industry, and points to a looming nightmare as we enter the era of cloud computing and connected devices.

So, Amazon views the last four digits as being useful to the customer (quality), so that they can identify different cards on their account, and therefore displays them. But Apple considers that short string of data to be sufficient to validate a person’s identity.

This is a good example of what I call “Purpose Shift” in Information Use. Amazon uses the credit card for processing payments, and needs to provide information to customers to help them select the right card. However, in Apple-land, the same string of data (the credit card number) is used both as a means of payment (for iTunes, iCloud etc.) and for verifying your identity when you ring Apple Customer Support.

This shift in purpose changes the sensitivity of the data and either

  • The quality of its display in Amazon (it creates a security risk for other purposes) or
  • The risk of its being relied on by Apple as an identifier (there is no guarantee it has not been swiped, cloned, stolen, or socially engineered from Amazon)

Of course, the same is true of the age old “Security Questions”, which a colleague of mine increasingly calls INsecurity questions.

  • Where were you born?
  • What was your first pet’s name?
  • Who was your favourite teacher?
  • What is your favourite book?
  • What is your favourite sport?
  • Last four digits of your contact phone number?

In the past there would have been a reasonable degree of effort required to gather this kind of information about a person. But with the advent of social media it becomes easier to develop profiles of people and gather key facts about them from their interactions on Facebook, Twitter, etc. The very facts that were “secure” because only the person or their close friends would know it (reducing the risk of unauthorised disclosure) are now widely broadcast – often to the same audience, but increasingly in a manner less like quiet whispers in confidence and more like shouting across a crowded room.

[update: Brian Honan has a great presentation where he shows how (with permission) he managed to steal someone’s identity. The same sources he went to would provide the data to answer or guess “security” questions even if you didn’t want to steal the identity. http://www.slideshare.net/brianhonan/knowing-me-knowing-you]

The use of and nature of the data has changed (which Tom Redman highlights in Data Driven as being one of the Special Characteristics of Information as an Asset). Therefore the quality of that data for the purpose of being secure is not what it once may have been. Social media and social networking have enabled us to connect with friends and acquaintances and random cat photographers in new and compelling ways, but we risk people putting pieces of our identity together like Verbal Kint creating the myth of Keyser Söze in The Usual Suspects.

Building Keyser Söze

Big Data is the current hype cycle in data management because the volumes of data we have available to process are getting bigger, faster, more full of variety. And it is touted as being a potential panacea for all things. Add to that the fact that most of the tools are Open Source and it sounds like a silver bullet. But it is worth remembering that it is not just “the good guys” who take advantage of “Big Data”. The Bad Guys also have access to the same tools and (whether by fair means or foul) often have access to the same data. So while they might not be able to get the exact answer to your “favourite book” they might be able to place you in a statistical population that likes “1984 by George Orwell” and make a guess.

Yes, it appears that some processes may not have been followed correctly by Apple staff (according to Apple), but ‘defence in depth’ thinking applied to security checks would help provide controls and mitigation from process ‘variation’. Ultimately, during my entire time working with Call Centre staff (as an agent, Team Leader, Trainer, and ultimately as an Information Quality consultant) no staff member wanted to do a bad job… but they did want to do the quickest job (call centre metrics) or the ‘best job they thought they should be doing’ (poorly defined processes/poor training).

Ultimately the nature of key data we use to describe ourselves is changing as services and platforms evolve, which means that, from a Privacy and Security perspective, the quality of that information and associated processes may no longer be “fit for purpose”.

As Matt Honan says in his Wired.com article:

I bought into the Apple account system originally to buy songs at 99 cents a pop, and over the years that same ID has evolved into a single point of entry that controls my phones, tablets, computers and data-driven life. With this AppleID, someone can make thousands of dollars of purchases in an instant, or do damage at a cost that you can’t put a price on.

And that can result in poor quality outcomes for customers, and (in Matt’s case) the loss of the record of a year of his child’s life (which, as a father myself, I would count as possibly the lowest quality outcome of all).

Lies, damned lies, and statistics

On Monday the 16th January 2012 the Irish Examiner ran a story that purported to have found that 93% of the Irish public “decried” the decision of the Minister for Foreign Affairs to close Ireland’s embassy in the Vatican City State. The article detailed how they had undertaken a review of correspondence released under the Freedom Of Information Act which showed that 93% of people in Ireland were against the closure. To cap it off, the article was picked up in the Editorial as well.

Except that that isn’t what they had uncovered. The setting out of the statistics they had found in the sensationalised way they presented them was a gross distortion of the facts. A distortion that would, to paraphrase Winston Churchill, “be half way around the world before the truth had its boots on”.

[Image: demotivational poster about data]

What they had uncovered is that of the 102 people who wrote in to the Minister for Foreign Affairs about the issue, 93% expressed a negative opinion about the closure. The population of Ireland is approximately 4.5 million people. 95 people is closer to 0.002% of that. I may not have the academic qualifications in Mathematical physics that my famous comedian namesake has, but I know that 95 people (that’s 93% of 102) is slightly less than 93% of the Irish public.

Or, to put it another way, significantly and substantially below the statistical margin for error usually applied in political opinion research by professional research companies.

Or to put it another way, over 99% of the population cared so little about the closure of the Vatican Embassy that they couldn’t be bothered expressing an opinion to the Minister.
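The arithmetic is easy to check. The sketch below uses only the figures already quoted (102 letters, 93% negative, a population of roughly 4.5 million); the variable names are mine. Note that 95 in 4.5 million is about 0.000021 as a fraction, which is about 0.0021 when expressed as a percentage.

```python
letters_received = 102
negative_share_of_letters = 0.93
population = 4_500_000  # approximate population of Ireland, as quoted above

# 93% of 102 letters, rounded to whole people.
negative_writers = round(letters_received * negative_share_of_letters)

# Negative letter writers as a share of the whole population.
negative_share_of_public = negative_writers / population

# Everyone who did not write to the Minister at all.
silent_share_of_public = 1 - letters_received / population

print(f"{negative_writers} people is {negative_share_of_public:.4%} of the public")
print(f"{silent_share_of_public:.3%} of the public did not write in")
```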

Of course, the fact is that there were letters written about this issue. And the people who wrote them were expressing their opinion. And 93% of them were against the closure. In fact, in defending themselves on Twitter against an onslaught of people who spotted the primary school maths level of error in the misuse of statistics in the article, the Irish Examiner twitter account repeatedly stated that (and I’m paraphrasing the actual tweets here slightly) “for clarification we did point out that the analysis was based on the letters and emails”. But it is inaccurate and incorrect to extrapolate the 93% of negative comment in those letters to the entire population, as the sample is not statistically valid or representative, being

  1. Too small (for a statistically valid sample of the Irish public you would need between 384 and 666 people selected RANDOMLY, not from a biased population). That’s why RED C and others use sample sizes of around 1000 people at least for phone surveys etc.
  2. Inherently biased. 93% of cranky people were very cranky is not a headline. The population set is skewed towards one end of the distribution curve of opinion you would likely find in the wider population.
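For anyone wondering where figures like 384 and 666 come from, they are consistent with the standard sample size formula for estimating a proportion from a simple random sample of a large population. The sketch below assumes a 5% margin of error, the worst case of a 50/50 split in opinion, and z-scores of 1.96 (95% confidence) and 2.58 (99% confidence). Those parameter choices are my assumption about how such figures are usually derived.

```python
def sample_size(margin: float, z: float, p: float = 0.5) -> float:
    """Sample size needed to estimate a proportion p within +/- margin,
    for a simple random sample drawn from a large population."""
    return z ** 2 * p * (1 - p) / margin ** 2


n_95 = sample_size(margin=0.05, z=1.96)  # roughly 384
n_99 = sample_size(margin=0.05, z=2.58)  # roughly 666

print(f"95% confidence, 5% margin: about {n_95:.0f} people")
print(f"99% confidence, 5% margin: about {n_99:.0f} people")
```

The formula only applies when people are selected at random. No sample size rescues a set of respondents who selected themselves by writing a letter.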

Then today we see a story in the Examiner about how Lucinda Creighton, a Junior Minister in the Dept of Foreign Affairs is backing a campaign to reopen the embassy because

there’s a very strong, and important and sizeable amount of people who are disappointed with the decision and want to see it overturned and who clearly aren’t happy

What? Like 93% of the Public Lucinda? Where is your data to show the size, strength, and importance of this group? Have you done a study? What was the sample size?

As a benchmark reference for what is needed for an Opinion Poll to validly represent the opinions of the Irish Public, here’s what a reputable polling company says on their website:

For all national population opinion polls RED C interview a random sample of 1,000+ adults aged 18+ by telephone. This sample size is the recognised sample required by polling organisations for ensuring accuracy on political voting intention surveys. The accuracy level is estimated to be approximately plus or minus 3 per cent on any given result at 95% confidence levels.

Anything less than that is not statistically valid data and can’t be held out as representing the opinion of the entire public.
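RED C’s quoted accuracy can be reproduced with the usual margin of error formula for a simple random sample. This sketch assumes the worst case split of 50/50 and a z-score of 1.96 for 95% confidence. The function name is mine.

```python
import math


def margin_of_error(sample_size: int, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for a proportion p estimated from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / sample_size)


poll = margin_of_error(1000)    # a RED C style national poll
letters = margin_of_error(102)  # what 102 respondents would give IF they were random

print(f"n=1000: plus or minus {poll:.1%}")
print(f"n=102:  plus or minus {letters:.1%}")
```

Even treated as a random sample, 102 responses would carry a margin of error of nearly ten points either way. Because the letter writers selected themselves, the formula does not apply to them at all, which is the bigger problem.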

As an Information Quality Certified Professional and an active member of the Information Quality Profession on an International level for nearly a decade I am ethically bound to cry “BULLSHIT!!” on inaccuracies and errors in information and in how it is presented. The comments from Ms Creighton are a good example of why that is important in the Information Quality and wider Information Management profession. If bullshit analysis or analysis based on flawed or inherently poor quality data is relied upon to make strategic decisions then we invariably wind up with bullshit decisions and flawed actions.

And that affects everything from conversation with family, chats in the pub, business investment decisions, political decision making, through to social policy. Data, Information, and Statistics are COOL and are powerful. They should be treated with respect. People publishing them should take time to understand them so that their readers won’t be misled. And care should be taken in compiling them so that bias does not skew the results.

So, having had no joy or actual engagement from the Irish Examiner on the issue, I forwarded my complaint to the Press Ombudsman yesterday pointing out that the article would seem, based on the disconnect between the headline, the leading paragraph, and the general thrust of it, to be in breach of the Code of Practice of the Press Council of Ireland.

I just hope they can tell the difference between lies, damned lies, and fudged statistics. (This Yes Minister clip about Opinion Polls shows how even validly sampled ones can be biased by question format and structure in the survey design).

Turd Polishing

In the course of a twitter conversation with Jim Harris I used the phrase “turd polishing” to describe what happens when organisations try to implement check-box based data governance or Compliance programmes, or invest in business intelligence or analytics strategies without

  • fixing the data which underpins those strategies
  • addressing the organisational cultural and structural issues which have led to the problem in the first place.

I have witnessed this happening with organisations who, for example, decide that investing in e-learning with a “learning KPI” (x% of staff having reached y% pass mark on a multiple choice exam with a 1 in 4 chance of guessing the right answer) is their approach to evidencing culture change and the embedding of learning.

Of course, this fails miserably when

  • The cultural message is that the data job isn’t as important as the Day Job
  • The management practice is to game the system (why take all your staff off the phones to do the learning when you have one person on the team who knows it who can do the exams for everyone with their logins?)
  • Management look only at the easy numbers (the easily gathered test scores at the end of an assessment period).
  • Management seek to rule by fear or quota (“hit these numbers and those numbers or else….”)

Management who seek to overlay a veneer of good governance on an unaligned/misaligned and otherwise outright broken Quality Culture that doesn’t seek to value or maximise the value of their Information are engaging in little more than Turd Polishing. Turd Polishing can be seen in organisations that value Scrap and Rework over re-engineering as a way to address their quality goals. Turd Polishing can be seen in organisations that fudge reports to Regulators or announce “reviews” of issues that everyone has already identified the root causes of around the water coolers and coffee jugs.

No amount of elbow grease and turd polish will change the underlying essence of what is being done. Nothing will improve, but increasing amounts of polish will be required to dress up the turd as a sustainable change programme.

The alternative is to call a turd a turd but work with it to bring out the special properties of manure that can help promote growth and give rise to sweet smelling flowers. That requires spade work and patience to bring about the change of state from turd to engine of growth. But no polishing is required.

In summary: turd polishing gives you a shiny turd that is still a turd. Digging into the manure can lead to you coming up roses.

New Data Protection post over on the company site

I’ve just written a new article over on the company website about Directors’ liability for data security breaches. An expert in the Sunday Business Post over the weekend was waving a big stick at Company Directors saying that they could become liable for prosecution for security breaches if Ireland transposes the Convention on Cybercrime into law.

But this expert missed the important points of Section 29 of the Data Protection Acts 1988 and 2003, which effectively creates a cascading liability for the directors, officers, managers, and employees of an organisation that is processing personal data.

Check out my post here:

Bruce Schneier on Privacy

Via the Twitters I came across this absolutely brilliant video of Bruce Schneier talking about data privacy (that’s the American for Data Protection). Bruce makes some great points.

One of the key points that overlaps between Data Protection and Information Quality is where he tells us that

Data is the pollution problem of the Information Age. It stays around, it has to be dealt with and its secondary uses are what concerns us. Just as… … we look back at the beginning of the previous century and sort of marvel at how the titans of industry in the rush to build the industrial age would ignore pollution, I think… … we will be judged by our grandchildren and great-grandchildren by how well we dealt with data, with individuals and their relationships to their data, in the information society.

This echoes the Peter Drucker comment that I reference constantly in talks and with clients of my company where Drucker said that

So far, for 50 years, the information revolution has centered on data—their collection, storage, transmission, analysis, and presentation. It has centered on the “T” in IT.  The next information revolution asks, what is the MEANING of information, and what is its PURPOSE?

Bruce raises a number of other great points, such as how as a species we haven’t adapted to what is technically possible and the complexity of control is the challenge for the individual, with younger people having to make increasingly complex and informed decisions about their privacy and what data they put where and why (back to meaning and purpose).

I really like his points on the legal economics of Information and Data. In college I really enjoyed my “Economics of Law” courses and I tend to look at legalistic problems through an economic prism (after all, the law is just another balancing mechanism for human conduct). I like them so much I’m going to park my thoughts on them for another post.

But, to return to Bruce’s point that Data is the pollution problem of the Information age, I believe that that statement is horribly true whether we consider data privacy/protection or Information Quality. How much of the crud data that clutters up organisations and sucks resources away from the bottom line is essentially the toxic slag of inefficient and “environmentally unfriendly” processes and business models? How much of that toxic waste is being buried and ignored rather than cleaned up or disposed of with care?

Is Information Quality Management a “Green” industry flying under a different flag?

The Who/What/How and Why

Data protection and Information Quality are linked in a number of ways. At one level, the EU Directive on Data Protection (95/46/EC) describes the underlying fundamental principles of Data Protection as “Principles for Data Quality”.
While that is great pub quiz content, it helps to be able to make some more pragmatic and practical links as well.
On a project a while ago, I was asked to help a client ensure that certain business processes they were putting in place with a partner organisation were data protection compliant. They’d been asked to do this by the partner organisation’s lawyers.
I leaped into action, assuming that this would be an easy few days of billable. After all, all I needed to know was what data the partner organisation needed when and why to document some recommendations for my client on how to build a transparent and compliant set of policies and procedures for data protection.

Unfortunately the partner organisation seemed to lack an understanding of the what’s, why’s, when’s, and how’s of their data. This was perplexing as, nice and all as a blank canvas is, sometimes you need to have a sense of the landscape to draw your conclusions against.
The engagement I had from the partner organisation was focussed on their need to be able to take certain steps if certain circumstances came to pass. While the focus on the goal was commendable, it served to generate tunnel vision on the part of the partner that put a significantly valuable project at risk.
Goals and objectives (why) are all well and good. But Knowledge Workers need to be able to link these to processes (how) and information needs (what). Deming famously said that if you can’t describe what you are doing as a process then you don’t know what you are doing. I’d go further and say that if you can’t identify the data and information you need to do what you are doing then you can’t be doing it, at least not without massively increased costs and risks (particularly of non-compliance with regulations).
In the end I made some assumptions about the what’s and how’s of the partner organisation’s processes in order to meet the goal that they had focussed on so narrowly.
That enabled me to map out an approach to data protection compliance based on a “minimum necessary” principle. And that got my client and their partner over the hump.
But, from an information quality perspective, not being able to answer the what/why/how questions means you can’t set meaningful measures of “fitness for purpose”. If you don’t know what facts are needed you don’t know if information is missing. If you don’t know what use data will be put to you can’t possibly tell if it is accurate enough.
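To make the “minimum necessary” idea concrete, here is a rough sketch in Python. It is not what was built on that project. The purposes, field names and the `minimise` helper are all invented for illustration. The point is simply that once the why (purpose) is written down against the what (fields), anything outside that list can be left out of a data share.

```python
# Hypothetical mapping of each documented purpose (why) to the fields it needs (what).
PURPOSE_FIELDS = {
    "billing": {"customer_id", "name", "address"},
    "service_alerts": {"customer_id", "email"},
}


def minimise(record, purpose):
    """Return only the fields documented as necessary for the given purpose."""
    try:
        allowed = PURPOSE_FIELDS[purpose]
    except KeyError:
        # No documented purpose means no basis for sharing anything.
        raise ValueError(f"No documented purpose: {purpose!r}") from None
    return {field: value for field, value in record.items() if field in allowed}


# Illustrative record only.
customer = {
    "customer_id": 42,
    "name": "Mary Byrne",
    "address": "1 Main Street",
    "email": "mary@example.com",
    "date_of_birth": "1980-02-11",
}

shared = minimise(customer, "service_alerts")
```

Here `shared` holds only the customer id and email. Asking for a purpose nobody has documented fails loudly, which is roughly the conversation I ended up having with the partner organisation.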

So, both Data Protection and Information Quality require people to know the what/why/how questions about their information to allow any meaningful outcome to ensue. If you can’t answer those questions you simply cannot be doing business.
To paraphrase Deming – we need to work on our processes, not their outcome.

Profound Profiling

Over the past few weeks at a number of events and speaking engagements I’ve found myself talking about the multifaceted benefits of Data Profiling from the perspectives of:

  • Complying with EU Data Protection regulations
  • Ensuring Data Migrations actually succeed
  • Enabling timely reporting of Regulatory risks

My mantra in these contexts seems to be distilling down to two bald statements:

  • It’s the Information, Stupid.
  • Profile early, profile often.

But what do I mean by “Data Profiling”? For the purposes of these conversations, I defined “Data Profiling” as being the analysis of the structure and content of a data set against some pre-defined business rules and expectations. For example, we may want to know how many (or what percentage) of records in a data set are missing key data, or how many have inconsistencies in the data, or how many potential duplicates there are in the data.
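For anyone who prefers to see that in code, the following is a minimal sketch of those three checks in plain Python. It is not a substitute for a proper profiling tool. The field names, the choice of “key” fields and the matching rule for duplicates are all assumptions made up for the example.

```python
from collections import Counter

KEY_FIELDS = ("name", "email")  # assumed "key data" for this illustration


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def profile(records):
    """Return count and percentage of records failing three simple rules."""
    total = len(records)

    # Rule 1: records missing any key field.
    missing = sum(
        1 for r in records if any(is_missing(r.get(f)) for f in KEY_FIELDS)
    )

    # Rule 2: inconsistency between the age and the is_minor flag.
    inconsistent = sum(
        1 for r in records
        if r.get("age") is not None and (r["age"] < 18) != bool(r.get("is_minor"))
    )

    # Rule 3: potential duplicates, matched on normalised name and email.
    keys = Counter(
        (str(r["name"]).strip().lower(), str(r["email"]).strip().lower())
        for r in records
        if not any(is_missing(r.get(f)) for f in KEY_FIELDS)
    )
    duplicates = sum(n for n in keys.values() if n > 1)

    counts = {
        "missing_key_data": missing,
        "inconsistent": inconsistent,
        "potential_duplicates": duplicates,
    }
    return {
        rule: {"count": n, "percent": round(100 * n / total, 1) if total else 0.0}
        for rule, n in counts.items()
    }


# Illustrative records only.
records = [
    {"name": "Mary Byrne", "email": "mary@example.com", "age": 34, "is_minor": False},
    {"name": "mary byrne ", "email": "MARY@example.com", "age": 34, "is_minor": False},
    {"name": "Tom Walsh", "email": "", "age": 12, "is_minor": True},
    {"name": "Ann Kelly", "email": "ann@example.com", "age": 16, "is_minor": False},
]

result = profile(records)
```

With this sample, one record is missing key data, one has an age that contradicts its flag, and two look like the same person. Real rules would come from the regulatory or business expectations you are measuring against.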

Why is this of benefit? While a journey of a 1000 miles starts with a single step, that journey must start from somewhere and be headed somewhere. The destination is encapsulated in the expected business rule outcomes and expectations. These outcomes and expectations are often defined by external factors such as Regulatory requirements (e.g. the need to keep information up to date under EU Data Protection principles, or the need to track bank accounts of minors in AML processes) or the strategic objectives of the organisation. The starting point is, therefore, a snapshot of how close you are (or how far you are) from your destination.

In my conversations, I advised people (none of whom were overly familiar with Information Quality principles or tools) that they should consider investing in a tool that allows them to build and edit and maintain Data Profiling rules and run them automatically. Regular Information Quality geeks will probably guess that the next thing I told them was about  how the profile snapshots could provide a very clear dashboard of how things are in the State of Data in their organisations.

When we are embarking on our journey of 1000 miles, it makes sense for us to regularly check our map against the landmarks to make sure we are heading in the right direction. The alternative is to meander down cul de sacs and dead end trails, which equates in Information Management terms to wasted investment and scrap and rework. So, profile early and profile often seems to be a good philosophy to live by.
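Checking the map against the landmarks can be as simple as comparing one profile snapshot with the next. The sketch below assumes each snapshot is a dictionary of rule name to percentage of records failing that rule. The rule names, the figures and the `compare_snapshots` helper are hypothetical.

```python
def compare_snapshots(previous, current, tolerance=0.0):
    """Label each rule in the current snapshot relative to the previous one."""
    trend = {}
    for rule, now in current.items():
        before = previous.get(rule)
        if before is None:
            trend[rule] = "new"  # rule was not measured last time
        elif now < before - tolerance:
            trend[rule] = "improved"  # fewer failing records
        elif now > before + tolerance:
            trend[rule] = "worsened"  # more failing records
        else:
            trend[rule] = "unchanged"
    return trend


# Illustrative figures only: percentage of records failing each rule.
january = {"missing_key_data": 12.0, "potential_duplicates": 4.5}
february = {"missing_key_data": 9.5, "potential_duplicates": 6.0, "inconsistent": 2.0}

trend = compare_snapshots(january, february)
```

A rule that keeps showing up as “worsened” is the cul de sac you want to spot before you have walked too far down it.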

By applying  business rules that relate to your regulatory compliance, risk management, or data migration objectives, you can make Information Quality directly relevant to the goals of the organisation, increasing the likelihood of any changes you bring in becoming “part of the way things get done around here” rather than “yet another darned thing we have to do”.  Quality for the sake of quality was a luxury even in the pre-recession period. In today’s economy it is more important than ever to demonstrate clear value.

And that is the real profundity of profiling. Without it you can’t actually know the true value of your Information Asset or determine if your current course of action might turn your Asset into a Liability.

It’s the Information, Stupid. So Profile Early and Profile Often.

Sometimes it is the simplest things…

Yesterday I took some time out from work to help hang some new light fittings at home. Our local handyman/neighbour was doing the hard work as my wife has seen enough of my father’s DIY exploits to have put an embargo on me even looking sideways at power tools.

The estimated duration of the job was to be about 45 minutes to an hour to hang three fittings. The first two fittings went up in about 20 minutes. The final one took us about 4 hours (and as of this morning still isn’t finished). We hadn’t factored in the “creativity” of the electricians who installed the original wiring.

When we opened up the existing light fitting in the living room we were faced with a spaghetti junction of cables. When we wired them into the new light fitting, the light went on but the switch wasn’t controlling it. It seemed we’d wired the light into a loop going somewhere else. We were faced with 5 live wires which had been going into 4 connectors on a connector block. So we had to then test each of the possible live/neutral combinations in turn to find the ones that actually related to the switch (which necessitated our handyman/neighbour having to play with live 240 volt electricity, which is never a good idea).

When we traced the correct cable pair I did a very simple thing. I dug out my label maker and put a label on the cables that related to the lighting circuit in that room. It struck me that that 30 seconds of effort was something that the electrician who wired the house could have easily done when they were installing the cables, making life simpler for him (or her) and for anyone who came after.

We wired everything up and fitted it up for a quick test before finishing the job. I turned the power back on.

Then there was a loud bang and the power went out.

It turned out that there was a break in the live wire we’d just labelled (the important one for the task at hand) slightly further up the cable from where the label was which had pierced through the insulation and come into contact with the metal mounting plate for the light fitting.

As a result, the magic smoke had escaped from the circuit breaker and the light switch.

What ensued for my neighbourhood handyman and me was frustration, as a task which should have taken a half hour stretched into nearly six hours (over 2 days), along with additional expense (to the handyman) in replacing the blown components.

To put it another way, for the want of €0.15 of labelling on the part of the original vendor to identify the attributes of the various wires we found (such as “this one runs the lights”), I expended a full half-day of work and the handyman was unavailable for other jobs which would have paid him a lot more than the rate we’d struck for fitting the lights – and that was before the additional cost and complication of having to go to the electrical wholesalers this morning to buy replacement parts and fit them as well.

It struck me that this is a situation we encounter on a regular basis with the information assets of an organisation.

Very often the important data for a given process in a given area is not clearly identified. Management say “give us everything and we’ll figure it out” and call centre screens and web-forms are cluttered with a variety of information capture points.

A failure to understand (or label) the purpose of that information, where it comes from and where it goes to, and its critical path in the business can result in undesired outcomes as soon as anything starts to change in the business, business processes, or technology platform (such as replacing your front end systems with a new one, the nearest analogy I can think of for changing a light fitting).

This results in effort expended on scrap and rework trying to get the blasted thing to work right with the desired outcomes (such as throwing illumination on a problem), and quite often can result in a critical information pathway being blown and needing replacement, or an internal control process in the business stopping a process.

Of course, things can often be worse in the Information Quality space where the internal controls on quality may not function as efficiently as a circuit breaker and a light switch which have planned failure built in to them to isolate the end user from the dangers of domestic electricity supply. When controls like circuit breakers fail, the results can be… shocking.

Sometimes it is the simplest things that are important. Knowing what wires relate to the circuit you are fitting a light into, or what items of information are actually critical to the success or failure of a process (both the immediate process and downstream; remember there were 4 other live wires relating to other circuits that had to be dealt with as well), is a key contributor to the success or failure of any change effort.

What controls do you have to protect your business knowledge workers from the dangers of high voltage, low quality information? Are the mission critical data in your organisation clearly labelled?