Profound Profiling

Over the past few weeks at a number of events and speaking engagements I’ve found myself talking about the multifaceted benefits of Data Profiling from the perspectives of:

  • Complying with EU Data Protection regulations
  • Ensuring Data Migrations actually succeed
  • Enabling timely reporting of Regulatory risks

My mantra in these contexts seems to be distilling down to two bald statements:

  • It’s the Information, Stupid.
  • Profile early, profile often.

But what do I mean by “Data Profiling”? For the purposes of these conversations, I define “Data Profiling” as the analysis of the structure and content of a data set against some pre-defined business rules and expectations. For example, we may want to know how many (or what percentage) of records in a data set are missing key data, how many have inconsistencies in the data, or how many potential duplicates there are in the data.
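To make that definition concrete, here is a minimal sketch of those three checks in Python. The field names (`name`, `email`, `age`, `is_minor`) and the rules themselves are assumptions made up for illustration; your own business rules will differ, and a real profiling tool will do far more than this.

```python
from collections import Counter


def normalise(value):
    """Lower-case and collapse whitespace so near-identical values compare equal."""
    return " ".join(str(value or "").lower().split())


def profile(records, key_fields=("name", "email")):
    """Count records that break three illustrative (hypothetical) rules."""
    total = len(records)

    # Rule 1: key data must be present (not missing or blank).
    missing = sum(
        1 for r in records
        if any(not normalise(r.get(f)) for f in key_fields)
    )

    # Rule 2 (assumed): a record flagged as a minor should not have age >= 18.
    inconsistent = sum(
        1 for r in records
        if r.get("is_minor") and r.get("age") is not None and r["age"] >= 18
    )

    # Rule 3 (assumed): same normalised name and email = potential duplicate.
    keys = Counter(
        (normalise(r.get("name")), normalise(r.get("email"))) for r in records
    )
    duplicates = sum(n for key, n in keys.items() if n > 1 and all(key))

    def pct(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "total": total,
        "missing_key_data": missing,
        "missing_key_data_pct": pct(missing),
        "inconsistent": inconsistent,
        "inconsistent_pct": pct(inconsistent),
        "potential_duplicates": duplicates,
        "potential_duplicates_pct": pct(duplicates),
    }


# Made-up sample data, purely for illustration.
records = [
    {"name": "Ann Byrne", "email": "ann@example.com", "age": 34, "is_minor": False},
    {"name": "ann  byrne", "email": "ANN@example.com", "age": 34, "is_minor": False},
    {"name": "Tom Daly", "email": "", "age": 12, "is_minor": True},
    {"name": "Sam Roe", "email": "sam@example.com", "age": 40, "is_minor": True},
]

print(profile(records))
```

With the four sample records, this reports one record missing key data, one inconsistency and two potential duplicates. The point is less the code than the shape of the output: a count and a percentage per rule, which is exactly the kind of snapshot discussed below.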

Why is this of benefit? While a journey of 1000 miles starts with a single step, that journey must start from somewhere and be headed somewhere. The destination is encapsulated in the expected business rule outcomes and expectations. These outcomes and expectations are often defined by external factors such as Regulatory requirements (e.g. the need to keep information up to date under EU Data Protection principles, or the need to track bank accounts of minors in AML processes) or the strategic objectives of the organisation. The starting point is, therefore, a snapshot of how close you are to (or how far you are from) your destination.

In my conversations, I advised people (none of whom were overly familiar with Information Quality principles or tools) that they should consider investing in a tool that allows them to build, edit, and maintain Data Profiling rules and run them automatically. Regular Information Quality geeks will probably guess that the next thing I told them was how the profile snapshots could provide a very clear dashboard of the State of Data in their organisations.

When we are embarking on our journey of 1000 miles, it makes sense for us to regularly check our map against the landmarks to make sure we are heading in the right direction. The alternative is to meander down cul-de-sacs and dead-end trails, which in Information Management terms equates to wasted investment and scrap and rework. So “profile early and profile often” seems to be a good philosophy to live by.
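Checking the map against the landmarks can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the snapshot labels, the percentages and the target are invented, and `track` is a hypothetical helper, not a feature of any particular profiling tool.

```python
def track(snapshots, target_pct):
    """Compare successive profile snapshots with a target.

    snapshots: list of (label, pct_failing) pairs in chronological order,
    where pct_failing is the percentage of records failing a rule.
    """
    report = []
    previous = None
    for label, pct in snapshots:
        if previous is None:
            trend = "baseline"      # first snapshot: the starting point
        elif pct < previous:
            trend = "improving"     # fewer failures than last time
        elif pct > previous:
            trend = "worsening"     # drifting away from the destination
        else:
            trend = "unchanged"
        report.append({
            "label": label,
            "gap_to_target": round(pct - target_pct, 1),
            "trend": trend,
        })
        previous = pct
    return report


# Invented figures: percentage of records failing a rule, week by week.
history = [("week 1", 12.0), ("week 2", 9.5), ("week 3", 10.5)]

for row in track(history, target_pct=2.0):
    print(row)
```

In this made-up history the second snapshot shows progress and the third shows a slip, which is the signal you want to see while there is still time to correct course.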

By applying business rules that relate to your regulatory compliance, risk management, or data migration objectives, you can make Information Quality directly relevant to the goals of the organisation, increasing the likelihood of any changes you bring in becoming “part of the way things get done around here” rather than “yet another darned thing we have to do”. Quality for the sake of quality was a luxury even in the pre-recession period. In today’s economy it is more important than ever to demonstrate clear value.

And that is the real profundity of profiling. Without it you can’t actually know the true value of your Information Asset or determine if your current course of action might turn your Asset into a Liability.

It’s the Information, Stupid. So Profile Early and Profile Often.

Comments

5 responses to “Profound Profiling”

  1. […] This post was mentioned on Twitter by Jim Harris, Datamartist. Datamartist said: #DataProfiling @daraghobrien blog post: “Profound Profiling” – http://digs.by/9UBv4b #DataQuality <- Profile early and Often! Yeah! [...]

  2. Jim Harris

    Great post, Daragh!

    To paraphrase The Proclaimers (yes, I know that they’re Scottish, not Irish):

    When I wake up, well I know I’m gonna be,
    I’m gonna be the one who profiles early and often for you
    When I go out, yeah I know I’m gonna be
    I’m gonna be the one who goes along with data
    If I get drunk, well I know I’m gonna be
    I’m gonna be the one who gets drunk on managing risk for you
    And if I haver up, yeah I know I’m gonna be
    I’m gonna be the one who’s havering about how “It’s the Information, Stupid”

    But I would profile 500 records
    And I would profile 500 more
    Just to be the one who profiles a thousand records
    To deliver the profound business benefits of data profiling to your door

    da da da da – ta ta ta ta
    da da da da – ta ta ta ta – data!
    da da da da – ta ta ta ta
    da da da da – ta ta ta ta – data profiling!

    When I’m working, yes I know I’m gonna be
    I’m gonna be the one who’s working hard to ensure compliance for you
    And when the money, comes in for the work I do
    I’ll pass almost every penny on to improving data for you
    When I come home (When I come home), well I know I’m gonna be
    I’m gonna be the one who comes back home with data quality
    And if I grow-old, (When I grow-old) well I know I’m gonna be
    I’m gonna be the one who’s growing old with information quality

    But I would profile 500 records
    And I would profile 500 more
    Just to be the one who profiles a thousand records
    To deliver the profound business benefits of data profiling to your door

    da da da da – ta ta ta ta
    da da da da – ta ta ta ta – data!
    da da da da – ta ta ta ta
    da da da da – ta ta ta ta – data profiling!

    When I’m lonely, well I know I’m gonna be
    I’m gonna be the one who’s lonely without data profiling to do
    And when I’m dreaming, well I know I’m gonna dream
    I’m gonna dream about the time when I’m data profiling for you
    When I go out (When I go out), well I know I’m gonna be
    I’m gonna be the one who goes along with data
    And when I come home (When I come home), yes I know I’m gonna be
    I’m gonna be the one who comes back home with data quality
    I’m gonna be the one who’s coming home with information quality

    But I would profile 500 records
    And I would profile 500 more
    Just to be the one who profiles a thousand records
    To deliver the profound business benefits of data profiling to your door

    da da da da – ta ta ta ta
    da da da da – ta ta ta ta – data!
    da da da da – ta ta ta ta
    da da da da – ta ta ta ta – data profiling!

    🙂

    Best Regards,

    Jim

  3. Ken O'Connor

    Hi Daragh,

    Excellent post. I won’t attempt to write my response in song – well done Jim.

    The age-old truism springs to mind: “If you don’t measure, you can’t manage”.

    Hence if you don’t measure (profile) your data, you can’t know what it contains, and you can’t begin to improve its quality (if that is required).

    Rgds Ken

  4. Vish Agashe

    Daragh,

    I could not agree more with the mantra of “profile early, profile often”. Along with the areas you mention as the benefits of profiling, I would say profiling and benchmarking should serve as the first step in the 1000-mile journey of data governance as well.

    Before any organization can start implementing any strategies around data governance, they should know what they are dealing with. Profiling enables the discovery phase of governance.

    1. Profile early and profile often (this will help you establish a baseline of where the quality and state of organizational data stand)

    2. Set the goals (could be driven by the regulations or internal standards). This will allow organizations to understand where they want to be in due course of time.

    3. Start the journey of ongoing corrective action to cover the gap between where you are today and where you want to be.

    I believe that profile snapshots should serve as one of the report cards for how effective the data governance framework is in an organization.

    Vish Agashe
    http://www.linkedin.com/vishagashe
    twitter: @vishagashe

  5. James Standen

    Hooray for Data profiling! I’ll add my “yop” to that!

    Discovering data quality issues at year end when it’s time to do the reports is not nearly as good as seeing all year if you are on the right track. By detecting trends, you can track down the root cause of issues as they emerge, not after they’ve already filled your tables.

    Great post. (And I’ll never listen to that Proclaimers song quite the same again, Jim)