fundamentals of information quality

Having spent the day trading IMs and talking to journalists about the Bank of Ireland Laser Card double charging kerfuffle, I thought it would be appropriate to write a calmer piece which spells out a bit more clearly my take on this issue, the particular axe I am grinding, and what this all means. I hope I can explain this in terms that can be clearly understood.

What is LASER?

For the benefit of people reading this who aren’t living and working in Ireland I’ll very quickly explain what LASER card is.

LASER is a debit card system which operates in Ireland. It is in operation in over 70,000 businesses in Ireland. It is operated by Laser Card Services Ltd. Laser Card Services is owned by seven of Ireland’s financial services companies (details here) and three of these offer merchant services to Retailers (AIB, Bank of Ireland, and Ulster Bank). In addition to straightforward payment services, LASER allows card holders to get “cashback” from retailers using their card.

There are currently over 3million Laser Cardholders nationwide, who generated more than â‚¬11.5billion in retail sales in 2008. On average, over 300 Laser card transactions are recorded per minute in Ireland.

How it works (or at least the best stab I can get at it)

As Jennifer Aniston used to say in that advert… “now for the science bit”. Children and persons of a sensitive disposition should look away now.

One problem I’ve encountered here is actually finding any description of the actual process that takes your payment request (when you put your card in the reader and enter your pin) , transfers the money from you to the retailer, and then records that transaction on your bank statement. Â Of course, there are valid security reasons for that.

So, I’ve had to resort to making some educated guesses based on my experience in information management and some of the comments in the statement I received from Bank of Ireland back in June. If I have any of this wrong, I trust that someone more expert than me will provide the necessary corrections.

The card holder presents their card to the retailer and puts it in the card reader. The card reader pulls the necessary account identifier information for the card holder for transmission to the LASER processing system (we’ll call this “Laser Central” to avoid future confusion).
The retailer’s POS (point of sale) system passes the total amount of the transaction, including any Cashback amount and details of the date, time, and retailer, to the Laser card terminal. Â Alternatively, the Retailer manually inputs the amount on the Laser POS terminal.
This amount and the amount of the transaction is transmitted to the Laser payment processing systems.
‘Laser Central’ then notifies the cardholder’s bank which places a “hold” on an amount of funds in the customer’s account. This is similar in concept to the “pre-authorisation” that is put on your credit card when you stay in a hotel.
At a later stage, ‘Laser Central’ transmits a reconciliation of transactions which were actually completd to the Laser payment processing sytem. This reconciliation draws down against the “hold” that has been put on funds in the card holder’s account, which results in the transaction appearing on the card holder’s bank statement.

Point 5 explains why it can sometimes take a few days for transactions to hit your account when you pay with your laser card.

The Problem

The problem that has been reported by Bank of Ireland today and which was picked up on by Simon over at Tuppenceworth.ie in May is that customers are being charged twice Â for transactions. In effect, the “hold” is being called on the double.

Back in May, Bank of Ireland explained this as being (variously):

A problem caused by a software upgrade
A problem caused by retailers not knowing how to use their terminals properly
A combination of these two

The Software Upgrade theory would impact on steps 3,4, and 5 of the “strawman” Laser process I have outlined above. The Retailer error theory would impact on steps 1 and 2 of that process, with potentially a knock on onto step 5 if transactions are not voided correctly when the Retailer makes an error.

But ultimately, the problem is that people are having twice as much money deducted from their accounts, regardless of how it happens in the course of this process. And as one of the banks that owns and operates Laser Card Services, Bank of Ireland has the ability to influence the governance and control of each step in the process.

The Risk of Poor Information Quality

Poor quality information is one of the key problems facing businesses today. A study by The Data Warehousing Institute back in 2002 put the costs to the US economy at over US$600billion. Estimated error rates in databases across all industries and from countries around the world range between 10% and 35%. Certainly, at the dozens of confernces I’ve attended over the years, no-one has ever batted an eyelid when figures like this have been raised. On a few occasions delegates have wondered who the lucky guy was who only had 35% of his data of poor quality.

The emerging Information Quality Management profession world wide is represented by the International Association for Information & Data Quality (IAIDQ).

Information Quality is measured on a number of different attributes Â (some writers call these Dimensions). The most common attributes include:

Completeness (is all the information you need to have in a record there?)
Consistency (do the facts stack up against business rules you might apply- for example, do you have “males” with female honorifics? Do you have multiple transactions being registered against one account within seconds of each other or with the same time stamp?)
Conformity (again, a check against business rules Â – does the data conform to what you would expect. Letters in a field you expect to contain just numbers is a bad thing)
Level of duplication ( simply put… how many of these things do you have two or more of? And is that a problem?)
Accuracy (how well does your data reflect the real-word entity or transaction that it is supposed to represent?)

In models developed by researchers at MIT there are many more dimensions, including “believability”.

In Risk Mangement there are three basic types of control:

Reactive (shit, something has gone wrong… fix it fast)
Detective (we’re looking out for things that could go wrong so we can fix them before they become a problem that has a significant impact)
Preventative (we are checking for things at the point of entry and we are not letting crud through).

Within any information process there is the risk that the process won’t work the way the designers thought/hoped/planned/prayed (delete as appropriate) it would. Â In an ideal world, information would go in one end (for example the fact that you had paid â‚¬50 for a pair of shoes in Clarks on O’Connell Street in Dublin on a given day) and would come out the other end either transformed into a new piece of knowledge through the addition of other facts and contexts (Clarks for example might have you on a Loyalty card scheme that tracks the type of shoes you buy) or simply wind up having the desired outcome… â‚¬50 taken from your account and â‚¬50 given to Clarks for the lovely pair of loafers you are loafing around in. This is what I term the “Happy Path Scenario”.

However lurking in the wings like Edwardian stage villains is the risk that something may occur which results in a detour off that “Happy Path” on to what I have come to call the “Crappy Path”. The precise nature of this risk can depend on a number of factors. For example, in the Clarks example, they may have priced the shoes incorrectly in their store database resulting in the wrong amount being deducted from your account (if you didn’t spot it at the time). Or, where information is manually rekeyed by retailers, you may find yourself walking out of a shop with those shoes for a fraction of what they should have cost if the store clerk missed a zero when keying in the amount (â‚¬50.00 versus â‚¬5.00).

Software upgrades or bugs in the software that moves the bits of facts around the various systems and processes can also conspire to tempt the process from the Happy Path. For example if, in the Laser card process, it was to be found that there was a bug that was simply sending the request for draw down of funds against a “hold” to a bank twice before the process to clear the “hold” was called, then that would explain the double dipping of accounts.

However, software bugs usually (but not always) occur in response to a particular set of real-world operational circumstances. Â Software testing is supposed to bring the software to as close to real-world conditions as possible. At the very least the types of “Happy Path” and “Crappy Path” scenarios that have been identified need to be tested for (but this requires a clear process focus view of how the software should work). Where the test environment doesn’t match the conditions (e.g. types of data) or other attributes (process scenarios) of the “real world” you wind up with a situation akin to what happened to Honda when they entered Formula 1 and spent buckets of cash on a new wind tunnel that didn’t come close to matching actual track conditions.

This would be loosely akin to giving a child a biscuit and then promising them a second it if they tidied their room, but failing to actually check if the room was tidied before giving the biscuit. You are down two bikkies and the kid’s room still looks like a tip.

In this case, there is inconsistency of information. The fact of two “draw downs” against the same “hold” is inconsistent. This is a scenario that software checks ont he bank’s side could potentially check for and flag for review before processing them. I am assuming of course that there is some form of reference for the “hold” that is placed on the customer’s account so that the batch processing knows to clear it when appropriate.

In the case of my horrid analogy, you just need to check within your own thought processes if the posession of two biscuits is consistent with an untidy room. If not, then the second biscuit should be held back.Â This is a detective control. Checking the room and then trying chasing the kid around the houseto get the biscuit back is a reactive control

Another potential risk that might arise is that the retailer may have failed to put a transaction through correctly and then failed to clear it correctly before putting through a second transaction for the same amount. This should, I believe, result in two “holds” for the exact same amount being placed on the customer’s account within seconds of each other. One of these holds would be correct and valid and the process should correctly deduct money and clear that hold. However it may be (and please bear in mind that at this point I am speculating based on experience not necessarily an in-depth insight into how Laser processing works) that the second hold is kept active and, in the absence of a correct clearance, it is processed through.

This is a little more tricky to test for in a reactive or detective controls. It is possible that I liked my shoes so much that I bought a second pair within 20 seconds of the first pair. Not probable, but possible. And with information quality and risk management ultimately you are dealing with probability. Because, as Sherlock Holmes says, when you have eliminated the impossible what remains, no matter how improbable, is the truth.

Where the retailer is creating “shadow transactions” the ideal control is to have the retailer properly trained to ensure consistent and correct processes are followed at all time. However, if we assume that the idea of a person validly submitting more than one transaction in the same shop for the same amount within a few moments of each other is does not conform with what we’d expect to happen then one can construct a business rule that can be checked by software tools to pick out those types of transaction and prevent them going through to the stage of the process that takes money from the cardholder’s account.

Quite how these errors are then handled is another issue however. Some of them (very few I would suggest) would be valid transactions. And this again is where there is a balance between possiblity and probability. It is possible that the transaction is valid, but it is more probable that it is an error. The larger the amount of the transaction, the more likely that it would be an error (although I’ve lost track of how many times I’ve bought a Faberge egg on my Laser card only to crave another nanoseconds later).

Another key area of control of these kinds of risk is, surprisingly, the humble call centre. Far too often organisations look on call centres as being mechanisms to push messages to customers. When a problem might exist, often the best way to assess the level of risk is to monitor what is coming into your call centres. Admittedly it is a reactive control once the problem has hit, but it can be used as a detective control if you monitor for “breaking news”, just as the Twitter community can often swarm around a particular Â hashtag.

The Bank of Ireland situation

The Bank of Ireland situation is one that suggests to me a failure of Information governance and Information risk management at at least some level.

It seems that Call Centre staff were aware in May of a problem with double dipping of transactions. This wasn’t communicated to customers or the media at the time.
There was some confusion in May about what the cause was. It was attributed variously to a software upgrade or retailers not doing their bit properly.
Whatever the issue was in May, it was broken in the media in September as an issue that was only affecting recent transactions.

To me, this suggests that there was a problem with the software in May and a decision was taken to roll back that software change.

Where was the “detective” control of Software Testing in May?
If the software was tested, what “Crappy Path” scenarios were missed from the test pack or test environment that exposed BOI customes (and potentially customers of the other 7 banks who are part of Laser) to this double dipping?
If BOI were confident that it was Retailers not following processes, why did they not design effective preventative controls or automated detective controls to find these types of error and automatically correct them before they became front page news?

Unfortunately, if the Bank’s timeline and version of events are take at face value, the September version of the software didn’t actually fix the bug or implement any form of effective control to prevent customers being overcharged.

What is the scenario that exists that eluded Bank of Ireland staff for 4 months?
If they have identified all the scenarios… was the software adequately tested and is their test enviroment a close enough model of reality that they get “Ferrari” performance on the track rather than “Honda” performance?

However, BOI’s response to this issue would seem to suggest an additional level of contributory cause which is probably more far reaching than a failure to test software or properly understand how the Laser systems are used and abused in “the wild” and ensure adequate controls are in place to manage and mitigate risks.

A very significant piece of information about this entire situation is inconsistent for me. Bank of Ireland has stated that this problem arose over the past weekend and was identified by staff immediately. That sounds like a very robust control framework. However it is inconsistent with the fact that the issue was raised with the Bank in May by at least one customer, who wrote about it in a very popular and prominent Irish blog. At that time I also wrote to the Bank about this issue asking a series of very specific questions (incidentally, they were based on the type of questions I used to ask in my previous job when an issue was brought to our attention in a Compliance context).

I was asked today if Simon’s case was possibly a once off. My response was to the effect that these are automated processes. If it happens once, one must assume that it has happened more than once.

In statistical theory there is a forumla called Poisson’s Rule. Simply put, if you select a record at random from a random sample of your data and you find an error in it then you have a 95% probability that there will be other errors. Prudence would suggest that a larger sample be taken and further study be done before dismissing that error as a “once off”, particularly in automated structured processes. I believe that Simon’s case was simply that random selection falling in my lap and into the lap of the bank.

Ultimately, Â I can only feel now that Simon and I were fobbed off with a bland line. Perhaps it was a holding position while the Bank figured out what was going onand did further analysis and sampling of their data to get a handle on the size of the problem. However, if that was the case I would have expected the news reports to day to have talked about an “intermittent issue which has been occurring since May of this year”, not a convenient and fortuitous “recent days”.

Unfortunately this has the hallmark of a culture which calls on staff to protect the Bank and to deny the existence of a problem until the evidence is categorically staring them in the face. It is precisely this kind of culture which blinkers organisations to the true impact of information quality risks. It is precisely this kind of culture which was apparent from the positions taken by Irish banks (BOI included) in the run up to the Government Bank Guarantee Scheme and which continues to hover in the air as we move to the NAMA bailout.

Tthis kind of culture is an anathema to transparent and reliable managment of quality and risk.

Conclusion

We will probably never know exactly what the real root cause of the Bank of Ireland double dipping fiasco is. The Bank admitted today in the Irish Times that they were not sure what the cause was.

Given that they don’t know what the cause was and there are differences of record as to when this issue first raised its head between the Bank and its own customers, it is clear that there are still further questions to ask and have answered as to the response of Bank of Ireland to this issue. In my view it has been a clear demonstration of “mushroom management” of risk and information quality.

Ultimately, I can only hope that other banks involved in Laser learn from BOI’s handling of this issue which, to my mind, has been poor. What is needed is:

A clear and transparent definition of the process by which a laser transaction goes from your fingers on the PIN number pad to your bank account. This should not be technical but should be simple, business process based, ideally using only lines and boxes to explain the process in lay-person’s terms.
This can then form the basis in Banks and audit functions for defining the “Happy Path” and “Crappy Path” scenarios as well as explaining to all actors involved what the impact of their contribution is to the end result (a customer who can pay their mortgage after having done their shopping for example)
Increased transparency and responsiveness on the part of the banks to reports of customer over charging. Other industries (and I think of telecommunication here) have significant statutory penalties where it is shown that there is systemic overcharging of customers. In Telco the fine is up to â‚¬5000 per incident and a corporate criminal conviction (and a resulting loss in government tendering opportunities). I would suggest that similar levels of penalties should be levied at the banks so that there is more than an “inconvenience cost” of refunds but an “opportunity cost” of screwing up.
A change in culture is needed away towards ensuring the customer is protected from risk rather than the bank. I am perfectly certain that individual managers and staff in the banks in question do their best to protect the customer from risk, but a fundamental change in culture is required to turn those people from heroes in the dark hours to simply good role models of “how we do things here”.

There is a lot to be learned by all from this incident.

Tag: fundamentals of information quality

Bank of Ireland Double Charging – a clarifying post

What is LASER?

How it works (or at least the best stab I can get at it)

The Problem

The Risk of Poor Information Quality

Conclusion