May 21, 2008

The Electoral Register Hokey-Cokey

When I was a small child, my grandmother used to entertain me and my siblings by getting us to sing and dance the hokey cokey, a playful little song and dance routine if ever there was one.

This dance was brought to mind yesterday when Fergal of the Tuppenceworth bloggers emailed me to let me know that he appears to have been taken off the Electoral Register in his home county. Again.

You put your right to self-determination and election of a government by proportional representation as mandated by the constitution of the Irish Republic in.
You put your right to self-determination and election of a government by proportional representation as mandated by the constitution of the Irish Republic out.
In. Out. In. Out.
And you shake it all about.

It would seem that Fergal had been taken off the Register during the Great Clean up of 2006. He then had his ballot reinstated. The other day, in a fit of electoral existentialism, he decided to try to find himself on the Electoral Register website, www.checktheregister.ie.

Zen-like, he found himself encountering the concept of nothing, as a search for his name at his address revealed nothing. Oh Hokey Cokey Cokey indeed.

So what may have gone wrong here?

  • Is Fergal’s name transposed on the Register (surname first, firstname last)?
  • Is the address registered against Fergal on the Register different to his address?
  • Does the search function on the Electoral Register require an exact character match on names/addresses? Is “Fergal” interpreted as a different name to “Fearghal” (both Fergals in my book)?
  • If Fergal has indeed been deleted from the Register (again), what triggered the Hokey Cokey here? Was an old copy of the Register loaded to the website?
  • Is the version you search on-line up to date with the version you might find in your library or Garda Station? Might Fergal be on the Register, but just not on the Register that is searched? Might it work the other way round… Might people be listed as ‘on the register’ in an on-line search but be off the Register in the ‘paper’ world (i.e. the version that counts on polling day)?

The list of potential root causes is (especially as I am speculating a bit) quite long. However, this is further evidence that the processes for the management of the Electoral Register are a bit knackered. This has been accepted by the Government, and the Oireachtas Committee on the Electoral Register recently published a series of recommendations which eerily echoed comments and recommendations made on this blog over 2 years ago.

However, while there is an urgent need to have as accurate an electoral register as possible (1 Referendum in our immediate future and Local Elections in the not too distant future), care must be taken to ensure we solve the problems of tomorrow as well as the problems of today.

But in the words of Tom Jones - “I think I’m gonna dance now”…

“Oh, hokey cokey cokey…. Oh hokey cokey cokey…..”

May 15, 2008

Telephone numbers and Information Quality - the risk of assumption

There is an old saying that the word “Assume” makes an “Ass” out of “You” and “Me”.

Yet we see (and make) assumptions every day when it comes to assessing the quality (or otherwise) of information. Anglo-Saxon biased peoples (US, English speaking Europe etc.) often assume that names are structured Firstname Surname. “Daragh” = First Name, “O Brien” = Surname. The cultural bias here is well documented by people like Graham Rhind (who advises the use of “Given Name/Family Name” constructs on web forms etc. to improve cross-cultural usability).

But what if you see “George Michael” written down (without the context of labels for each name part) with a reference to “singer”? Would this relate to the pop singer George Michael, or the bass baritone singer Michael George?

One of the common ‘rules of thumb’ with telephone numbers is that, when you are trying to create the full ‘internationalised’ version of a telephone number (+[country code] [local area code] [local number]), you take the number as written ‘locally’ and drop the leading zero. Of course, like most conventional wisdom, a little scrutiny causes this rule of thumb to fall apart.

For example, in the Czech Republic there is no ‘leading zero’ as it is actually part of the international access code (which actually makes more sense to me…). One might assume that Europe, with the standardisation ethos of the European Union, would all have plumped for “0” as a leading digit on local area codes. Not so, as Portugal doesn’t use any leading digit on their area codes. Some countries that used to be part of the USSR (like Russia, Belarus and Azerbaijan) use 8 instead of 0.

You might not be safe in assuming that you just need to consider the first digit of the local area code. Hungary has a 2-digit prefix (06), so you would need to parse in 2 characters in the string to remove the correct digits. Just stripping the leading zero will result in a totally embuggered piece of information.
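To make that concrete, here is a minimal Python sketch of a per-country lookup. The table holds only the prefixes mentioned above, paired with each country’s calling code. The names `TRUNK_RULES` and `to_international` are my own inventions for illustration, the prefixes may well have changed since, and the sketch assumes every number was written in its full ‘local’ form, so treat it as a starting point rather than a reference.

```python
# Illustrative rules only: ISO country -> (calling code, prefix to drop).
# Prefixes are as described in this post and should be re-checked
# against a maintained numbering-plan source before real use.
TRUNK_RULES = {
    "IE": ("353", "0"),   # Ireland: the classic leading zero
    "CZ": ("420", ""),    # Czech Republic: nothing to drop
    "PT": ("351", ""),    # Portugal: nothing to drop
    "HU": ("36", "06"),   # Hungary: two-digit prefix
    "RU": ("7", "8"),     # Russia: 8 instead of 0
    "BY": ("375", "8"),   # Belarus: 8 instead of 0
}


def to_international(local_number: str, country: str) -> str:
    """Convert a locally written number to +<calling code><number>.

    Assumes local_number includes the country's prefix where one exists.
    """
    calling_code, prefix = TRUNK_RULES[country]
    # Keep only the digits; spaces, brackets and dashes are presentation.
    digits = "".join(ch for ch in local_number if ch in "0123456789")
    if prefix and digits.startswith(prefix):
        digits = digits[len(prefix):]
    return f"+{calling_code}{digits}"
```

With these rules a made-up Hungarian number such as “06 1 234 5678” comes out as “+3612345678”, where the crude ‘drop the zero’ rule would leave a stray 6 in front of it.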

Also, everyone assumes that a telephone number will consist only of numbers. However, there are a few instances where the code required to dial out from a country (the International Direct Dial code) is not purely numeric, in that it contains either the * (star) or # (hash key/pound key). Our buddies in Belarus are an example of this, where to dial out from Belarus you need to dial “8**10” (which even more confusingly is often written “8~10”).

So what does this mean for people who are assessing or seeking to improve the quality of telephone number data in their systems?

Well, first off it means you need to have some context to understand the correct business rules to apply. For example, the rules I would apply to assessing the quality (and likely defects) in a telephone number from Ireland would be different to what I’d need to apply to telephone numbers relating to Belarus. In an Irish telephone number it would be correct to strip out instances of “**” and then validate the rest of the string based on its length (if stripping the ** made it too short to be a telephone number then we would need to tag it as duff data and remove it). With data relating to Belarus it might simply be that the person filling in the form (the source of the data) got confused about what codes to use.
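As a rough sketch of what ‘context first’ might look like in Python: the function below applies the Irish-style rule (strip “**”, then check the length) by default, but flags a Belarus record for human review when it starts with the dial-out code. The `assess` function, the `MIN_DIGITS` threshold of 7 and the three outcome labels are all assumptions of mine for illustration, not real numbering-plan rules.

```python
MIN_DIGITS = 7  # hypothetical cut-off for "too short to be a phone number"
BELARUS_DIAL_OUT = ("8**10", "8~10")  # dial-out code as described above


def assess(raw: str, country: str) -> tuple[str, str]:
    """Return (outcome, value), where outcome is 'ok', 'review' or 'reject'."""
    compact = raw.replace(" ", "")
    # In Belarus data the stars may be a legitimately dialled code that the
    # form-filler captured by mistake, so do not silently strip them.
    if country == "BY" and compact.startswith(BELARUS_DIAL_OUT):
        return ("review", compact)
    # Elsewhere, treat "**" as noise, then validate what is left.
    cleaned = compact.replace("**", "")
    if not cleaned.isdecimal() or len(cleaned) < MIN_DIGITS:
        return ("reject", cleaned)
    return ("ok", cleaned)
```

The same string of characters gets a different verdict depending on the country attached to it, which is the whole point.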

Secondly, it means you need to put some thought into the design of information capture processes to reduce the chances of errors occurring. Defining a structure with separate fields, linking the country code to a country drop down (and a library of business rules for how to interpret and ‘standardize’ subsequent inputs) would not be too difficult - it would just require investment of effort in researching the rules and maintaining them once deployed. Here’s a link to a useful resource I’ve found (note that I can’t vouch for the frequency of updates to this site, but I’ve found it a fun way to figure out what the rules might be for various countries). Also, Wikipedia has a good piece on Telephone number plans. Graham Rhind also has some good links to references for telephone number format rules.

Looking at the data of a telephone number in isolation will most likely result in you screwing up some of the data (if you have international telephone numbers). Having the country information for that data (is the number in France or Belarus?) allows you to construct appropriate rules and make your assumptions in the appropriate context to reduce your risks of error.
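For the capture side, a structure with separate fields could be as simple as the Python sketch below. The class name `PhoneEntry`, its field names and the three-country `CALLING_CODES` lookup are hypothetical, chosen only to show the shape of the idea: the country comes from a fixed list rather than free text, and anything that isn’t a digit is rejected at the point of entry.

```python
from dataclasses import dataclass

# Hypothetical lookup behind the country drop-down: ISO code -> calling code.
CALLING_CODES = {"IE": "353", "FR": "33", "BY": "375"}


@dataclass(frozen=True)
class PhoneEntry:
    country: str        # chosen from the drop-down, never typed free-hand
    area_code: str      # stored as entered, interpreted later by country rules
    local_number: str

    def __post_init__(self) -> None:
        if self.country not in CALLING_CODES:
            raise ValueError(f"unknown country: {self.country!r}")
        for name in ("area_code", "local_number"):
            value = getattr(self, name)
            if not value.isdecimal():
                raise ValueError(f"{name} must contain digits only: {value!r}")

    @property
    def calling_code(self) -> str:
        """Calling code implied by the selected country."""
        return CALLING_CODES[self.country]
```

Because the country travels with the number, whatever rules you later apply to it can be chosen in context rather than guessed.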

Ultimately, blundering in with a crude rule of thumb and simply stripping any leading zeros you find because that is the assumption you’ve made will result in you making an ass out of you and your data.

Which raises an interesting question…

Imagine you have been given a spreadsheet of telephone numbers that you have been told are international numbers in the ‘local’ formats for the respective countries. You open the spreadsheet and there are no leading zeros (because Excel, like most other spreadsheets, assumes that numbers don’t begin with zero and strips it out). What do you do to get the data back to a format that you can actually use?
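To see how the damage happens, here is a small Python sketch with made-up sample data. `read_as_text` keeps each number as a string, while `read_as_numbers` mimics what a spreadsheet does when it decides the column is numeric. Both function names and the sample rows are my own for illustration.

```python
import csv
import io

# Made-up numbers in 'local' format, one row per country.
SAMPLE = "country,phone\nHU,0612345678\nIE,012345678\nPT,212345678\n"


def read_as_text(data: str) -> list[tuple[str, str]]:
    """Read the phone column as text, so nothing is lost."""
    reader = csv.DictReader(io.StringIO(data))
    return [(row["country"], row["phone"]) for row in reader]


def read_as_numbers(data: str) -> list[tuple[str, str]]:
    """Mimic a spreadsheet: treat the phone column as a number."""
    return [(country, str(int(phone))) for country, phone in read_as_text(data)]
```

After the numeric round trip the Hungarian and Irish numbers have each lost a zero and the Portuguese one is untouched, and nothing in the digits alone tells you which is which.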

Answers on a post card (or in the comments) please.