….ye wha’?
A lot has been written, in relation to the electoral register and other matters, about using information from other sources either to improve the quality of the information you already have or to create a new set of information.
This makes sense: other people may already have done much of the work for you, and effectively all you need to do is copy their work and edit it to meet your needs. In most cases it may be faster and cheaper to use such ‘surrogates’ for reality than to go out to the real-world things (people, stock-rooms, wherever) and build, from scratch, exactly the information you need, in the format you require and to your own standards.
There is, however, a price to pay for having such surrogate sources available to you. You need to accept that:
- The format and structure of the information may need to be changed to fit your systems or processes.
- The information you are using may itself be inaccurate, incomplete or inconsistent.
- If you are combining it with other information, you will need to invest in tools and skills to properly match and consolidate it into a valid version of the truth.
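Matching and consolidation can be as simple or as sophisticated as your tools allow. As a rough illustration only, here is a minimal Python sketch that assumes each record is a dictionary with hypothetical `name`, `postcode` and `phone` fields. Two records are treated as the same entity when their normalised postcodes agree and their normalised names score at least 0.85 (an arbitrary threshold chosen for the example) on the standard library's `difflib.SequenceMatcher` similarity ratio. Matched bought-in records only fill gaps in your own data; unmatched ones are added as new records.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace,
    so trivial formatting differences don't block a match."""
    text = re.sub(r"[^\w\s]", " ", (text or "").lower())
    return " ".join(text.split())


def is_match(a, b, threshold=0.85):
    """Hypothetical rule: same entity if the normalised postcodes agree
    exactly and the normalised names are sufficiently similar."""
    if normalise(a["postcode"]) != normalise(b["postcode"]):
        return False
    ratio = SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()
    return ratio >= threshold


def consolidate(own, bought):
    """Merge bought-in records into a copy of our own list. A matched record
    only fills empty fields in ours; an unmatched record is appended."""
    merged = [dict(r) for r in own]  # copy so the original list is untouched
    for rec in bought:
        target = next((m for m in merged if is_match(m, rec)), None)
        if target is None:
            merged.append(dict(rec))
            continue
        for field, value in rec.items():
            if value and not target.get(field):
                target[field] = value
    return merged


if __name__ == "__main__":
    own = [{"name": "Mary O'Neill", "postcode": "D02 X285", "phone": ""}]
    bought = [
        {"name": "Mary ONeill", "postcode": "d02 x285", "phone": "01 555 0100"},
        {"name": "John Murphy", "postcode": "T12 AB34", "phone": ""},
    ]
    for record in consolidate(own, bought):
        print(record)
```

This compares every incoming record with every existing one, which is fine for a handful of records but slow for large files. Dedicated matching tools use indexing ("blocking") and far richer comparison rules, which is where the investment in tools and skills comes in.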
These risks apply to organisations that buy marketing lists to integrate with their CRM systems, but they apply equally to students relying on the Internet to supply the content for their academic projects, or to journalists trawling for content for newspaper articles or reviews.
The recurrence of common errors, phrases or inaccuracies across term papers is one way academia identifies academic fraud. Similar techniques could be applied in other arenas to identify and track instances of copyright infringement.
In businesses dealing with thousands of records, the cost/risk analysis is relatively straightforward. My recommendation is that clear processes are essential, both to manage suppliers and to measure the quality of the information they provide against a defined standard for completeness, consistency, duplication, conformity and so on. Random sampling of surrogate data sources for accuracy (not every 100th record, but a truly random sample) is also strongly recommended.
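The difference between "every 100th record" and a truly random sample is easy to demonstrate. Below is a minimal Python sketch, assuming the supplier's data arrives as a list of dictionaries; the field names, the `REQUIRED_FIELDS` tuple and the (name, postcode) duplicate key are hypothetical stand-ins for whatever your own defined standard says. It measures two of the dimensions mentioned above (completeness and duplication) and contrasts systematic sampling with simple random sampling using the standard library's `random.sample`.

```python
import random

# Hypothetical quality standard: these fields must always be populated.
REQUIRED_FIELDS = ("name", "postcode", "email")


def completeness(records, fields=REQUIRED_FIELDS):
    """Share of records in which every required field is non-empty."""
    if not records:
        return 0.0
    complete = sum(all(r.get(f) for f in fields) for r in records)
    return complete / len(records)


def duplication(records, key=("name", "postcode")):
    """Share of records whose (case-insensitive) key repeats an earlier record."""
    if not records:
        return 0.0
    seen, dupes = set(), 0
    for r in records:
        k = tuple(str(r.get(f, "")).strip().lower() for f in key)
        if k in seen:
            dupes += 1
        else:
            seen.add(k)
    return dupes / len(records)


def systematic_sample(records, step=100):
    """Every 100th record (positions 99, 199, ...): easy to take, but
    blind to any defect that recurs with the same period."""
    return records[step - 1::step]


def random_sample(records, n, seed=None):
    """Simple random sample without replacement: every record has the same
    chance of being checked, whatever its position in the file."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))


if __name__ == "__main__":
    # Hypothetical supplier file of 1,000 records in which the records at
    # positions 50, 150, 250, ... are missing their email address.
    records = [
        {"name": f"Customer {i}", "postcode": f"PC{i:04d}",
         "email": "" if i % 100 == 50 else f"c{i}@example.com"}
        for i in range(1000)
    ]
    print(completeness(records))                      # 0.99
    print(completeness(systematic_sample(records)))   # 1.0: defects never sampled
    print(completeness(random_sample(records, 100)))  # varies from run to run
    print(duplication(records))                       # 0.0
```

Because the defects in this made-up file recur every 100 records, the systematic sample never lands on one and reports a perfect file, while the random sample gives each record the same chance of being checked.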
These are EXACTLY the same techniques that manufacturing industries use to ensure the quality of the raw materials that feed their processes. If it works for industries where low quality can kill (such as pharmaceuticals), why shouldn’t it work for you?
For students, journalists and those of us hacking away in the blogosphere, the recommendation is simple. Only rely on surrogate sources if you absolutely have to. If you use someone else’s work as your source, credit them. If you don’t want to credit them, then make sure you verify the accuracy of their work, either by checking it against reality or by checking it with at least one other source.
That way, your source’s errors don’t become your errors, and you don’t run the risk of someone crying foul and either suing you for infringing their copyright (and copyright does apply to content posted on the Internet and in blogs) or pursuing whatever other sanctions might apply (such as having you kicked off your college course).
In many cases the cost and effort involved in double-checking (particularly for a one-off piece of writing) differ negligibly from the cost of starting from scratch and building up the information yourself. And, depending on the context, it may even be more enjoyable.
Not so long ago, The New York Times had to relearn the lesson of checking stories against at least one other source for accuracy.
Horatio Caine in CSI: Miami always tells his team to “trust, but verify”.
When using surrogate sources for real-world information in any arena you must assess the risk of doing so and put in place the necessary controls so that you can trust that you have verified.
(c) Daragh O Brien 2006 (just in case)