Facebook Research, Timeline Manipulation, & EU Data Protection Law

This is an initial post based on the information I have to hand today (1st July 2014). I’ve written it because I’ve had a number of queries this morning about the Data Protection implications of Facebook’s research activity. I’m writing it here and not on my company’s website because it is a work in progress and is my personal view. I may be wrong on some or all of these questions.

Question 1: Can (or should) the Data Protection Commissioner in Ireland get involved?

Facebook operates worldwide. However, for Facebook users outside the US and Canada, the Data Controller is Facebook Ireland, based in Dublin. Therefore EU Data Protection laws, in the form of the Irish Data Protection Acts 1988 and 2003, apply to the processing of personal data by Facebook. As a result, the Irish Data Protection Commissioner is the relevant regulator for all Facebook users outside the US and Canada. The key question then is whether or not Facebook constrained their research population to data subjects (users) within the US and Canada.

  • If yes, then this is not a matter for investigation by EU data protection authorities (i.e. the Data Protection Commissioner).
  • If no, then the Irish Data Protection Commissioner and EU Data Protection laws come into play.

If Facebook didn’t constrain their population set, it is therefore possible for Facebook users outside of the US and Canada to make a complaint to the DPC about the processing and to have it investigated. However, the DPC does not have to wait for a complaint. Section 10 of the Data Protection Acts empowers the Commissioner to undertake “such investigations as he or she considers appropriate” to ensure compliance with legislation and to “identify any contravention” of the Data Protection Acts 1988 and 2003.

[update] So, it is clear that the data was obtained from a random sample of Facebook users. Which raises the question of the sampling method used – was it stratified random sampling (randomised within a sub-set of the total user base) or random sampling across the entire user base? If the former, then the sample might have been constrained. If the latter, the data will inevitably contain data subjects from outside the US/Canada region. [/update]
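To make the distinction concrete, here is a minimal sketch of the two sampling approaches. The user records, field names, and regions are entirely hypothetical – this illustrates the methodological difference only, not anything about Facebook's actual systems:

```python
import random

# Hypothetical user records; "region" values are illustrative only.
users = [
    {"id": 1, "region": "US"}, {"id": 2, "region": "CA"},
    {"id": 3, "region": "IE"}, {"id": 4, "region": "DE"},
    {"id": 5, "region": "US"}, {"id": 6, "region": "FR"},
]

def simple_random_sample(population, k):
    """Sample across the entire user base: any region can appear,
    so EU data subjects may end up in the study."""
    return random.sample(population, k)

def constrained_random_sample(population, k, regions):
    """Randomise only within a sub-set of the user base (e.g. US/Canada),
    so users outside the chosen regions can never be selected."""
    stratum = [u for u in population if u["region"] in regions]
    return random.sample(stratum, min(k, len(stratum)))

sample = constrained_random_sample(users, 2, {"US", "CA"})
```

If the study used something like `constrained_random_sample`, the jurisdictional question may not arise; if it used `simple_random_sample` over the whole user base, EU data subjects are almost certainly in the data set.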

Answer: If Facebook hasn’t constrained their population to just North America (US/Canada) then… Yes.

Question 2: If Irish/EU Data Protection Law applies, has Facebook done anything wrong?

Tricky question, and I wouldn’t want to prejudge any possible investigation by the Data Protection Commissioner (assuming the answer to Question 1 would get them involved). However, based on the information that is available, a number of potential issues arise, most of them centred on the question of consent. Consent is a tricky issue in academic, market, or clinical research. The study which was conducted related to the psychological state of data subjects. That is categorised as “Sensitive Personal Data” under the Data Protection Acts. As such, the processing of that data requires explicit consent under Section 2B of the Acts. Beyond the scope of the Data Protection Acts, clinical research is governed by ethical standards such as the Nuremberg Code, which also requires a focus on voluntary and informed consent:

The voluntary consent of the human subject is absolutely essential… and should have sufficient knowledge and comprehension of the elements of the subject matter involved as to enable him to make an understanding and enlightened decision. This latter element requires that before the acceptance of an affirmative decision by the experimental subject there should be made known to him the nature, duration, and purpose of the experiment

Question 2A: Was Consent Required? Consent is required for processing of sensitive personal data. For data to be sensitive personal data it needs to be identifiable to an individual and sensitive in nature. However, if the data being processed was anonymised or pseudonymised then it falls outside the scope of personal data, assuming appropriate controls are in place to prevent re-identification. The Irish Data Protection Commissioner published guidance in 2007 on clinical research in the healthcare sector which addresses the question of consent, albeit from a purely clinical healthcare perspective. A key point in the guidance is that while anonymising data may remove the Data Protection question around consent, it doesn’t preclude the ethical questions around conducting research using patient data. These kinds of questions are the domain of Ethics Committees in universities or commercial research organisations. Research of this kind is governed by Institutional Review Boards (IRBs) (aka Ethics Committees).

Apparently Cornell University took the view that, as their researchers were not actually looking at the original raw data and were basing their analysis on results produced by the Facebook Data Science team, they were not conducting human research, and as such the question of whether consent was required for the research wasn’t considered. The specifics of the US rules and regulations on research ethics are too detailed for me to go into here. There is a great post on the topic here which concludes that, in a given set of circumstances, it is possible that an IRB might have been able to approve the research as it was conducted, given that Facebook manipulates timelines and algorithms all the time. However, that article also concludes that some level of information about the research, over and above the blanket “research” term contained in Facebook’s Data Use Policy, would likely have been required (though not to the level of biasing the study by putting all cards on the table), and that it would have been preferable for the subjects to have received a debrief from Facebook, rather than the entire user population wondering if it was them who had been manipulated. Interestingly, the authors of the paper point to Facebook’s Data Use Policy as the basis of their “informed consent” for this study:

As such, it was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.

Answer: This is a tricky one. For the analysis of aggregate data no consent is required under DP laws and, it appears, no ethical issues arise. However, the fact that the researchers felt they needed to clarify that they had consent under Facebook’s Data Use Policy suggests that they felt they needed consent for the specific experimentation they were undertaking, notwithstanding that they might have been able to clear ethical hurdles over the use of the data once it had been obtained legally.

Question 2B: If consent exists, is it valid? The only problem with the researchers’ assertion that the research was governed by Facebook’s Data Use Policy is that, at the time of the study (January 2012), there was no such specified purpose in Facebook’s Data Use Policy. This has been highlighted by Forbes writer Kashmir Hill.

The text covering research purposes was added in May 2012. It may well have been a proposed change that was working its way through internal reviews within Facebook, but it is impossible for someone to give informed consent for a purpose about which they have not been informed. Therefore, if Facebook are relying on a term in their Data Use Policy which hadn’t been introduced at the time of the study, then there is no valid consent in place, even if we can assume that implied consent would be sufficient for the purposes of conducting psychological research. If we enter into a degree of speculation and assume that, through some wibbly-wobbly timey-wimey construct (or Kashmir Hill having made an unlikely error in her analysis), there was a single word in the Data Use Policy for Facebook that permitted “research”, is that sufficient?

For consent to be valid it must be specific, informed, unambiguous, and freely given. I would argue that “research” is too broad a term and could be interpreted as meaning just internal research about service functionality and operations, particularly in the context in which it appears in the Facebook Data Use Policy where it is lumped in as part of “internal operations”. Is publishing psychological and sociological research part of Facebook’s “internal operations”? Is it part of Facebook’s “internal operations” to try to make their users sad? Interestingly, a review of the Irish Data Protection Commissioner’s Audit of Facebook in 2012 reveals no mention of “Research” as a stated purpose for Facebook to be processing personal data. There is a lot of information about how the Facebook Ireland User Operations team process data such as help-desk queries etc. But there is nothing about conducting psychometric analysis of users through manipulation of their timelines. Perhaps the question was not asked by the DPC?

So, it could be argued by a Data Protection regulator (or an aggrieved research subject) that the consent was insufficiently specific or unambiguous to be valid. And, lest we forget it, processing of sensitive personal data, such as data relating to psychological health, philosophical opinions etc., requires explicit consent under EU law. The direct manipulation of a data subject’s news feed to test whether it made them happier or sadder or had no effect might therefore require a higher level of disclosure and a more positive and direct confirmation/affirmation of consent than “they read the document and used the service”. People use Facebook for reasons other than to be residents of a petri dish.

Does this type of research differ from A/B testing in user interface design or copywriting? Arguably no, as it is a tweak to a thing to see if people respond differently. However, A/B testing isn’t looking for a profound long-term correlation between changes to content and how a person feels. A/B testing is simply asking, at a point in time, whether someone liked presentation A of content versus presentation B. It is more functionally driven market research than psychological or sociological analysis.
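For contrast, a typical A/B test is little more than a stateless, deterministic bucketing of users into two variants. This sketch shows the common hash-based approach (the function and parameter names are my own illustration, not anything Facebook has documented):

```python
import hashlib

def ab_variant(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to variant 'A' or 'B'.

    Hashing (experiment name + user id) gives a stable, roughly
    uniform 50/50 split without storing any per-user state: the
    same user always sees the same variant of the same experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

Nothing in this mechanism records how a subject *feels* – it only measures point-in-time responses to two presentations, which is why it sits closer to market research than to the psychological manipulation at issue here.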

Answer: I’d have to come down on the negative here. If consent to the processing of personal data in the manner described was required, it is difficult for me to see how it could be validly given, particularly as the requirement is for EXPLICIT consent. On one hand, it appears that the magic words being relied upon by the researchers didn’t exist at the time the research was conducted. Therefore there can be no consent. Even assuming some form of fudged retroactivity of consents given to cover past processing, it is still difficult to see how “research” for “internal operations” purposes meets the requirement of explicit consent. Psychological research of this kind differs from user experience testing, which is more “market research” than psychological, and is therefore arguably subject to a higher standard.

Question 3: Could it have been done differently to avoid Data Protection risks?

Short answer: yes. A number of things could have been done differently.

  1. Notification of inclusion in a research study to assess user behaviours, with an option to opt-out, would have provided clarity on consent.
  2. Analysis of anonymised data sets without directed manipulation of specific users’ timelines would not have raised any DP issues.
  3. Ensure validity of consent. Make sure the text includes references to academic research activities and the potential psychological analysis of user responses to changes in the Facebook environment. Such text should be clearly highlighted and, ideally, consent to that element should be given by a positive act, either opt-in (preferred) or opt-out.
  4. Anonymise data sets during study.
  5. Restrict population for study to US/Canada only – removes EU Data Protection issues entirely (but is potentially a cynical move).
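On points 2 and 4, a first step towards de-identifying a study data set can be as simple as replacing the direct identifier with a keyed hash before analysis. A minimal sketch, with illustrative field names and key handling of my own invention – and note the caveat in the comments: this is pseudonymisation, not anonymisation, because whoever holds the key can re-identify the subjects:

```python
import hashlib
import hmac

# Assumption for illustration: the key is generated once and held
# separately from the research data set by the data controller.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(record: dict) -> dict:
    """Replace the direct identifier with a keyed hash (HMAC-SHA256)
    and drop every field not needed for the analysis.

    NOTE: this is pseudonymisation, not anonymisation. The keyed hash
    is stable, so the key holder can re-identify subjects; under DP
    law the output may still be personal data unless the key is
    destroyed and re-identification controls are in place.
    """
    token = hmac.new(SECRET_KEY, str(record["user_id"]).encode(),
                     hashlib.sha256).hexdigest()
    return {"subject": token, "sentiment_score": record["sentiment_score"]}

cleaned = pseudonymise({"user_id": 12345, "sentiment_score": 0.4})
```

The stable token still lets researchers track one subject across observations, which is exactly why the Commissioner’s guidance treats pseudonymised data with caution: the analytical usefulness and the re-identification risk come from the same property.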

Long Answer: It will depend on whether there is any specific finding by a Data Protection Authority against Facebook on this. It does, however, highlight the importance of considering Data Protection compliance concerns as well as ethical issues when designing studies, particularly in the context of Big Data. There have been comparisons between this kind of study and other sociological research, such as researchers walking up to random test subjects and asking them to make a decision subject to a particular test condition. Such comparisons have merit, but only if we break them down to assess what is happening.

With those studies there is a test subject who is anonymous, about whom data is recorded for research purposes, often in response to a manipulated stimulus to create a test condition. The volume of test subjects is low. The potential impact is low. And the opportunity to decline to participate exists (the test subject can walk on by… as I often did when faced with undergrad psychology students in University).

With “Big Data” research, the subject is not anonymous, even if they can be anonymised. The volume of test subjects is high. Significantly (particularly in this case), there is no opportunity to decline to participate. By being a participant in the petri-dish system you are part of the experiment without your knowledge. I could choose to go to the University coffee shop without choosing to be surveyed and prodded by trainee brain monkeys. I appear to have no such choice with Data Scientists. The longer answer is that a proper consideration of the ethics and legal positioning of this kind of research is important.