As part of the process of establishing the UN global development agenda to follow the Millennium Development Goals (MDGs) after 2015, an Independent Expert Advisory Group appointed by UN Secretary-General Ban Ki-moon called for expert recommendations on bringing about a data revolution in sustainable development (the call ran from September 25th to October 15th, 2014). For more information on the UN data revolution proposals, see http://www.undatarevolution.org.
In response to this call, the CaSMa team put together the following comments:
The move towards making data publicly available plays an important role in making public and private organizations more transparent and accountable towards the people they serve (i.e. their citizens or customers). Making data more accessible also has the potential to open up new avenues for feedback that could contribute to innovative service improvements.
When considering the norms, incentives and regulations for publicly available data, it is important to remember that the source of much of this data is people (via national censuses, tax records, online social networks, individual "smart" devices including phones, travel cards, etc.). As such it is important to guarantee that their fundamental human right to privacy, as stipulated in Article 12 of the Universal Declaration of Human Rights, is respected. It is important in this respect to bear in mind that rich databases, containing multiple pieces of information about each individual, often make it possible to de-anonymize the data through simple cross-referencing with one or more publicly available information sources, such as a telephone book or street atlas. Simply replacing the name of a person with a randomized identifier cannot be considered sufficient to make the data anonymous. The risks associated with the de-anonymization of data become especially acute for data generated by members of minority groups, since the smaller number of individuals within these groups increases the chance that specific individuals will be represented in multiple data sets.
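The cross-referencing risk described above can be made concrete with a minimal sketch. All names, attributes and records below are hypothetical; the point is only that shared quasi-identifiers (here postcode, birth year and sex), not names, are what defeat the pseudonyms.

```python
# "Anonymized" records: the name has been replaced by a random identifier,
# but quasi-identifiers (postcode, birth year, sex) remain.
health_records = [
    {"id": "x7f3", "postcode": "NG7 2TU", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"id": "k2m9", "postcode": "NG1 5AW", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

# A public source (e.g. an electoral roll or phone directory) that happens
# to carry the same quasi-identifiers alongside real names.
public_directory = [
    {"name": "Alice Smith", "postcode": "NG7 2TU", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Jones", "postcode": "NG1 5AW", "birth_year": 1990, "sex": "M"},
]

def link(records, directory):
    """Re-identify pseudonymized records by matching quasi-identifiers alone."""
    quasi = ("postcode", "birth_year", "sex")
    reidentified = {}
    for rec in records:
        matches = [p["name"] for p in directory
                   if all(p[k] == rec[k] for k in quasi)]
        if len(matches) == 1:  # a unique match defeats the random identifier
            reidentified[matches[0]] = rec["diagnosis"]
    return reidentified

print(link(health_records, public_directory))
# Both "anonymous" records are uniquely re-identified.
```

Note that no sophisticated technique is involved: a single exact-match join suffices, which is why smaller (e.g. minority) groups, where combinations of attributes are more often unique, face the greatest risk.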
In light of the above, increased data literacy is necessary not only to allow people to interpret available data about governments and corporations, but also to give people the ability to understand how much they are revealing about themselves when they participate in data transactions, such as sharing information on social networks, and how valuable their data is as a commodity.
On the issue of innovations in data collection and sharing, our concerns for privacy and transparency point to the need to establish a framework of "ethical by design" methods for data collection. Such methods have the potential to minimize security risks by embedding privacy and explicit consent at the very forefront of the data collection process. In practice, most current data collection is based on a trust-based system in which people must rely on the promises of the institution collecting data to respect their privacy and to conduct analysis only for the specific purposes for which permission was granted. This promise is effectively violated if the data is made public and subsequently accessed by a third party that did not interact directly with the participants from whom the data was collected. An example of an "ethical by design" data collection process would be the system proposed by the "Dataware" project (http://www.horizon.ac.uk/Current-Projects/becoming-dataware), in which participants retain ownership of the raw data and thus must give explicit consent for each analysis that is subsequently performed upon it. Within this framework, institutions such as governments collect, and make public, only aggregated statistical data that is no longer linked to any specific individual, while un-aggregated data remains accessible only with the explicit consent of the individuals from whom it is derived.
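The aggregate-only publication step of such a framework can be sketched as follows. This is an illustrative example, not the Dataware project's actual implementation: the records, the `district` attribute and the suppression threshold are all hypothetical. It shows individual-level data being reduced to group counts before release, with small groups suppressed so that no published figure can be traced back to an individual.

```python
from collections import Counter

# Hypothetical individual-level records, held under explicit consent and
# never published in this form.
records = [
    {"district": "A", "uses_service": True},
    {"district": "A", "uses_service": True},
    {"district": "A", "uses_service": False},
    {"district": "B", "uses_service": True},
]

def publish_aggregates(records, min_count=3):
    """Release only per-district counts, suppressing any group smaller
    than min_count so that a count cannot single out an individual."""
    counts = Counter(r["district"] for r in records)
    return {district: n for district, n in counts.items() if n >= min_count}

print(publish_aggregates(records))  # {'A': 3} -- district B is suppressed
```

The suppression threshold matters for the minority-group concern raised earlier: a count of one or two is itself identifying, so releasing only sufficiently large aggregates is part of keeping the published data unlinked from specific individuals.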
While keeping in mind the caveats outlined above, we strongly support the ongoing efforts to persuade governments, non-profits and corporations to provide open access to data sets in ways that ensure they are easily usable. Moving too fast with open data, however, can lead to policies that conflict with existing laws concerning ethics and data privacy. In the UK, for example, the research councils have had to postpone planned requirements that funded research projects unconditionally make their data publicly accessible at the end of the project, especially where data comes from human participants, since this would violate the ethical terms under which the data was initially collected.
It is important that the ethics of open data is handled correctly at the outset if we are to avoid a public backlash against the open data movement, triggered by third-party misuse of data that was made public with insufficient safeguards. In recognition of the concerns we have raised above, the UK's Economic and Social Research Council (ESRC) is funding the Citizen-centric approaches to Social Media analysis (CaSMa) project, currently underway at the Horizon Digital Economy Research hub (University of Nottingham, UK).