In his article, Dr. Duncan Shaw (University of Nottingham) raises the specter that the recent flood of media stories about leaks, hacks and misuse of personal data is eroding people’s trust in ‘big data’ to the point that they may soon revolt against the very notion.
To remedy this situation, Dr. Shaw proposes that “big data needs [a] big regulator”. He then discusses the difficulty of establishing such a regulatory authority, the role it might play in mediating between data consumers, suppliers and the market, and the challenge it would face to “demonstrate its own trustworthiness and independence”.
From the perspective of CaSMa, we very much agree that continuing with ‘business as usual’ is not a viable option for organizations that use ‘big data’ derived directly from individual human behavior and communication. The problem with data about people, however, runs deeper than a simple lack of oversight. It is a lack of fundamental appreciation for the fact that the data being collected, analyzed and ‘mined’ is, at its core, deeply private and intimate. The implication is that such data must be approached differently from data about inanimate objects. Even if the companies collecting the data have only the best of intentions and follow all the rules imposed and monitored by a big regulator, sufficiently large aggregations of personal data can never be guaranteed to be secure against de-anonymization. The risk of hacking always remains, and the people involved, who are individuals rather than faceless ‘customers’, are still confronted with a fundamental power imbalance: they must trust in the good intentions of organizations without any direct transparency about what is happening to their data.
To truly address the core issue of trust that Dr. Shaw refers to, it will be necessary to fundamentally change the way ‘big data’ is done when it deals with human data.
Instead of asking people to hand over the raw data of their lives, which could potentially be analyzed in a vast number of ways, they should retain ownership of the raw data by using personal data containers. Companies, governments and other organizations interested in performing ‘big data’ analysis on this data would then have to submit analysis queries to the people themselves, greatly increasing the transparency of what is being analyzed and making real ‘informed consent’ feasible. This would go a long way towards balancing the scales of power between people and organizations, and thereby restoring trust.
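As a rough illustration of this query-to-the-data model, the sketch below shows how a personal data container might mediate between an individual and an organization. This is a minimal conceptual sketch, not a reference to any real system; all class, method and variable names (`PersonalDataContainer`, `review_query`, `run_query`) are hypothetical.

```python
# Hypothetical sketch of a "personal data container": the raw data never
# leaves the owner's control, and each analysis query requires explicit,
# per-query consent before only its aggregate result is released.

class PersonalDataContainer:
    """Holds an individual's raw data; never releases it directly."""

    def __init__(self, raw_data):
        self._raw_data = raw_data          # stays under the owner's control
        self._approved_queries = set()     # queries the owner has consented to

    def review_query(self, query_name, description):
        # In a real system the owner would read the description and decide;
        # here we simulate the owner granting explicit, informed consent.
        print(f"Consent requested for '{query_name}': {description}")
        self._approved_queries.add(query_name)

    def run_query(self, query_name, query_fn):
        """Run an approved query and return only its result, not the raw data."""
        if query_name not in self._approved_queries:
            raise PermissionError(f"Query '{query_name}' was not consented to")
        return query_fn(self._raw_data)


# An organization submits a query instead of collecting the raw data.
container = PersonalDataContainer(raw_data=[2, 5, 1, 7])  # e.g. daily activity values
container.review_query("weekly_average", "Compute the average of your activity values.")
result = container.run_query("weekly_average", lambda data: sum(data) / len(data))
print(result)  # 3.75 -- the organization sees the aggregate, never the raw values
```

The key design point is that the direction of flow is inverted: the query travels to the data rather than the data traveling to the analyst, so consent can be given per question rather than as a one-off blanket surrender.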