Do Neural Networks Ever Forget? 🧠
How machine learning throws a wrench in the 'right to be forgotten.' Bringing in some of the latest computational research on privacy, this post examines how the principles of GDPR collide with the realities of neural networks.
Politics
Jun. 4, 2020
![Header image](https://cdn-images-1.medium.com/max/2000/1*pmSFu6KWJGdl3ZJcA_6Q8A.png)
As the usage of data evolves, so should its regulation. Faster and faster, the digital world is embedding itself in our lives to remove friction. Tech removes friction by learning about us and how we behave as a collective, anticipating and reacting accordingly. Think Starbucks sending you a push notification whenever you come close to one of their stores: one ad for a latte if it's cold out, one for an iced coffee if it's hot. This has made firms like Facebook, Amazon, Apple, Netflix, and Google some of the most valuable (the most valuable, bar none, if you consider how few employees they have) in history, giving them an outsized influence on our lives. So it is important to ask: who are these firms accountable to? Or more importantly, what are the market forces that affect how we, their users, are treated? Facebook's misuse of data with Cambridge Analytica, and Google's rogue engineer who adapted a fleet of Street View cars to siphon often-sensitive data from private WiFi networks, have led to reasonable concerns about how much regulation is needed in tech. Unfortunately, when it comes to protecting our data, privacy legislation fails to take artificial intelligence (AI) into account. Instead, legislation like the EU's General Data Protection Regulation (GDPR) focuses on the explicit collection and transfer of personal information. This ignores what makes data useful to tech firms: how it can be generalized and modeled to commodify everyday behaviour. In this way, machine learning (ML) undermines traditional privacy legislation twice over: it complicates our right to access and appeal how organizations use our personal information, and it ignores how ML makes implicit use of personal data.
This argument is a little more nuanced than pointing out the consequences of a world where training data can be reverse engineered, though that is also a concern. Instead, I want to focus on what privacy legislation attempts to protect: our ability to know how companies use our data, and our ability to maintain control of it. In doing so we'll see that ML makes it harder to interrogate how companies use our data. We'll also see that correcting how our data is used in these systems is much harder than correcting the data that is protected in GDPR. Finally, I'll make the argument that if our aim is to give people greater control over how their data is used, then the right to be forgotten must also apply to ML. Otherwise, we will be ignoring what Shoshana Zuboff calls tech's new "logic of accumulation."² If you aren't a fan of Zuboff, or the term "logic of accumulation" is foreign or off-putting, hold on: that's where we'll start. To cap things off I'll put a spotlight on some of the latest research that aims to address these problems.
What Makes Our Data Useful?
Before we can understand how GDPR fails to protect the use of our data, we need a better understanding of the connection between tech firms and our personal privacy. The rapid rise of connectivity and the proliferation of uses of the internet have brought about what Shoshana Zuboff considers a new technological logic of accumulation, where big data "organizes perception and shapes the expression of technological affordances at their roots."² This is an overly academic way of saying that big data has changed how we view the world, and as a result the way firms like Google operate is fundamentally different from non-data-oriented firms. Private organizations are able to gain a deep knowledge of our online interactions "from above", anonymously monitoring everyday behaviour to model and exploit whatever information they can glean². Through continuous data mining and analysis, the Googles and Facebooks of our world are able to understand how we behave at a tremendously granular level⁸. The digital bread crumbs we leave behind are collected, stored, and then aggregated and modeled to better target, personalize, and enforce. This is what researchers refer to as "the commodification of everyday behaviour."² Tech firms act as indifferent observers who spread their "free" products as widely as possible, to model our behaviour for the benefit of advertisers, insurers, etc. This digital-first, data-first process has produced relatively small firms, with fewer fixed costs, that generate tremendous amounts of wealth. And thanks to the unique corporate structures of Facebook and Google, the ability to leverage those assets is often directed by one or two people.
The onus lies on policy makers to ensure that technology's advances are brought about in an equitable way. A holistic privacy policy is necessary; legislation must allow for the fair, transparent collection of data, as well as ensure that data are processed and utilized in an equitable way. While tech firms require near-ubiquitous monitoring to produce the lakes of data they feed off of, their true value comes from the ability to process data and make it useful. Given the enormity of that data, this is only made possible through ML. Theoretical and practical advances in ML let tech firms search, sort, cluster and make decisions based on subterranean patterns in data. Therefore, collection and utilization are inextricably linked. However, this is not how our privacy legislation has viewed data collection. Instead, policymakers have generally focused on the former, without recognizing how collected data is exploited implicitly in its utilization.
What GDPR Does (and Doesn't) Do
Private firms, leveraging largely public datasets, fundamentally altered the United Kingdom's referendum to leave the EU and the 2016 US election of Donald Trump³. In my opinion these major events are what brought questions about how our data is used into the public consciousness. Between the weeks starting April 10, 2016 and April 10, 2019, Google search interest saw increases of 119%⁴, 1,566%⁵ and 81%⁶ for the search terms data privacy, AI ethics, and privacy software, respectively. In the same timeframe, Google search interest in artificial intelligence and machine learning also saw a steep uptick, with corresponding increases of 43% and 200%⁷. So it is not surprising that the largest piece of privacy legislation born in this political landscape, the EU's 2016 GDPR, has been the subject of popular debate and scrutiny. GDPR regulates the processing and free movement of data, and affords individuals the "protection of [their] personal data" through three core sections: the right to informed consent, the right to access personal data, and the right to rectification and erasure⁸. GDPR gives increased protections to individuals, letting you appeal decisions made by autonomous systems. Unfortunately, by focusing on the explicit collection and movement of data, it falls prey to flaws similar to those in earlier pieces of privacy legislation like Canada's PIPEDA⁹. These flaws, which we'll go into depth on, are that it can be very hard to access our data after it's been processed to train a neural net, and to appeal its uses given the often opaque nature of ML. As well, the right to rectification and erasure fails to take into account that ML is structured by the data it's trained on. This allows companies to profit off of our data long after we've requested it to be erased.
How do you fight an algorithm?
GDPR (Article 16 specifically) gives us the right to appeal inaccurate collection or use of our personal data. But while the "purposes of processing" must be taken into account when rectifying inaccurate uses of data, GDPR fails to establish a litmus test for what constitutes inaccurate usage⁸. Knowing what needs to be fixed is much easier when the data in question relates to some concrete characteristic. It's easy to fix someone's name or birthday in a database. However, in cases where an ML system makes some decision about us, like inferring our political orientation, sexuality, or risk of recidivism, how can you appeal to a neural network? This is key because the data that ML is trained on are produced in an unjust world, and there is often little reason to believe that such models will do anything but replicate preexisting inequalities¹⁰. This is what researchers often refer to as algorithmic bias (which is different from statistical bias). It was the case when researchers from Microsoft and Boston University demonstrated that word embeddings can exhibit gender stereotypes to disturbing extents¹⁰. However, since even supervised ML is left to its own devices to figure out how to best approximate some regression/classification function, it is less straightforward to argue that you have been discriminated against¹¹. By placing the burden on individuals to meet this vague standard for what inaccurate usage may mean, GDPR ignores the structural biases that are easily replicated and amplified in ML¹¹.
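To make that word-embedding result a little more concrete, here is a minimal sketch of the kind of measurement involved. The vectors below are tiny, made-up stand-ins (real embeddings like word2vec have hundreds of dimensions and are learned from large corpora, which is where the stereotypes come from), so treat this as an illustration of the idea rather than Bolukbasi et al.'s actual method:

```python
import numpy as np

# Toy 3-dimensional word vectors, invented purely for illustration.
vectors = {
    "he":         np.array([ 1.0, 0.1, 0.2]),
    "she":        np.array([-1.0, 0.1, 0.2]),
    "programmer": np.array([ 0.6, 0.8, 0.1]),
    "homemaker":  np.array([-0.7, 0.7, 0.2]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A crude "gender direction": the difference between the he and she vectors.
gender_direction = vectors["he"] - vectors["she"]

# Projecting occupation words onto that direction shows how strongly the
# embedding associates each occupation with one gender or the other.
for word in ("programmer", "homemaker"):
    print(word, round(cosine(vectors[word], gender_direction), 3))
```

With real embeddings, "programmer" landing closer to the "he" end and "homemaker" closer to the "she" end is exactly the kind of learned stereotype that is hard to contest under a vague "inaccurate usage" standard.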
Do Neural Networks Ever Forget?
GDPR deviates from previous attempts at privacy legislation by giving individuals the "right to be forgotten." This means that if you make a request to a company that has possession of your data, they are obligated to erase it. However, this doesn't extend to the ML models your data has been used to train. This is because GDPR fundamentally views data as an input to a machine that makes some decision, when actually, data shapes the decision-making system itself. To me, by allowing companies to continuously profit off of our data regardless of individual preferences, this represents a fundamental flaw that ignores tech's logic of accumulation.
GDPR (Articles 17 through 20 now) doesn't recognize that if your data has been used to train a neural network, you are forever imprinted on it¹². Even if you submit an erasure request, and your information doesn't appear in any of Facebook's databases, your information is still implicitly being processed when Facebook decides what ad to show someone. This is what brings us back to our header image. Researchers at Cornell, UCL and the Alan Turing Institute recently demonstrated that collaborative learning models can "leak unintended information about participants' training data," allowing malign actors to "infer the presence of exact data points—for example, specific locations [… as well as] properties that hold only for a subset of the training data and are independent of the properties that the joint model aims to capture."¹ This, hopefully, drives home the fact that ML is not separate from us, and there is a growing body of literature that argues our data shapes the fundamental structure of these models. In some cases, this literally means adding or dropping nodes from the layers of an ANN¹³. In framing erasure in such concrete terms, GDPR fails to remedy tech's more exploitative characteristics and refuses to acknowledge the true utility of data: that it "records, modifies, and commodifies everyday experience."²
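To give a feel for how "the presence of exact data points" can leak at all, here is a deliberately simplified sketch. It does not reproduce the gradient-based attacks from the Melis et al. paper; it just shows, on synthetic data, the basic signal that membership-inference attacks exploit: models tend to be more confident on records they were trained on than on records they have never seen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 200 "members" (used for training) and 200
# "non-members" the model never sees.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_members, y_members = X[:200], y[:200]
X_outsiders, y_outsiders = X[200:], y[200:]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_members, y_members)

def confidence_on_true_label(model, X, y):
    """The probability the model assigns to each record's true class."""
    probs = model.predict_proba(X)
    return probs[np.arange(len(y)), y]

print("avg confidence on members:    ",
      confidence_on_true_label(model, X_members, y_members).mean())
print("avg confidence on non-members:",
      confidence_on_true_label(model, X_outsiders, y_outsiders).mean())
# The gap between these two numbers is what lets an attacker guess whether
# a specific record was part of the training data.
```

In other words, even after your row is deleted from every database, a trained model can still carry a statistical trace of it.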
Improving the "right to be forgotten"
GDPR gives us the right to challenge companies when they use ML to make decisions about us (what price to offer, whether to insure, risk of recidivism). This is a huge step forward. Unfortunately, ML's defining quality is to disappear into the background, embedding itself in our digital world. That is to say, there is rarely a big sign saying: "Watch out! A neural network is deciding whether you're too risky to insure!" Given the embedded nature of ML, its implementation can subtly shape the online world in ways that, while technically consensual, individuals are not fully aware of. This puts the onus on individuals to parse their online world for inaccurate or biased systems in ways that could be far from feasible. It also ensures that only those who have the means to educate themselves on how tech and ML operate will have full control over their data. Over the past sixteen years, surprisingly little has been done to fully address the issue of clear and informed consent.
The most overlooked aspect of privacy legislation is that there are no protections letting individuals remove themselves from models that infer from user data¹⁴. GDPR does not address the consequences of allowing tech to profit off of models, trained on our data, after we've invoked our "right to be forgotten." This requires a conceptual shift in how privacy is viewed. The core of tech firms is their ability to cheaply capture data, the raw material, and model it to achieve various ends. Privacy legislation cannot stop at the collection of data and then interpret the neural net built from it as something wholly different. Privacy legislation should extend to ML as well. As of now this problem is only addressed superficially in GDPR. Thankfully, researchers at the University of Cambridge and Queen Mary University of London, among others, are proposing technical solutions to these problems. Shintre et al. proposed a novel solution that allows individual data points to be removed from artificial neural networks in their 2019 paper, Making Machine Learning Forget¹⁵. What this demonstrates is that there are few technical obstacles to fully realizing systems where we can truly have the right to be forgotten. First, however, there must be an understanding of how our data can be used and misused, and the political will to hold tech accountable.
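As a rough sketch of what honouring an erasure request could look like at the model level, here is the naive baseline that more efficient unlearning research (like Shintre et al.) tries to improve on: tag every training row with the user it came from, and when a user asks to be forgotten, drop their rows and retrain from scratch. The setup below (the user_ids tagging, the toy logistic regression) is hypothetical and only meant to show that a model's parameters genuinely change once a user's data is removed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training set where each row is tagged with the user it came from.
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
user_ids = rng.integers(0, 30, size=300)

def train(X, y):
    return LogisticRegression().fit(X, y)

def forget_user(X, y, user_ids, user_to_forget):
    """Naive exact unlearning: drop the user's rows and retrain from scratch."""
    keep = user_ids != user_to_forget
    return train(X[keep], y[keep]), int((~keep).sum())

model = train(X, y)
model_after_erasure, rows_removed = forget_user(X, y, user_ids, user_to_forget=7)

# After retraining, the weights no longer depend on user 7's data in any way.
print("rows removed:", rows_removed)
print("weights before:", np.round(model.coef_[0], 3))
print("weights after: ", np.round(model_after_erasure.coef_[0], 3))
```

Retraining from scratch for every request is obviously too expensive at Facebook or Google scale, which is exactly why research into cheaper, provable forgetting matters.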
Moving Forward
Technology and ML have undoubtedly made our lives better. However, that doesn't mean we shouldn't be critical when tech firms unnecessarily impinge on our rights. Bezos would still be rich if we addressed these issues. In the past 20 years, the tech industry has accumulated massive amounts of user data, which legislatures have subsequently had to grapple with. ML undermines existing privacy legislation in two ways. It subverts the grounds from which we can appeal inaccurate or biased uses of our personal information. And problems arise when users are afforded the right to erasure without acknowledging the embedded nature of data in ML. As a result, we need a conceptual shift in how we view privacy.
Our data isn't useful by itself. Given that fact, we need to focus less on the explicit collection and transfer of data, and instead focus more on how our data is used. Our data leaves fingerprints on the neural networks it's trained on. It's important to remember that those fingerprints are ours as well, and as a result the right to be forgotten should extend to ML.
[1]: Melis, Luca, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. "Exploiting unintended feature leakage in collaborative learning." In 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706. IEEE, 2019.
[2]: Zuboff, Shoshana. "Big other: surveillance capitalism and the prospects of an information civilization." Journal of Information Technology 30, no. 1 (2015): 75–89.
[3]: Isaak, Jim, and Mina J. Hanna. "User data privacy: Facebook, Cambridge Analytica, and privacy protection." Computer 51, no. 8 (2018): 57.
[4]: Google Trends, "Data Privacy Search Interest (2016–2019)." Accessed on April 10, 2020. https://trends.google.com/trends/explore?date=2016-04-10%202020-04-10&q=Data%20Privacy.
[5]: Google Trends, "AI Ethics Search Interest (2016–2019)." Accessed on April 10, 2020. https://trends.google.com/trends/explore?date=2016-04-10%202019-04-10&q=AI%20Ethics.
[6]: Google Trends, "Privacy Software Search Interest (2016–2019)." Accessed on April 10, 2020. https://trends.google.com/trends/explore?date=2016-04-10%202019-04-10&q=Privacy%20Software.
[7]: Google Trends, "AI and ML Search Interest (2016–2019)." Accessed on April 10, 2020. https://trends.google.com/trends/explore?date=2016-04-10%202019-04-10&q=Machine%20Learning,%2Fm%2F0mkz.
[8]: General Data Protection Regulation, European Parliament 2016, 1–77.
[9]: Personal Information Protection and Electronic Documents Act, Revised Statutes of Canada 2000, 4–39.
[10]: Bolukbasi, Tolga, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings." In Advances in Neural Information Processing Systems, p. 4350. 2016.
[11]: Waldman, Ari Ezra. "Power, Process, and Automated Decision-Making." Fordham L. Rev. 88 (2019): 613.
[12]: Kamarinou, Dimitra, Christopher Millard, and Jatinder Singh. "Machine Learning with Personal Data: Profiling, Decisions and the EU General Data Protection Regulation." Journal of Machine Learning Research (2017): 1–7.
[13]: Golea, Mostefa, and Mario Marchand. "A growth algorithm for neural network decision trees." EPL (Europhysics Letters) 12, no. 3 (1990): 205.
[14]: Kamarinou, Dimitra, Christopher Millard, and Jatinder Singh. "Machine Learning with Personal Data: Profiling, Decisions and the EU General Data Protection Regulation." Journal of Machine Learning Research (2017): 1–7.
[15]: Shintre, Saurabh, Kevin A. Roundy, and Jasjeet Dhaliwal. "Making Machine Learning Forget." In Annual Privacy Forum, pp. 72–83. Springer, Cham, 2019.