The Privacy Implications of Weak AI

Introduction

So a few days ago I started writing this entry titled “Societal Implications of Weak AI”. Over the course of the next few days, I found out just how broad a topic that is. I kept thinking of more topics and subtopics. With weak AI, there’s so much to discuss. Eventually the entry ballooned to an unmanageable 30+ minute read. I couldn’t figure out how to organize all the topics. So I decided it would be best to split it up into separate, more digestible entries.

I’ve chosen to limit the scope of this entry to weak AI only. I’m purposely omitting AGI because it warrants its own discussion. AGI, or artificial general intelligence, is AI with intelligence equal to or far exceeding human intelligence in every way that matters. Weak AI, by contrast, only handles narrowly defined, limited tasks. But make no mistake. Just because it’s limited doesn’t mean it’s not dangerous. This entry is all about how weak AI threatens our privacy and what we can do about it.

Privacy Must Be Protected

The ’nothing to hide’ people don’t understand this, but privacy is important for the healthy development of humans and other animals. Being watched all the time is psychologically hazardous. It’s backed up by science. Without privacy, there’s nowhere to make mistakes without judgment. Letting AI just destroy our privacy in the name of ‘progress’ is not an option.

AI is Already a Privacy Disaster

AI is already destroying our privacy in numerous ways. Just have a look at awful-ai, a git repo tracking scary usages of AI. AI has been used to try to infer criminality from a picture of a person’s face. It has been used to reconstruct a person’s face from their voice alone. Everybody already knows about facial recognition, which is a privacy disaster. Big retailers use it for tracking. China uses it to surveil Uyghur Muslims. Any time you see ‘AI’ and ‘privacy’ in the same sentence, it’s bad news.

AI Will Become a Worse Privacy Disaster

AI is already very bad for privacy and getting worse all the time. The most worrisome thing is we have no idea how good weak AI can get at privacy-invading use cases. The only limit in sight is how much personal information can theoretically be derived from the input data. Can AI accurately predict the time frame when someone last had sex based on a 1-minute video of that person? What about how they’ve been feeling for the past week? It’s hard to say what future AI will be able to predict given some data.

You may be publicly sharing information about yourself online now, knowingly or unknowingly, which a future AI Sherlock Holmes (just a metaphor) can use to derive information about you that you don’t want anyone to know. Not only that, but it will be able to derive information about you that you don’t even know. How much information will future AI be able to derive about me from these journal entries? What will it learn about me from my style of writing, what I write about, when I write about it, etc.? I don’t know. Just imagine what inferences future AI will be able to draw about someone given all the data from intelligence agencies and big tech. Imagine how that could be weaponized.

Future AI may not be able to explain how it reaches its conclusions to us humans. But that won’t necessarily matter. As long as its conclusions are accurate, it will be dangerous. If it turns out that future AI Sherlock can derive troves of personal information from very little data, we’ll need very strict privacy protections. If it turns out that AI Sherlock can’t derive much information, then maybe we can relax protections a little.

How to Protect Privacy From AI

Preventing Data Collection

No matter how accurate future AI Sherlock is, there are a few things that will probably have to happen to save privacy from AI in the long term:

Government mass surveillance must end.
There must be a law against businesses collecting data on people.
Businesses must delete existing identifiable data about people.
There must be a law against infrastructure for persistent surveillance of the public (store surveillance cameras, Ring doorbells).
Police use of AI must be community-controlled.
People must use free software. Non-free software often contains surveillance features.
People must stop using Service as a Software Substitute (SaaSS). Such services are prone to surveillance.
People must use encrypted, metadata-resistant communications protocols. Preferably mixnets that prevent traffic analysis against global adversaries. See Nym.
There must be a law against public sector jobs using non-free software, SaaSS, and insecure communications protocols.
Workplaces must stop requiring people to use non-free software, SaaSS, and insecure communications protocols.
There must be a law requiring businesses to accept anonymous forms of payment.
There must be a way to perform transactions online privately.
There must be a law against markets for personal data, the same way there are laws against markets for human organs.
Smartphone location tracking must end.
We must educate people about the importance of privacy and create political pressure to protect it.
[more items here…]

Notice that almost all of the above points are about preventing data collection, not preventing AI use. AI is just software. Stopping people from using it would require extremely draconian measures that might undermine privacy anyway. I’m not saying draconian measures to protect us from AI will never be justifiable. I’m just asking: why resort to that when there are solutions that aren’t draconian and will actually allow us to preserve our rights?

The best way to stop privacy-invading AI is to stop the data collection. AI needs data to make predictions about people. Without data, AI can’t make predictions. We should still allow mass data collection with AI to predict things like the weather. That doesn’t violate anyone’s privacy. The violation happens when there’s collection of personally identifiable data about people, or collection of data which AI can later use to deduce personally identifiable information about people. That is what we have to prevent.

Problem Areas

There is cause for concern about such strong privacy laws though. For instance there are some highly desirable technologies which inherently require persistent surveillance of public areas, something I insisted we must not have.

Self-Driving Cars

How can you have self-driving cars if it’s illegal to conduct persistent surveillance of the public? You can’t. The cars must have external sensors and cameras in order to work. We could just not have them, but self-driving cars will save millions of lives. We don’t want to block technological development that benefits humanity.

For those cases, we need strict, legally enforceable data collection and data protection standards that businesses must adhere to and perhaps audits to ensure the standards are being followed. If your company builds technology which has the hardware capability to conduct persistent surveillance of the public, then there should be guidelines it has to follow:

The technology must be built with free hardware and run free software exclusively.
The technology must not collect more data than necessary to achieve its ends.
The technology must securely delete said data after it’s no longer needed.
The technology must securely encrypt all transmitted data.
[insert more items here…]
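To make the second and third guidelines a little more concrete, here’s a rough Python sketch of what “collect only what’s needed” and “delete it when it’s no longer needed” could look like in software. Everything in it is my own assumption for illustration: the field names, the `MinimalBuffer` class, and the retention window are all hypothetical, and dropping a Python object is not the same thing as secure deletion from physical storage.

```python
import time
from collections import deque

# Hypothetical allowlist: the only fields a driving task is assumed to need.
ALLOWED_FIELDS = {"distance_m", "bearing_deg", "object_class"}


class MinimalBuffer:
    """Keeps allowlisted fields only and drops readings older than a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the behavior can be tested
        self._items = deque()       # (timestamp, reading) pairs, oldest first

    def add(self, reading):
        # Data minimization: anything not on the allowlist is never stored.
        kept = {k: v for k, v in reading.items() if k in ALLOWED_FIELDS}
        self._items.append((self.clock(), kept))
        self.purge()

    def purge(self):
        # Retention limit: discard readings that have outlived the TTL.
        cutoff = self.clock() - self.ttl
        while self._items and self._items[0][0] <= cutoff:
            self._items.popleft()

    def snapshot(self):
        self.purge()
        return [reading for _, reading in self._items]
```

A car could feed every sensor reading through something like `MinimalBuffer(ttl_seconds=5)`, so that a field such as a face image is dropped at the door and everything else disappears a few seconds later. An audit would then check the allowlist and the TTL rather than trusting a promise.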

Of course the guidelines will be technology-specific and they won’t be perfect. There will still be data leaks and hacks. But we have to collectively agree on certain trade-offs. There are going to be some benefits of AI we just can’t have unless everybody agrees to sacrifice some level of privacy. We’re not going to be able to have self-driving cars and all the benefits they come with unless we allow cars to drive around with cameras and sensors capturing everything going on around them.

Online AI Matchmaking

For another example, imagine an online AI matchmaking service which finds your perfect match. Suppose it’s more successful than other existing matchmaking services, by any metric. Sounds great, right? But there’s a catch. The reason it achieves such great results is that it creates huge dossiers on its users to feed into the AI matchmaking algorithm.

You might be thinking “Well if you don’t want your privacy invaded, just don’t sign up.” Ah but it’s not so simple. None of us live in a privacy vacuum. Every time you give up data about yourself, you risk giving up data about others even if you never explicitly offer data about them. As I already discussed, AI can deduce information about other people you’re close to based on things it knows about you. Using privacy-invading services inevitably leaks some data about nonconsenting non-users.

It still makes sense to mitigate the privacy damage caused by AI matchmaking using the same sort of regulations I proposed for self-driving cars. Deciding not to use the service is an individual decision. But on a societal level, we have to decide whether it’s okay for such a service to exist in the first place in an environment where AI Sherlock could use the data to derive personal information about nonconsenting non-users.

Other Technologies

The examples of self-driving cars and AI matchmaking were pretty mild in terms of their privacy invasiveness. As more jobs become automated, privacy trade-offs will happen all over the place. As we’re surrounded by AI-driven robots replacing our jobs, they’ll collect more and more data on us. These AI robots will have to be extremely carefully designed and regulated so that they collect minimal data about people and securely delete it as soon as it’s no longer needed.

If many useful services provided by AI simply cannot exist without collecting personal data on users, then we might end up with a 2-tier society. There will be those who sacrifice their privacy to reap the huge benefits of AI technology. Then there will be those who don’t consent to giving up their privacy who will end up comparatively crippled. Dividing society in this way would be a very bad thing.

Cryptography

But maybe we can avoid making trade-offs. One reason to stay hopeful I haven’t mentioned yet is how cryptography could protect privacy from AI. With advances in homomorphic encryption, differential privacy, zero-knowledge proofs, and other privacy-preserving tools, we might be able to have our AI/privacy cake and eat it too. Improvements in the efficiency of homomorphic encryption in particular could enable us to perform all computations encrypted, including training neural networks on encrypted data. This would be great news for privacy. Since efficient fully homomorphic encryption would allow businesses to perform arbitrary computations on encrypted data, no business offering an internet service would have any excuse for collecting or storing plaintext user data.
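To give a feel for what computing on encrypted data means, here’s a toy Python sketch of the Paillier cryptosystem. Fair warning: Paillier is only additively homomorphic, so it is not the fully homomorphic encryption that arbitrary computation would need, and the primes here are laughably small. The function names are my own, and this is an illustration under those assumptions, not something to deploy.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy key generation. Real keys use primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random blinding factor in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, which avoids one exponentiation.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the plaintexts underneath them."""
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    n, private_key = keygen()
    c_total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    print(decrypt(n, private_key, c_total))
```

The party calling `add_encrypted` only ever sees ciphertexts and the public key. It produces an encrypted total without learning either input, and only the holder of the private key can read the result.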

We could also regulate businesses running AI-driven services so they’re legally required to operate them while collecting as little user data as possible. For instance, if we figured out how to use homomorphic encryption for the hypothetical AI matchmaking business without collecting plaintext data about users, then all AI matchmaking businesses providing worse or equivalent service would be legally required to provide that same level of privacy to users.

With that law in place, we could constantly step up privacy protections against AI, and against online services that don’t use AI too. We could also avoid a 2-tier society of those benefiting from AI and those who aren’t. Maybe cryptography can save us from being forced to pick and choose.
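Differential privacy, one of the tools mentioned in this section, is easier to show than to describe. Here’s a minimal Python sketch of the Laplace mechanism applied to a counting query. The `private_count` function and the example records are hypothetical, and floating-point noise like this has known weaknesses, so treat it as a sketch of the idea rather than a hardened implementation.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching predicate.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    rate = epsilon  # exponential rate = 1 / scale = epsilon
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


if __name__ == "__main__":
    # Hypothetical user records for illustration.
    users = [{"age": 25}, {"age": 41}, {"age": 33}, {"age": 58}]
    print(private_count(users, lambda u: u["age"] > 30, epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger privacy. The service can still publish useful aggregate statistics, but no single answer reveals with confidence whether any one person is in the data.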

Summary

In summary, AI is a danger to privacy. It’s getting more dangerous. To protect our privacy, we need to stop governments and businesses from collecting data about us and get them to purge data they already have. Stronger laws and regulations than currently exist anywhere in the world will need to be passed to protect user privacy in a meaningful way. If we’re fortunate, advances in cryptography, particularly homomorphic encryption, could allow us to reap the benefits of AI without the privacy invasion.

It’s too early to say how the future of privacy will play out. Anyone who claims to know is either full of themselves or lying. There are just too many unknowns. As I said earlier, we don’t know how much predictive power future AI will have or how fast it will develop. We don’t know which privacy laws will be rolled out or when. We don’t know if or when cryptographic tools will become available that can alleviate some of the privacy concerns. We don’t know how public attitudes towards privacy will adapt over time. So it’s all up in the air for now.