This makes me want to do a small thread on attacker models to consider when building differential privacy tooling.
Putting DP system vulnerabilities like "timing side-channels" and "floating-point leakage" in the same bucket is natural, but I think it's a bit misguided. https://twitter.com/opendp_io/status/1288885032271085568
So you're building a DP engine. Who's going to interact with it?
There are three kinds of users to consider:
1. Trusted users
2. Kinda-but-not-really trusted users
3. Fully untrusted users
tl;dr: 1 is already very difficult, 2 is even trickier, and nobody knows how to do 3.
In situation 1, you have someone with direct access to the data, who wants to produce anonymized stats.
Your goal is to make it easy for them to do the right thing, and hard for them to shoot themselves in the foot.
It's already difficult, for the same reason that building crypto libraries is difficult.
You have to design crystal-clear interfaces, think long and hard about what options should be available and what their defaults should be, test your code thoroughly, audit it for privacy bugs, etc.
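To make that concrete, here's a minimal sketch of what a hard-to-misuse interface could look like. The `dp_mean` function and its parameters are hypothetical, not any real engine's API; the point is that the clipping bounds and the budget are mandatory and explicit, so an analyst can't silently end up with a meaningless guarantee.

```python
import math
import random

def dp_mean(values: list[float], *, epsilon: float,
            lower: float, upper: float) -> float:
    """Differentially private mean with mandatory, explicit parameters.

    Hypothetical API sketch. Assumes the record count is public, and uses
    the textbook Laplace sampler -- which, as discussed next, is itself
    subtly broken on floating-point numbers.
    """
    if not values:
        raise ValueError("empty input")
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n  # one record moves the clipped mean by at most this
    u = random.random() - 0.5          # textbook inverse-CDF Laplace sampling
    noise = -(sensitivity / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return sum(clipped) / n + noise
```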
One example of such a privacy bug: floating-point vulnerabilities. If you're not careful, the way you add noise is subtly broken, and the stats that a well-intentioned user publishes leak more information than you thought. https://www.microsoft.com/en-us/research/publication/on-significance-of-the-least-significant-bits-for-differential-privacy/
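The textbook sampler in the sketch above has exactly this problem: floating-point numbers aren't uniformly dense, so the set of reachable outputs depends on the true value, and an attacker can rule out candidate inputs from the low-order bits. Here's a rough, simplified sketch of the known fix, Mironov's snapping mechanism (a real version needs much more care, e.g. exactly-rounded arithmetic and adjusted budget accounting):

```python
import math

def snap(noisy: float, scale: float, bound: float) -> float:
    """Simplified sketch of the snapping mechanism (Mironov 2012).

    Round the noisy result to the nearest multiple of a power of two at
    least as large as the noise scale, then clamp, so the output's float
    representation no longer betrays the exact input.
    """
    granularity = 2.0 ** math.ceil(math.log2(scale))
    snapped = round(noisy / granularity) * granularity
    return max(-bound, min(bound, snapped))
```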
In situation 2, the user is e.g. an analyst working for your company. They access a database via a DP engine. If they were very malicious, they might have other ways of trying to access the data.
Your job is to make it *unreasonably hard/risky* for them to do creepy things.
The goal here is less "make it provably impossible" and more "make it more difficult/expensive than all other options".
It's harder than situation 1: you have to think about privacy budget tracking, query logging & auditing, and simple attacks.
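For illustration, a toy sketch of budget tracking with an audit log (a hypothetical design, not any real engine's API). Note that it charges the budget *before* the query runs, so a failing query still pays, and every request leaves a trace for later auditing:

```python
class BudgetAccountant:
    """Toy per-analyst epsilon accountant with an audit log (hypothetical)."""

    def __init__(self, total_epsilon: float):
        self.remaining = total_epsilon
        self.audit_log: list[tuple[str, float]] = []

    def charge(self, query: str, epsilon: float) -> None:
        # Charge *before* executing: a query that later errors out has
        # still paid, so failures don't become free probes.
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        self.audit_log.append((query, epsilon))
```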
I would argue that "timing attacks" are still not a real concern at this stage. There are much simpler attack vectors: for example, what do you do about queries that return an error?
Silently hiding errors is 1) hard, and 2) a *huge* usability burden. You can't debug anymore!
If your users are "kinda" trusted, it might be a reasonable choice to allow errors to be surfaced to users, and mitigate that attack vector with monitoring / alerting.
Sampling is also a simple mitigation against most attacks. Not bulletproof, but remember: that's not the goal.
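As a sketch (hypothetical helper, simple uniform subsampling): answer each query over a random subset of the rows. The attacker can no longer be sure any given record was even included, and subsampling is also known to amplify DP guarantees.

```python
import random

def subsample(rows: list, rate: float = 0.1, rng=random) -> list:
    """Answer each query over a random subsample of the data (sketch).

    A cheap mitigation against many simple attacks. Bonus: running an
    epsilon-DP query over a rate-q subsample amplifies the guarantee,
    roughly to q*epsilon for small epsilon.
    """
    return [row for row in rows if rng.random() < rate]
```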
In situation 3, you're giving access to your data via a DP engine to someone whom you distrust completely (say, an outsider). You need your infrastructure to be bulletproof.
Do you care about timing attacks then? Maybe. But you'll bump into other serious problems first.
How do you track privacy budget if you want to give access to multiple people? If you really don't trust them, you should assume they'll collude, so consider all accesses at once.
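Concretely, reusing the toy `BudgetAccountant` sketched earlier: if users may collude, one shared accountant has to cover all of them, instead of one per user.

```python
# One joint budget across all users, since colluders can pool their queries
# (sketch, reusing the hypothetical BudgetAccountant from above):
shared = BudgetAccountant(total_epsilon=1.0)

def run_query(user: str, query: str, epsilon: float):
    shared.charge(f"{user}: {query}", epsilon)  # charged against everyone's budget
    ...  # execute the query with DP noise, return the result
```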
How do you make sure they aren't finding a new way of attacking the engine? You need auditing too.
Oh, and the problem about queries throwing errors is still there.
Exercise for the reader: count all the ways a SQL query might fail because of the data of a single user.
*All of them* must be *completely invisible* to a user. Fully indistinguishable from a successful query.

Assuming that's compatible with the usability requirements of your system… How do you even implement that?
Crafting a query that throws a specific type of error depending on some sensitive piece of information is far easier than mounting a timing attack.
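To see why, a toy example (hypothetical schema and target): against an engine that executes the raw SQL and surfaces errors, this query deterministically raises a division-by-zero exactly when one person's attribute has a given value. No amount of noise on the *result* helps, because there is no result.

```python
# Toy error-channel attack (hypothetical table and target). The query
# fails with a division-by-zero if and only if the targeted user's
# salary is exactly 100000:
attack_query = """
SELECT COUNT(*) / (CASE WHEN MAX(salary) = 100000 THEN 0 ELSE 1 END)
FROM employees
WHERE name = 'target';
"""
# One exact bit leaked per query, deterministically -- far less effort
# than measuring timing jitter.
```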
I don't think anyone knows how to do 3 yet. So spending resources mitigating timing attacks feels very premature.
I hope we get to this stage at some point, but I think focusing on the "easier" scenarios (which are already very hard!) is the right first move.