Based on some discussions yesterday, I wrote up a more detailed note on the Apple on-device scanning saga, focusing on the "obfuscation" of the exact number of matches and on how one might (probabilistically) break it.
This isn't the biggest problem with the proposed system. It does, however, suggest that even if you *really* trust Apple not to abuse their power (or to be coerced by those in power), Apple still needs to release details about system parameters and assumptions.
We can quibble about the exact numbers I used, and about the likelihood of a "prolific parent account" taking 50 photos a day for an entire year, but there are *real* bounds on the kinds of users any static threshold and synthetic-match parameters can sustain.
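A rough way to see those bounds is a toy calculation. The model is an assumption and is not necessarily the one in the note. Each uploaded photo independently yields a synthetic match with a fixed global probability `p`, and human review triggers once the match count reaches a threshold `t`. Under that model, the synthetic match count for an account with `n` photos is Binomial(`n`, `p`). The function `binom_tail` gives the chance that synthetic matches alone reach `t`. The values of `t` and `p` below are hypothetical placeholders, since Apple has not released them.

```python
import math

# Toy model (an assumption, not Apple's published design): each uploaded photo
# independently yields a synthetic match with a fixed global probability p, and
# human review triggers once the match count reaches a threshold t.


def binom_tail(n: int, p: float, t: int) -> float:
    """Return P(X >= t) for X ~ Binomial(n, p), summing the pmf in log space."""
    if t <= 0:
        return 1.0
    if t > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_nfact = math.lgamma(n + 1)
    total = 0.0
    for k in range(t, n + 1):
        log_pmf = (log_nfact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        # Past the mode the pmf only shrinks; stop once terms underflow to 0.
        if k > n * p + 1 and log_pmf < -745:
            break
        total += math.exp(log_pmf)
    return min(total, 1.0)


def synthetic_summary(n: int, p: float, t: int) -> dict:
    """Mean and std. dev. of synthetic matches, plus P(synthetic alone >= t)."""
    return {
        "mean": n * p,
        "stddev": math.sqrt(n * p * (1 - p)),
        "p_false_trigger": binom_tail(n, p, t),
    }


if __name__ == "__main__":
    PHOTOS = 50 * 365  # the "prolific parent account": 50 photos a day for a year
    T = 10             # hypothetical threshold (Apple has not released one)
    for p in (0.0001, 0.0005, 0.001):  # hypothetical synthetic-match rates
        s = synthetic_summary(PHOTOS, p, T)
        print(f"p={p}: mean={s['mean']:.2f}, sd={s['stddev']:.2f}, "
              f"P(synthetic >= {T})={s['p_false_trigger']:.4f}")
```

Even with made-up parameters, the tension is visible. A larger `p` gives low-volume accounts more noise, but it quickly pushes the synthetic count of a 50-photos-a-day account toward the threshold. A `p` small enough for that account leaves very little noise for everyone else.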
And if Apple is generating account-dependent parameters, then the system is even more broken.
That is, I assert that the actual privacy of this metadata paradoxically depends on Apple both *never* deriving certain information AND *always* deriving it for every account.
Anyway, the math I had in my head during Saturday night's stream-of-tweets is now out of my head and nicely formatted.
To sort through all of this, Apple should release:
- the threshold of matches required for human review
- the mechanism through which the probability of synthetic matches is derived and whether it is global or per account
If those were public, it would be possible to plug in the numbers, determine exactly where the bounds for effective obfuscation lie, and have an actual conversation about how private the metadata in the system really is.
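Here is a minimal sketch of what "plugging in the numbers" could look like. It rests on a toy model, which is an assumption: synthetic matches occur independently per photo at a single global rate `p`, and review triggers at a threshold `t`. Given a threshold, the sketch uses bisection to find the largest `p` for which the busiest account the system must tolerate (`n_max` photos) crosses `t` on synthetic matches alone with probability at most `eps`. The names `max_synthetic_rate`, `n_max`, and `eps` are hypothetical, as are all numeric values.

```python
import math

# Toy model (an assumption, not Apple's published design): synthetic matches
# occur independently per photo at one global rate p; review triggers at t.


def binom_tail(n: int, p: float, t: int) -> float:
    """Return P(X >= t) for X ~ Binomial(n, p), summing the pmf in log space."""
    if t <= 0:
        return 1.0
    if t > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_nfact = math.lgamma(n + 1)
    total = 0.0
    for k in range(t, n + 1):
        log_pmf = (log_nfact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        # Past the mode the pmf only shrinks; stop once terms underflow to 0.
        if k > n * p + 1 and log_pmf < -745:
            break
        total += math.exp(log_pmf)
    return min(total, 1.0)


def max_synthetic_rate(n_max: int, t: int, eps: float, iters: int = 60) -> float:
    """Largest global p with P(Binomial(n_max, p) >= t) <= eps, via bisection."""
    lo, hi = 0.0, 1.0  # invariant: lo is always "safe" (tail <= eps)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_tail(n_max, mid, t) <= eps:
            lo = mid  # busiest account still rarely trips review: try larger p
        else:
            hi = mid  # synthetic matches alone trip review too often
    return lo


if __name__ == "__main__":
    N_MAX = 50 * 365  # busiest account the parameters must tolerate (hypothetical)
    T = 10            # hypothetical review threshold
    EPS = 1e-6        # hypothetical tolerated chance of a synthetic-only trigger
    p = max_synthetic_rate(N_MAX, T, EPS)
    print(f"max global synthetic rate p = {p:.3e}")
    for n in (100, 1000, N_MAX):  # hypothetical account sizes (photos uploaded)
        print(f"n={n}: expected synthetic matches = {n * p:.3f}")
```

Bisection is valid here because the binomial upper tail grows monotonically with `p`. The expected synthetic counts printed for smaller accounts show how little cover that maximum rate leaves them under these assumed parameters.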
You can follow @SarahJamieLewis.