
Sarah Jamie Lewis
@SarahJamieLewis · Executive Director @OpenPriv. Cryptography and Privacy Researcher. @cwtch_im icyt7rvdsdci42h6si2ibtwucdmjrlcb2ezkecuagtquiiflbkxf2cqd · Aug. 13, 2021 · 3 min read

Apple have given some interviews today in which they explicitly state that the threshold is t=30.

Which means the false acceptance rate is likely an order of magnitude *more* than I calculated in this article.

Someone asked me on a reddit thread the other day what value t would have to be if NeuralHash had a similar false acceptance rate to other perceptual hashes, and I ballparked it.

Some quick calculations with the new numbers:

3-4 photos/day: 1 match every 286 days.
50 photos/day: 1 match every 20 days.
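Those per-day figures can be reproduced from a single assumed input. A minimal sketch, assuming a per-image false acceptance rate of 1 in 1,000 (a value read back from the thread's own numbers, not one Apple has published):

```python
# Expected days between false matches = 1 / (photos per day * FAR).
# FAR = 1/1000 is an assumption reverse-engineered from the figures
# above, not a published NeuralHash number.
ASSUMED_FAR = 1 / 1000

def days_per_false_match(photos_per_day: float, far: float = ASSUMED_FAR) -> float:
    """Expected number of days between NeuralHash false matches."""
    return 1 / (photos_per_day * far)

print(round(days_per_false_match(3.5)))  # ~286 days at 3-4 photos/day
print(round(days_per_false_match(50)))   # 20 days at 50 photos/day
```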

Also the fact that they gave a single number for the threshold indicates that they are planning to use a single, global threshold.

Which will result in worse privacy for heavy-use accounts, and will mean the obfuscation can be trivially broken as I explain in the article.

(because if the threshold is constant then Apple *cannot* adjust the rate of synthetic matches: doing so would mean much more work for the server when running the detection algorithm for accounts with more photos - which is backwards scaling)

So, to state this as clearly as possible, given all the technical papers and information about the parametrization that Apple has provided:

1. I believe the obfuscation mechanisms this system claims to provide are fundamentally flawed, and easily broken.

This means for accounts with more photos Apple (et al) will be able to calculate how many actual matches you have - with high probability - even before you cross the threshold.

(See here for details) 

2. I believe that for accounts with more photos, the actual probability of hitting enough false positives to cross the threshold is far greater than the 1-in-a-trillion number Apple have thrown around (~4% a year for 50 photos/day).
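As a sanity check on that kind of claim, here is a minimal Poisson-tail sketch. The 1-in-1,000 FAR is an assumed input, not an Apple number (the thread's ~4% figure corresponds to a somewhat higher assumed FAR); the point is how the probability is computed, and how sensitive it is to that input:

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing the tail directly."""
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total, i = 0.0, k
    while term > 0 and i < k + 1000:
        total += term
        i += 1
        term *= lam / i
    return total

photos_per_year = 50 * 365                     # heavy-use account
expected_false = photos_per_year * (1 / 1000)  # assumed per-image FAR
p_cross = poisson_tail(30, expected_false)     # chance of reaching t=30
print(f"P(cross threshold in a year) = {p_cross:.2%}")
```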

Hopefully Apple will release a breakdown of how they calculated probabilities in their system and what the boundaries are.

But to be very clear, there are hard functional limits around obfuscation given the design of the system they have proposed.

Apple's new threat model document contains some actual justification for the numbers!

They are assuming a 1/100000 false acceptance rate for NeuralHash, which seems incredibly low. And they assume that every photo library is larger than the actual largest one.

Some more information about NeuralHash too. They state they did not train it on CSAM images (which makes one wonder what they *did* train it on).

This 100 million number needs some inspection given that there are billions of images exchanged every day.

In 2017, WhatsApp said they were seeing 4.5 billion photos shared per day.

You can't extrapolate a false positive rate from 100 million tests.
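This is the standard "rule of three" point: even a test that produces zero false positives in n trials only bounds the true rate at roughly 3/n with 95% confidence. A sketch (the 100M and 4.5B figures are from the thread; the rule itself is standard statistics):

```python
import math

def far_upper_bound(n_tests: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on a rate after zero hits in n_tests
    trials ("rule of three": approximately 3 / n_tests at 95%)."""
    return -math.log(1 - confidence) / n_tests

bound = far_upper_bound(100_000_000)  # Apple's reported test size
print(f"FAR could still be as high as {bound:.1e}")
# At WhatsApp's 2017 scale of 4.5 billion photos/day, a FAR at that
# bound would mean ~135 false matches per day across the platform.
print(4.5e9 * bound)
```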

Some relief that the 1-in-a-trillion figure seems to derive from the heaviest-use accounts and not a general average account.

Although, given all those numbers (far=1:100M, t=30), it puts the size of the large iPhoto library at around 6M images.
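That ~6M figure can be back-solved: find the expected number of false matches λ at which the chance of crossing t=30 is 1 in a trillion, then divide by an assumed per-image FAR. A sketch, assuming a FAR of one in a million (the value that makes a ~6M library come out; treat it as reverse-engineered, not an Apple statement):

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summing the tail directly."""
    term = math.exp(-lam) * lam ** k / math.factorial(k)
    total, i = 0.0, k
    while term > 0 and i < k + 1000:
        total += term
        i += 1
        term *= lam / i
    return total

# Bisect for the lambda where P(>= 30 false matches) = 1e-12.
lo, hi = 1.0, 30.0
for _ in range(100):
    mid = (lo + hi) / 2
    if poisson_tail(30, mid) < 1e-12:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

assumed_far = 1e-6                # reverse-engineered assumption
library_size = lam / assumed_far  # images needed to reach that lambda
print(f"~{library_size / 1e6:.1f}M images")
```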

So where are we now:

* Mechanism for choosing synthetic voucher probability still unknown
* Obfuscation algorithm still seems broken
* False positives might be lower if you believe Apple's experiments were somehow representative of the real world.

Also, all the policy issues about client-side scanning remain: how quickly and easily it can be abused and exploited by Apple (et al) to target all kinds of communities, and the inevitable follow-up use to undermine e2ee messaging.

Anyway, keep up the pressure. The fact that Apple felt it necessary to do a PR blitz today, along with releasing new slivers of information regarding parametrization, is a good sign.

Also remember "Screeching voices of the minority"
