Better than nothing privacy with bloom filters: To what extent?

Abstract

Bloom filters are probabilistic data structures which permit to conveniently represent set membership. Their performance/memory efficiency makes them appealing in a huge variety of scenarios. Their probabilistic operation, along with the implicit data representation, yields some ambiguity on the actual data stored, which, in scenarios where cryptographic protection is unviable or unpractical, may be somewhat considered as a better than nothing privacy asset. Oddly enough, even if frequently mentioned, to the best of our knowledge the (soft) privacy properties of Bloom filters have never been explicitly quantified. This work aims to fill this gap. Starting from the adaptation of probabilistic anonymity metrics to the Bloom filter setting, we derive exact and (tightly) approximate formulae which permit to readily relate privacy properties with filter (and universe set) parameters. Using such relations, we quantitatively investigate the emerging privacy/utility trade-offs. We finally preliminary assess the advantages that a tailored insertion of a few extra (covert) bits achieves over the commonly employed strategy of increasing ambiguity via addition of random bits. © 2012 Springer-Verlag Berlin Heidelberg.

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)