Training machine learning models on privacy-sensitive data has become common
practice, driving innovation across an ever-expanding range of fields. It has also
opened the door to a new class of attacks, such as Membership Inference Attacks
(MIAs), which exploit vulnerabilities in ML models to reveal whether individual
samples were part of the training data. A growing body of literature holds up
Differential Privacy (DP) as an effective defense against such attacks, and
companies like Google and Amazon include this privacy notion in their
machine-learning-as-a-service products. However, little scrutiny has been given
to how underlying correlations or bias within the datasets used for training
these models can impact the privacy guarantees provided by DP. In this work, we
challenge prior findings that suggest DP provides a strong defense against
MIAs. We provide theoretical and experimental evidence for cases in which the
MIAs described in prior work violate the theoretical bounds of DP. We first
demonstrate this empirically, using real-world datasets carefully split to
create a distinction between member and non-member samples, and then examine
why the theoretical DP bounds break down when members and non-members are not
independent and identically distributed. Our findings
suggest that certain properties of datasets, such as bias or data correlation,
play a critical role in determining the effectiveness of DP as a
privacy-preserving mechanism against MIAs.
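
For concreteness, the sketch below computes one commonly cited upper bound on
membership inference accuracy against a pure ε-differentially-private mechanism,
e^ε / (1 + e^ε) under a balanced membership prior, and compares it with a
hypothetical empirical attack accuracy. This is only an illustration of what a
"theoretical DP bound" can look like; the exact bound analyzed in this work may
differ (for example, by accounting for δ), and the numbers below are made up,
not results from the paper.

```python
import math

def dp_mia_accuracy_bound(epsilon: float) -> float:
    """Commonly cited upper bound on membership inference accuracy
    against a pure epsilon-DP mechanism, assuming a balanced prior
    (each challenge sample is a member with probability 1/2).

    From the hypothesis-testing view of DP:
        TPR <= e^eps * FPR   and   (1 - FPR) <= e^eps * (1 - TPR),
    which together cap the attacker's accuracy at e^eps / (1 + e^eps).
    """
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

if __name__ == "__main__":
    # Hypothetical attack accuracy for illustration only -- not a paper result.
    empirical_attack_accuracy = 0.74
    for eps in (0.1, 1.0, 2.0, 5.0):
        bound = dp_mia_accuracy_bound(eps)
        status = "VIOLATED" if empirical_attack_accuracy > bound else "respected"
        print(f"eps={eps:4.1f}  bound={bound:.3f}  "
              f"empirical={empirical_attack_accuracy:.3f}  -> bound {status}")
```

Note that this style of bound treats each challenge record independently of the
rest of the dataset; when members and non-members are correlated, as in the
dataset splits described above, that assumption no longer holds, which is
consistent with the bound violations this work reports.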
