Interactive Demo
Watch the neural network being queried by an attacker. Click Attack to launch a new membership inference round. Hover over data points to see their confidence scores. The attacker uses these scores to infer whether each point was in the training set.
About this project
This research project explores the vulnerability of machine learning models to membership inference attacks. In a membership inference attack, an adversary queries a trained model with data points and analyzes the model's confidence scores to determine whether those points were part of the original training dataset. This poses a significant privacy risk, especially when models are trained on sensitive data such as medical records or financial information.
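The core signal behind such an attack can be sketched in a few lines: overfit models tend to assign unusually high confidence to points they were trained on, so a simple threshold on the top-class confidence already yields a membership guess. The function and scores below are purely illustrative, not part of this project's codebase.

```python
def predict_membership(confidence: float, threshold: float = 0.9) -> bool:
    """Guess that a point was a training member if the model is
    unusually confident about it. Overfit models tend to score
    their own training points higher than unseen points."""
    return confidence >= threshold

# Hypothetical top-class confidences returned by the target model:
member_scores = [0.98, 0.95, 0.99]     # points that were in the training set
nonmember_scores = [0.61, 0.72, 0.55]  # points the model never saw

guesses = [predict_membership(s) for s in member_scores + nonmember_scores]
# -> [True, True, True, False, False, False]
```

In practice the threshold is not hand-picked; it is learned from shadow models, as described below.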
The project implements black-box attack strategies where the attacker has no access to the model's internal parameters -- only its prediction outputs. By training shadow models that mimic the target model's behavior and analyzing the statistical differences in prediction confidence between training members and non-members, the attack achieves high accuracy in distinguishing membership. This work highlights the critical need for privacy-preserving techniques like differential privacy in deployed ML systems.