Interactive Demo
Watch the neural network being queried by an attacker. Click Attack to launch a new membership inference round. Hover over data points to see their confidence scores. The attacker uses these scores to infer whether each point was in the training set.
About this project
This research project explores the vulnerability of machine learning models to membership inference attacks. In a membership inference attack, an adversary queries a trained model with data points and analyzes the model's confidence scores to determine whether those points were part of the original training dataset. This poses a significant privacy risk, especially when models are trained on sensitive data such as medical records or financial information.
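The core signal behind such an attack can be sketched in a few lines: overfit models tend to assign unusually high confidence to points they were trained on, so a simple threshold on the top-class confidence already yields a membership guess. The function and scores below are purely illustrative, not part of this project's codebase.

```python
def predict_membership(confidence: float, threshold: float = 0.9) -> bool:
    """Guess that a point was a training member if the model is
    unusually confident about it. Overfit models tend to score
    their own training points higher than unseen points."""
    return confidence >= threshold

# Hypothetical top-class confidences returned by the target model:
member_scores = [0.98, 0.95, 0.99]     # points that were in the training set
nonmember_scores = [0.61, 0.72, 0.55]  # points the model never saw

guesses = [predict_membership(s) for s in member_scores + nonmember_scores]
# -> [True, True, True, False, False, False]
```

In practice the threshold is not hand-picked; it is learned from shadow models, as described below.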
The project implements black-box attack strategies where the attacker has no access to the model's internal parameters -- only its prediction outputs. By training shadow models that mimic the target model's behavior and analyzing the statistical differences in prediction confidence between training members and non-members, the attack achieves high accuracy in distinguishing membership. This work highlights the critical need for privacy-preserving techniques like differential privacy in deployed ML systems.