Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the security of computing as practiced today, and as envisioned in the future.

Everyone is welcome at our research group meetings (most Fridays at 11am, but join the slack group for announcements). To get announcements, join our Slack Group (any @virginia.edu email address can join themsleves, or email me to request an invitation).


Adversarial Machine Learning

Secure Multi-Party Computation
Obliv-C · MightBeEvil

Recent Posts

Congratulations Dr. Xu!

Congratulations to Weilin Xu for successfully defending his PhD Thesis!

Improving Robustness of Machine Learning Models using Domain Knowledge

Although machine learning techniques have achieved great success in many areas, such as computer vision, natural language processing, and computer security, recent studies have shown that they are not robust under attack. A motivated adversary is often able to craft input samples that force a machine learning model to produce incorrect predictions, even if the target model achieves high accuracy on normal test inputs. This raises great concern when machine learning models are deployed for security-sensitive tasks.

This dissertation aims to improve the robustness of machine learning models by exploiting domain knowledge. While domain knowledge has often been neglected due to the power of automatic representation learning in the deep learning era, we find that domain knowledge goes beyond a given dataset of a task and helps to (1) uncover weaknesses of machine learning models, (2) detect adversarial examples and (3) improve the robustness of machine learning models.

First, we design an evolutionary algorithm-based framework, Genetic Evasion, to find evasive samples. We embed domain knowledge into the mutation operator and the fitness function of the framework and achieve 100% success rate in evading two state-of-the-art PDF malware classifiers. Unlike previous methods, our technique uses genetic programming to directly generate evasive samples in the problem space instead of the feature space, making it a practical attack that breaks the trust of black-box machine learning models in a security application.

Second, we design an ensemble framework, Feature Squeezing, to detect adversarial examples against deep neural network models using simple pre-processing. We employ domain knowledge on signal processing that natural signals are often redundant for many perception tasks. Therefore, we can squeeze the input features to reduce adversaries’ search space while preserving the accuracy on normal inputs. We use various squeezers to pre-process an input example before it is fed into a model. The difference between those predictions is often small for normal inputs due to redundancy, while the difference can be large for adversarial examples. We demonstrate that Feature Squeezing is empirically effective and inexpensive in detecting adversarial examples for image classification tasks generated by many algorithms.

Third, we incorporate simple pre-processing with certifiable robust training and formal verification to train provably-robust models. We formally analyze the impact of pre-processing on adversarial strength and derive novel methods to improve model robustness. Our approach produces accurate models with verified state-of-the-art robustness and advances the state-of-the-art of certifiable robust training methods.

We demonstrate that domain knowledge helps us understand and improve the robustness of machine learning models. Our results have motivated several subsequent works, and we hope this dissertation will be a step towards implementing robust models under attack.

ISMR 2019: Context-aware Monitoring in Robotic Surgery

Samin Yasar presented our paper on Context-award Monitoring in Robotic Surgery at the 2019 International Symposium on Medical Robotics (ISMR) in Atlanta, Georgia.

Robotic-assisted minimally invasive surgery (MIS) has enabled procedures with increased precision and dexterity, but surgical robots are still open loop and require surgeons to work with a tele-operation console providing only limited visual feedback. In this setting, mechanical failures, software faults, or human errors might lead to adverse events resulting in patient complications or fatalities. We argue that impending adverse events could be detected and mitigated by applying context-specific safety constraints on the motions of the robot. We present a context-aware safety monitoring system which segments a surgical task into subtasks using kinematics data and monitors safety constraints specific to each subtask. To test our hypothesis about context specificity of safety constraints, we analyze recorded demonstrations of dry-lab surgical tasks collected from the JIGSAWS database as well as from experiments we conducted on a Raven II surgical robot. Analysis of the trajectory data shows that each subtask of a given surgical procedure has consistent safety constraints across multiple demonstrations by different subjects. Our preliminary results show that violations of these safety constraints lead to unsafe events, and there is often sufficient time between the constraint violation and the safety-critical event to allow for a corrective action.

A Plan to Eradicate Stalkerware

Sam Havron (BSCS 2017) is quoted in an article in Wired on eradicating stalkerware:

The full extent of that stalkerware crackdown will only prove out with time and testing, says Sam Havron, a Cornell researcher who worked on last year’s spyware study. Much more work remains. He notes that domestic abuse victims can also be tracked with dual-use apps often overlooked by antivirus firms, like antitheft software Cerberus. Even innocent tools like Apple’s Find My Friends and Google Maps’ location-sharing features can be abused if they don’t better communicate to users that they may have been secretly configured to share their location. “This is really exciting news,” Havron says of Kaspersky’s stalkerware change. “Hopefully it will spur the rest of the industry to follow suit. But it’s just the very first thing.”

For more details on his technical work, see the paper in Oakland 2018: Rahul Chatterjee, Periwinkle Doerfler, Hadas Orgad, Sam Havron, Jackeline Palmer, Diana Freed, Karen Levy, Nicola Dell, Damon McCoy, Thomas Ristenpart. The Spyware Used in Intimate Partner Violence. IEEE Symposium on Security and Privacy, 2018.

When Relaxations Go Bad: "Differentially-Private" Machine Learning

We have posted a paper by Bargav Jayaraman and myself on When Relaxations Go Bad: “Differentially-Private” Machine Learning (code available at https://github.com/bargavj/EvaluatingDPML).

Differential privacy is becoming a standard notion for performing privacy-preserving machine learning over sensitive data. It provides formal guarantees, in terms of the privacy budget, ε, on how much information about individual training records is leaked by the model.

While the privacy budget is directly correlated to the privacy leakage, the calibration of the privacy budget is not well understood. As a result, many existing works on privacy-preserving machine learning select large values of ϵ in order to get acceptable utility of the model, with little understanding of the concrete impact of such choices on meaningful privacy. Moreover, in scenarios where iterative learning procedures are used which require privacy guarantees for each iteration, relaxed definitions of differential privacy are often used which further tradeoff privacy for better utility.

We evaluated the impacts of these choices on privacy in experiments with logistic regression and neural network models, quantifying the privacy leakage in terms of advantage of the adversary performing inference attacks and by analyzing the number of members at risk for exposure.

Accuracy Loss as Privacy Decreases
(CIFAR-100, neural network model)

Privacy Leakage
(Yeom et al.’s Membership Inference Attack)

Our main findings are that current mechanisms for differential privacy for machine learning rarely offer acceptable utility-privacy tradeoffs: settings that provide limited accuracy loss provide little effective privacy, and settings that provide strong privacy result in useless models.

The table below shows the number of individuals, out of 10,000 members in the training set, exposed by a membership inference attack, given tolerance for false positives of 1% or 5% (and assuming a priori prevalence of 50% members). The key observations is that all the relaxtions provide lower utility (more accuracy loss) than naïve composition for comparable privacy leakage, as measured by the number of actual members exposed in a test dataset. Further, none of the methods provide both acceptable utility and meaningful privacy — at a high level, either nothing is learned from the training data, or some sensitive data is exposed. (See the paper for more details and results.)

 Naïve Composition Advanced Composition Zero Concentrated Rényi
Epsilon Loss 1% 5% Loss 1% 5% Loss 1% 5% Loss 1% 5%
0.1 0.95 0 0 0.95 0 0 0.94 0 0 0.93 0 0
1 0.94 0 0 0.94 0 0 0.92 0 6 0.91 0 94
10 0.94 0 0 0.87 0 1 0.81 0 20 0.80 0 109
100 0.93 0 0 0.61 1 32 0.49 30 281 0.48 11 202
1000 0.59 0 11 0.06 13 359 0.00 28 416 0.07 22 383
0.00 155 2667 No privacy noise added.

Bargav Jayaraman talked about this work at the DC-Area Anonymity, Privacy, and Security Seminar (25 February 2019) at the University of Maryland:

Paper: When Relaxations Go Bad: “Differentially-Private” Machine Learning
Code: https://github.com/bargavj/EvaluatingDPML)

Deep Fools

New Electronics has an article that includes my Deep Learning and Security Workshop talk: Deep fools, 21 January 2019.

A better version of the image Mainuddin Jonas produced that they use (which they screenshot from the talk video) is below: