Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the security of computing as practiced today, and as envisioned in the future.

Everyone is welcome at our research group meetings (most Fridays at 11am; join our Slack group for announcements). Anyone with an @virginia.edu email address can join the Slack group directly; others can email me to request an invitation.

Projects

Adversarial Machine Learning
EvadeML

Secure Multi-Party Computation
Obliv-C · MightBeEvil

Recent Posts

NeurIPS 2019: Empirically Measuring Concentration

Xiao Zhang will present our work (with Saeed Mahloujifar and Mohammad Mahmoody) as a spotlight at NeurIPS 2019, Vancouver, 10 December 2019.

Recent theoretical results, starting with Gilmer et al.’s Adversarial Spheres (2018), show that if inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. The key insight from this line of research is that concentration of measure gives a lower bound on adversarial risk for a large collection of classifiers (e.g., imperfect classifiers with risk at least $\alpha$), which implies impossibility results for robust learning against adversarial examples.

However, it is not clear whether these theoretical results apply to actual distributions such as images. This work presents a method for empirically measuring and bounding the concentration of a concrete dataset, and proves that the resulting estimate converges to the actual concentration. More specifically, we prove that by simultaneously increasing the sample size and a complexity parameter of the selected collection of subsets $\mathcal{G}$, the concentration of the empirical measure based on samples converges asymptotically to the actual concentration.
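For reference, the quantity being estimated is the concentration function, written here in standard concentration-of-measure notation (this formulation is ours and may differ in minor details from the paper's definitions):

$$ h(\mu, \alpha, \epsilon) = \inf \{\, \mu(\mathcal{E}_\epsilon) \,:\, \mathcal{E} \subseteq \mathcal{X},\ \mu(\mathcal{E}) \geq \alpha \,\}, \qquad \mathcal{E}_\epsilon = \{\, x \in \mathcal{X} \,:\, d(x, \mathcal{E}) \leq \epsilon \,\} $$

where $\mu$ is the data distribution, $\mathcal{E}$ ranges over candidate error regions with measure at least $\alpha$, and $\mathcal{E}_\epsilon$ is the $\epsilon$-expansion of $\mathcal{E}$ under the metric $d$. Any classifier with risk at least $\alpha$ has an error region qualifying as some $\mathcal{E}$, so its adversarial risk is at least $h(\mu, \alpha, \epsilon)$; the empirical version restricts $\mathcal{E}$ to the collection $\mathcal{G}$ and replaces $\mu$ with the empirical measure over samples.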

To solve the empirical concentration problem, we propose heuristic algorithms to find error regions with small expansion under both $\ell_\infty$ and $\ell_2$ metrics.

For instance, our algorithm for $\ell_\infty$ starts by sorting the dataset based on the empirical density estimated using k-nearest neighbors, and then obtains $T$ rectangular data clusters by performing k-means clustering on the top-$q$ densest images. After expanding each of the rectangles by $\epsilon$, the error region $\mathcal{E}$ is specified as the complement of the expanded rectangles. Finally, we search for the best error region by tuning the number of rectangles $T$ and the initial coverage percentile $q$ (a rough sketch of this procedure is shown below).
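Here is a minimal Python sketch of that heuristic; the function names, defaults, and the particular density estimate are our illustrative choices, not the authors' implementation (which is linked below):

```python
# Sketch of the l_inf error-region heuristic (hypothetical helper names;
# see the linked repository for the actual implementation).
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import KMeans

def find_error_region(X, k=50, q=0.5, T=10, eps=0.3):
    """Return T eps-expanded rectangles; the candidate error region
    is the complement of their union."""
    # 1. Estimate empirical density as inverse distance to the k-th neighbor.
    dists, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    density = 1.0 / (dists[:, -1] + 1e-12)

    # 2. Keep the top-q fraction of the densest points.
    dense = X[np.argsort(-density)[: int(q * len(X))]]

    # 3. Cluster them into T groups; each cluster's bounding box is a rectangle.
    labels = KMeans(n_clusters=T, n_init=10).fit_predict(dense)

    # 4. Expand each rectangle by eps in every coordinate (l_inf ball).
    return [(dense[labels == t].min(axis=0) - eps,
             dense[labels == t].max(axis=0) + eps) for t in range(T)]

def in_error_region(x, rects):
    """x is in the error region iff it lies outside every expanded rectangle."""
    return not any(np.all(lo <= x) and np.all(x <= hi) for lo, hi in rects)
```

The tuning step then amounts to a grid search over $T$ and $q$, keeping the region whose $\epsilon$-expansion has the smallest empirical measure.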

Based on the proposed algorithm, we empirically measure the concentration for image benchmarks, such as MNIST and CIFAR-10. Compared with state-of-the-art robustly trained models, our estimated bound shows that, for most settings, there exists a large gap between the robust error achieved by the best current models and the theoretical limits implied by concentration.


This suggests that concentration of measure is not the only reason behind the vulnerability of existing classifiers to adversarial perturbations. Thus, either there is room to improve the robustness of image classifiers, or we need a deeper understanding of the reasons for the gap between intrinsic robustness and the actual robustness achieved by robust models.

Paper

Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody and David Evans. Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness. In NeurIPS 2019 (spotlight presentation). Vancouver, December 2019. [PDF] [arXiv]

Code

https://github.com/xiaozhanguva/Measure-Concentration


Research Symposium Posters

Five students from our group presented posters at the department’s Fall Research Symposium:


Anshuman Suri’s Overview Talk

Bargav Jayaraman, Evaluating Differentially Private Machine Learning In Practice [Poster] [Paper (USENIX Security 2019)]

Hannah Chen [Poster]

Xiao Zhang [Poster] [Paper (NeurIPS 2019)]

Mainuddin Jonas [Poster]

Fnu Suya [Poster] [Paper (USENIX Security 2020)]


Cantor's (No Longer) Lost Proof

In preparing to cover Cantor’s proof of different infinite set cardinalities (one of my all-time favorite topics!) in our theory of computation course, I found various conflicting accounts of what Cantor originally proved. So, I figured it would be easy to search the web to find the original proof.

Shockingly, at least as far as I could find¹, it didn’t exist on the web! The closest I could find was, in Google Books, the 1892 volume of the Jahresbericht der Deutschen Mathematiker-Vereinigung (which many of the references pointed to), but not, in fact, the first volume of that journal, which contains the actual proof.

Normally, of course, when something doesn’t turn up in DuckDuckGo searches, that means it doesn’t exist, but for a document this old, I figured it was worth actually visiting a library. (Okay, nothing quite so radical as going to a physical library! By visit, I mean going to the website for the university library and searching there.)

So, I tried submitting the form our library has, requesting Über eine elementare Frage der Mannigfaltigkeitslehre by G. Cantor from the 1891 volume. I didn’t notice the scan request option until after submitting, so I tried again, checking the box to request a PDF scan.

I was delighted a few days later to receive an email from the library, with a link that went to a scan of Cantor’s original proof (PDF).

The really cool thing is that about two days later, I happened to wander into the printer room and saw a strange object with a nice musty smell in my mailbox.

Apparently, the original request I’d submitted to the library had gone through, and I found myself staring at an 1891 edition of a German math journal!

And on pages 75-78, Cantor’s original published proof!

I don’t read German, but the last line is well worth translating:

From now on, whenever it’s hard to come up with a good conclusion to a paper, this one always works.

I believe our library’s policy is that (at least for faculty) when you check out a book, you can keep it until the next person requests it. So, I’ll be holding on to this one until then. When prospective high school students visit UVA, they often ask to see all the cool cutting-edge technology we use in our research. I’ll be happy to show them three of the coolest things I have in my office: an abacus, an Apple II, and now, an 1891 math journal.


  1. Apparently I wasn’t very good at searching then. In writing this post, I tried a new search and found a great post with both the original German and an English translation: Cantor’s Original 1891 Diagonal proof by James Meyer.

FOSAD Trustworthy Machine Learning Mini-Course

I taught a mini-course on Trustworthy Machine Learning at the 19th International School on Foundations of Security Analysis and Design in Bertinoro, Italy.

Slides from my three (two-hour) lectures are posted below, along with some links to relevant papers and resources.

Class 1: Introduction/Attacks

The PDF malware evasion attack is described in this paper:

Weilin Xu, Yanjun Qi, and David Evans. Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers. Network and Distributed System Security Symposium (NDSS). San Diego, CA. 21-24 February 2016. [PDF] [EvadeML.org]

Class 2: Defenses

This paper describes the feature squeezing framework:

Weilin Xu, David Evans, and Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. In 2018 Network and Distributed System Security Symposium. 18-21 February, San Diego, California. [PDF] [Project]

This paper introduces cost-sensitive robustness:

Xiao Zhang and David Evans. Cost-Sensitive Robustness against Adversarial Examples. In Seventh International Conference on Learning Representations (ICLR). New Orleans. May 2019. [arXiv] [OpenReview] [PDF]
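For a rough sense of how feature squeezing works: the detector compares the model’s prediction on an input with its prediction on a “squeezed” (e.g., bit-depth-reduced) version, and flags the input as adversarial when the two predictions differ by more than a threshold. Here is a minimal sketch; the squeezer choice and threshold are simplified illustrations, not the paper’s configuration:

```python
# Minimal feature-squeezing detector sketch (illustrative only).
import numpy as np

def reduce_bit_depth(x, bits=4):
    """Squeezer: quantize pixel values in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def is_adversarial(predict, x, threshold=1.0):
    """Flag x when the prediction shifts too much under squeezing.

    predict: function mapping an input array to a softmax probability vector.
    """
    p_original = predict(x)
    p_squeezed = predict(reduce_bit_depth(x))
    # L1 distance between the two prediction vectors.
    return np.abs(p_original - p_squeezed).sum() > threshold
```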

Class 3: Privacy

This (free) book provides an introduction to secure multi-party computation:

David Evans, Vladimir Kolesnikov and Mike Rosulek. A Pragmatic Introduction to Secure Multi-Party Computation. NOW Publishers, December 2018. [PDF]

OblivC.org is an open-source tool for building secure multi-party computations from high-level (extended C) code.

This paper describes our work on integrating differential privacy and multi-party computation:

Bargav Jayaraman, Lingxiao Wang, David Evans and Quanquan Gu. Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization. In 32nd Conference on Neural Information Processing Systems (NeurIPS). Montreal, Canada. December 2018. [PDF] [Video Summary]

This paper summarizes our work on evaluating the privacy-utility tradeoffs for machine learning:

Bargav Jayaraman and David Evans. Evaluating Differentially Private Machine Learning in Practice. In 28th USENIX Security Symposium. Santa Clara. August 2019. [PDF] [arXiv] [code]


USENIX Security Symposium 2019

Bargav Jayaraman presented our paper on Evaluating Differentially Private Machine Learning in Practice at the 28th USENIX Security Symposium in Santa Clara, California.

Summary by Lea Kissner:

Also, it was great to see several UVA folks at the conference, including: