Our research seeks to empower individuals and organizations to control how their data is used. We use techniques from cryptography, programming languages, machine learning, operating systems, and other areas to both understand and improve the security of computing as practiced today, and as envisioned in the future.

Everyone is welcome at our research group meetings (most Fridays at 11am). For announcements, join our Slack group (any @virginia.edu email address can join directly, or email me to request an invitation).


Adversarial Machine Learning

Secure Multi-Party Computation
Obliv-C · MightBeEvil

Recent Posts

NeurIPS 2019

Here's a video of Xiao Zhang's presentation at NeurIPS 2019:
https://slideslive.com/38921718/track-2-session-1 (starting at 26:50)

See this post for info on the paper.

Here are a few pictures from NeurIPS 2019 (by Sicheng Zhu and Mohammad Mahmoody):

USENIX Security 2020: Hybrid Batch Attacks

Finding Black-box Adversarial Examples with Limited Queries

Black-box attacks generate adversarial examples (AEs) against deep neural networks with only API access to the victim model.

Existing black-box attacks can be grouped into two main categories:

  • Transfer Attacks use white-box attacks on local models to find candidate adversarial examples that transfer to the target model.

  • Optimization Attacks use queries to the target model and apply optimization techniques to search for adversarial examples.

Hybrid Attack

We propose a hybrid attack that combines transfer and optimization attacks:

  1. Transfer Attack → Optimization Attack — candidate adversarial examples found against the local models of the transfer attack are used as starting points for the optimization attack.

  2. Optimization Attack → Transfer Attack — intermediate query results from the optimization attacks are used to fine-tune the local models of transfer attacks.
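
The two combination steps above can be sketched as a simple loop. This is a toy illustration of the control flow only; the function names and stand-in attacks below are hypothetical, not the paper's actual code:

```python
def hybrid_attack(seeds, local_attack, optimization_attack, fine_tune):
    """Toy sketch of the hybrid attack loop (all argument names are
    illustrative stand-ins, not the paper's implementation)."""
    results = []
    for seed in seeds:
        # (1) Transfer -> Optimization: a white-box attack on the
        # local models yields a candidate starting point.
        candidate = local_attack(seed)
        # The black-box optimization attack starts from the candidate
        # instead of the original seed.
        adv_example, queries = optimization_attack(candidate)
        results.append(adv_example)
        # (2) Optimization -> Transfer: fine-tune the local models on
        # the labeled query results produced during optimization.
        fine_tune(queries)
    return results

# Toy stand-ins, just to show the control flow:
local_attack = lambda x: x + 0.1                # pretend local PGD
optimization_attack = lambda x: (x + 0.2, [x])  # pretend AutoZOOM
tuned = []
advs = hybrid_attack([0.0, 1.0], local_attack, optimization_attack,
                     tuned.extend)
```

In the real attack, `local_attack` would be a white-box method like PGD against the local models, and `optimization_attack` a query-based method like AutoZOOM against the target model.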

We validate the effectiveness of the hybrid attack over the baseline on three benchmark datasets: MNIST, CIFAR10, and ImageNet. In this post, we only show results with AutoZOOM as the selected optimization method; results for other attacks can be found in the paper.

Local Adversarial Examples are Useful (Transfer → Optimization)

Below, we compare the performance of the AutoZOOM attack when it starts from 1) the local adversarial examples, and 2) the original points. Here, we report results for targeted attacks on normal (i.e., non-robust) models:

Local AEs can substantially boost the performance of optimization attacks, but when the same attack is used against robust models, the improvement is small:

This ineffectiveness appears to stem from differences in the attack space of normal and robust models. Therefore, to improve effectiveness against robust target models, we use robust local models to produce the transfer candidates for starting the optimization attacks. The figure below compares the impact of normal and robust local models when attacking the robust target model:

Tuning with Byproducts Doesn’t Help Much (Optimization → Transfer)

Below, we compare the performance of the AutoZOOM attack on the MNIST normal model when the local models are 1) fine-tuned during the attack process, and 2) kept static:

Tuning local models using byproducts from the optimization attack improves query efficiency. However, for more complex datasets (e.g., CIFAR10), we observe degradation in attack performance from fine-tuning (see Table 6 in the paper).

Batch Attacks

We consider a batch attack scenario: adversaries have a limited number of queries and want to maximize the number of adversarial examples found within that limit. This is a more realistic way to evaluate attacks for most adversarial purposes than just looking at the average cost to attack each seed in a large pool of seeds.

The number of queries required for attacking a specific seed varies greatly across seeds:

Based on this observation, we propose a two-phase strategy to prioritize easy seeds for the hybrid attack:

  1. In the first phase, likely-to-transfer seeds are prioritized based on the number of PGD steps taken to attack the local models. The candidate adversarial example for each seed is attempted in order to find all the direct transfers.

  2. In the second phase, the remaining seeds are prioritized based on their target loss value with respect to the target model.
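
The two phases can be sketched as a prioritization function. The argument names and toy values here are illustrative, not from the paper's implementation:

```python
def two_phase_order(seeds, pgd_steps, target_loss, direct_transfer):
    """Toy sketch of the two-phase seed prioritization."""
    # Phase 1: try transfer candidates first, ordered by how few PGD
    # steps the local attack needed (fewer steps = likely to transfer).
    by_steps = sorted(seeds, key=lambda s: pgd_steps[s])
    transfers = [s for s in by_steps if direct_transfer(s)]
    # Phase 2: remaining seeds, ordered by their loss with respect to
    # the target model (lower loss = closer to the attack's goal).
    remaining = sorted((s for s in by_steps if not direct_transfer(s)),
                       key=lambda s: target_loss[s])
    return transfers + remaining
```

Seeds needing fewer local PGD steps are attempted first as direct transfers; seeds that fail to transfer are then re-ranked by their target-model loss before running the optimization attack.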

To validate the effectiveness of the two-phase strategy, we compare it against two other seed prioritization strategies:

  • Retroactive Optimal: a non-realizable attack that assumes adversaries know the exact number of queries needed to attack each seed (before the attack starts) and can prioritize seeds by their actual query cost. This provides a lower bound on the query cost of an optimal strategy.

  • Random: a baseline strategy where seeds are prioritized in random order (this is the strategy assumed in most works, where average costs are reported).

Results for the AutoZOOM attack on a normal ImageNet model are shown below:

Our two-phase strategy performs close to the retroactive optimal strategy and substantially outperforms the random baseline: given the same query limit, it finds significantly more adversarial examples than the random baseline, and comes close to the retroactive optimal case. (See the paper for more experimental results and variations on the prioritization strategy.)

Main Takeaways

  • Transfer → Optimization: local adversarial examples can generally be used to boost optimization attacks. One caveat: against a robust target model, the hybrid attack is more effective with robust local models.

  • Optimization → Transfer: fine-tuning local models is only helpful for small-scale datasets (e.g., MNIST) and fails to generalize to more complex datasets. It is an open question whether the fine-tuning process can be made to work for complex datasets.

  • Prioritizing seeds based on two-phase strategy for the hybrid attack can significantly improve its query efficiency in batch attack scenario.

Our results make the case that it is important to evaluate both attacks and defenses with a more realistic adversary model than just looking at the average cost to attack a seed over a large pool of seeds. When an adversary only needs to find a small number of adversarial examples, and has access to a large pool of potential seeds to attack (of equal value to the adversary), the effective cost of a successful attack can be orders of magnitude lower than what would be projected assuming an adversary who cannot prioritize seeds to attack.


Fnu Suya, Jianfeng Chi, David Evans and Yuan Tian. Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries. In USENIX Security 2020. Boston, August 2020. [PDF] [arXiv]



In this repository, we provide the source code to reproduce the results in the paper. In addition, we believe our hybrid attack framework can potentially help boost the performance of new optimization attacks, so the repository also provides tutorials for incorporating new optimization attacks into the hybrid attack framework.

NeurIPS 2019: Empirically Measuring Concentration

Xiao Zhang will present our work (with Saeed Mahloujifar and Mohammad Mahmoody) as a spotlight at NeurIPS 2019, Vancouver, 10 December 2019.

Recent theoretical results, starting with Gilmer et al.’s Adversarial Spheres (2018), show that if inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. The key insight from this line of research is that concentration of measure gives a lower bound on adversarial risk for a large collection of classifiers (e.g., imperfect classifiers with risk at least $\alpha$), which further implies impossibility results for robust learning against adversarial examples.

However, it is not clear whether these theoretical results apply to actual distributions such as images. This work presents a method for empirically measuring and bounding the concentration of a concrete dataset, which we prove converges to the actual concentration. More specifically, we prove that by simultaneously increasing the sample size and a complexity parameter of the selected collection of subsets $\mathcal{G}$, the concentration of the empirical measure based on samples converges asymptotically to the actual concentration.

To solve the empirical concentration problem, we propose heuristic algorithms to find error regions with small expansion under both $\ell_\infty$ and $\ell_2$ metrics.

For instance, our algorithm for $\ell_\infty$ starts by sorting the dataset based on the empirical density estimated using k-nearest neighbor, and then obtains $T$ rectangular data clusters by performing k-means clustering on the top-$q$ densest images. After expanding each of the rectangles by $\epsilon$, the error region $\mathcal{E}$ is then specified as the complement of the expanded rectangles (the reddish region in the following figure). Finally, we search for the best error region by tuning the number of rectangles $T$ and the initial coverage percentile $q$.
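
Given a candidate set of rectangles, the empirical measurement itself reduces to counting samples. The sketch below is a highly simplified stand-in: in the real algorithm the rectangles come from the k-nearest-neighbor density estimates and k-means clustering described above, and $T$, $q$, and $\epsilon$ are tuned; the function name and structure here are hypothetical.

```python
def empirical_concentration(data, rects, eps):
    """Estimate, from samples, the measure of a candidate error region
    defined by axis-aligned rectangles (illustrative sketch only)."""
    def inside(x, rect, pad):
        lows, highs = rect
        return all(lo - pad <= xi <= hi + pad
                   for xi, lo, hi in zip(x, lows, highs))
    # Error region E = complement of the eps-expanded rectangles, so
    # samples outside every expanded rectangle land in E.
    in_E = [not any(inside(x, r, eps) for r in rects) for x in data]
    # Samples outside the unexpanded rectangles approximate the measure
    # of the eps-expansion of E under the l_inf metric.
    near_E = [not any(inside(x, r, 0.0) for r in rects) for x in data]
    risk = sum(in_E) / len(data)
    expansion = sum(near_E) / len(data)
    return risk, expansion
```

Searching over the number of rectangles and the initial coverage percentile then amounts to picking the rectangles that minimize `expansion` subject to `risk` staying at least $\alpha$.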

Based on the proposed algorithm, we empirically measure the concentration for image benchmarks, such as MNIST and CIFAR-10. Compared with state-of-the-art robustly trained models, our estimated bound shows that, for most settings, there exists a large gap between the robust error achieved by the best current models and the theoretical limits implied by concentration.

This suggests the concentration of measure is not the only reason behind the vulnerability of existing classifiers to adversarial perturbations. Thus, either there is room for improving the robustness of image classifiers or a need for deeper understanding of the reasons for the gap between intrinsic robustness and the actual robustness achieved by robust models.


Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody and David Evans. Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness. In NeurIPS 2019 (spotlight presentation). Vancouver, December 2019. [PDF] [arXiv]



Jobs for Humans, 2029-2059

I was honored to participate in a panel at an event on Adult Education in the Age of Artificial Intelligence, run by The Great Courses as a fundraiser for the Academy of Hope, an adult public charter school in Washington, D.C.

I spoke first, following a few introductory talks, and was followed by Nicole Smith and Ellen Scully-Russ, and a keynote from Dexter Manley, Super Bowl winner with the Washington Redskins. After a short break, Kavitha Cardoza moderated a very interesting panel discussion. A recording of the talk and rest of the event is supposed to be available to Great Courses Plus subscribers. I’ve included a fairly complete text from the script I wrote (with some modifications and additional comments).

As a tenured professor at a well-endowed university, I have the good fortune to be about as sheltered as anyone can be from disruptions in employment. But, I do have two young children, so I have a strong personal interest in there being good jobs available for them in this time period.

My four-year old son drew the picture, and I hate to disappoint him that the train driver position he dreams of probably won’t work out, and I don’t think there is anything we can do to save it. But, I do hope there will be something fulfilling for him to do when he grows up.

For the actual talk, I was able to use beautiful images from Getty Images, which The Great Courses has a license to (I was not allowed to use Creative Commons images, since the event does not count as non-commercial use), so I’ve replaced the images from the talk with CC-licensed images. Unfortunately, I wasn’t able to find a CC-licensed image of a match-making factory like the one in the Getty image I used in the talk. The replacement image is a spinning room from Fall River, MA, 1912.

I want to start by talking about history. Machines taking jobs away from humans is not a new thing; it goes back hundreds of years.

The picture shows a match-making factory in the 1870s. No one has a job as this kind of match-maker today, and many of the jobs of the other kind of match-maker have also been taken over by machines.

Another example is the automated teller machine: ATMs automated many of the roles previously done by human bank tellers.

ATMs didn’t eliminate jobs for bank tellers; they actually created more, since ATMs made banking cheaper and more accessible, which meant banks could open more branches that still needed to hire tellers for the more interesting and complicated banking activities.

We’ve seen many transformations like this, and many tasks that could once only be done by humans are now routinely automated. Although past technological advances have caused massive disruption and pain for many individuals, they have not diminished overall human employment, and there shouldn’t be any doubt that, on the whole, technological and scientific progress has made all of our lives tremendously better.

Graph based on data from Our World in Data.

One way to measure that progress is what fraction of our workforce is needed to feed us.

As recently as 150 years ago, nearly everyone worked in agriculture, and we still couldn’t produce enough food to feed everyone. Now, only about 1 out of 75 people works in farming, and we are so ridiculously productive at producing food that we can burn about a third of it as fuel.

So, if we were driven as a species to avoid work once our subsistence needs are met, we shouldn’t be working 8 hours or more a day.

If the ordinary wage-earner worked four hours a day, there would be enough for everybody and no unemployment... This idea shocks the well-to-do, because they are convinced that the poor would not know how to use so much leisure.
Bertrand Russell, In Praise of Idleness (1932)

We should be working about 8 minutes a day, and still living better than people did 150 years ago.

Somehow we’ve managed to improve productivity by a factor of several hundred without reducing the demand for human work. All the previous scientific and technological progress that automated work has just led to finding new productive things for humans to do.

Will Artificial Intelligence Change Everything?

The question for today is whether we are reaching the end of that: will the kinds of automation that can be achieved (or soon will be) with artificial intelligence not just improve productivity by automating some human tasks, but eliminate the opportunity for humans to contribute productively at all?

First, it’s important to distinguish AI from the traditional automation that has driven productivity gains for the past hundred years.

We’ve only experienced the disruption computing causes over the past half century, but a few people had the vision of what was possible much longer ago.

The first person to see the potential for universal computers was Gottfried Leibniz, back in the 17th century.

The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.
Gottfried Wilhelm Leibniz (1679)

Leibniz is talking about increasing the power of our minds the way a telescope increases the power of our eyes. This is what traditional computing has done.

More recently, Steve Jobs talked about computers as bicycles for the mind.

We have machines now that can do any computation we understand well enough to describe in simple steps, quadrillions of times faster than humans can do it. We can also build computing systems, like the one most of us have in our pockets, that don’t just do one thing a human understands faster, but that embody the collective efforts of millions of humans over decades to make it possible for us to play “Angry Birds”.

This is an amazing accomplishment for our species, but AI goes beyond it.

All of the previous capabilities can only solve problems humans already understand well enough to solve.

AI allows us to solve problems no humans understand.

I’m going to focus now on Machine Learning, which is just one sub-field of AI, but it’s the one that has the most hype over the past decade, and one that is being rapidly deployed in ways that automate human tasks.

The main distinction between traditional computing and machine learning is that traditional systems are programmed, while machine learning systems are trained. Humans set up a training process, but the machine learns how to solve the problem on its own.

I’ll illustrate with a simple example – building a machine to distinguish women and men in pictures.

This is something most humans are quite good at, but if I asked you how you do it, you couldn’t explain it. And, since humans can’t explain how to do it, we don’t know how to write a program to do it, but we can train a model that does it well.

First, we need to collect training images.

Training complex models requires a lot of training data – a few million images. Fortunately, you can find billions of images on the Internet (at least if you don't care about copyright; for the talk, I was allowed to use these images, but they blurred them out of the recording).

Next, we need to label the images. This is what is called “supervised learning” – we are telling the machine the correct answers, and it is training to find a model that matches those answers.

Labeling 10M images might seem like a lot of work, but you can find low-paid humans to do it.

Now, we’re ready to train a model. But first, we need to decide on a model architecture.

This just means designing a function with lots of unknown parameters that takes all the pixels of an image as its inputs, and outputs a prediction of gender and a confidence level.

We train the model by starting with random values for all those parameters, testing the labeled images, and updating the weights until we get a model that makes good predictions.

There are some tricks for how to update the weights in the right direction, but if all goes well, we’ll find a model that has high accuracy on our tests before we run out of money.
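
The start-random, test, and nudge-the-weights process can be illustrated with the simplest possible trainable model, a perceptron. A real image model would use gradient descent over millions of parameters; this toy is only meant to show the shape of the loop, and all names here are made up for illustration.

```python
import random

def train(data, labels, n_params, lr=1.0, epochs=100):
    """Toy training loop: start from random parameters, test against
    the labeled examples, and nudge the weights when wrong."""
    w = [random.uniform(-0.5, 0.5) for _ in range(n_params)]
    for _ in range(epochs):
        for x, y in zip(data, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            err = y - pred  # 0 when correct, +1 or -1 when wrong
            # the "trick": move each weight in the direction that
            # reduces the error on this example (perceptron rule)
            for i in range(len(w)):
                w[i] += lr * err * x[i]
    return w
```

For real networks the update direction comes from gradients of a loss function rather than this simple rule, but the overall loop of testing labeled examples and adjusting weights is the same.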

Then we deploy the model and everything is great...

...except we see results like these.

We have people that look like men, but are identified as women. What they have in common is they are carrying umbrellas.

These examples were provided by Tianlu Wang, and the results showing how ML models learn biases (and a method to mitigate this) are in Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints, Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang. (In Empirical Methods in Natural Language Processing (EMNLP) 2017.)

The umbrella-revealing-gender example is from Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations by Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez. (In International Conference on Computer Vision (ICCV), 2019.)

The problem is the learning process is just about learning statistical patterns in data – the model is not developing any real understanding of men and women.

Since most of the people in the labeled data who were holding umbrellas were women, the model learned a strong pattern that if you are holding an umbrella you must be a woman.

Being mis-gendered because of holding an umbrella may not be such a serious problem, but the same kinds of learning methods are used in many more sensitive tasks, like predicting who is a terrorist and who gets a job interview.

Amazon Created a Hiring Tool Using A.I. It Immediately Started Discriminating Against Women. (Slate, Oct 2018)

If you are traveling in Hong Kong, being face-recognized with an umbrella nearby may lead to other problems.

Predicting the Future

The wisest quote I know of about making predictions is from English footballer Paul Gascoigne, who said "I never predict anything, and I never will."

The hard thing about making predictions, is that our brains and the experiences we relate to are linear – we get one year older every year, and can relate to linear change easily.

But nearly everything we care about actually changes exponentially.

People tend to talk about exponential change as though it is rare and special, but it is actually the way nearly everything that matters changes.

All it means to be exponential is that the change is a percentage of the current value, like increasing by 5% a year or doubling every decade.

Here’s what doubling every year looks like for 10 years (compared to the blue line, which adds 100 every year). Looks about the same, until we go out another year...

After 20 years, the linear line is so overwhelmed by the exponential growth, that it is indistinguishable from the flat axis. In another 10 years, the doubling rate has produced another factor of 1000.
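
The numbers behind those curves are easy to check:

```python
years = range(31)
linear = [100 * y for y in years]   # add 100 every year
doubling = [2 ** y for y in years]  # double every year

print(linear[10], doubling[10])      # 1000 vs 1024: about the same
print(linear[20], doubling[20])      # 2000 vs 1048576: linear looks flat
print(doubling[30] // doubling[20])  # another factor of 1024 in 10 years
```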

Almost everything we care about looks like this – here we see GDP per capita for a handful of countries.

The exponential growth is so powerful, that World War II looks like a little glitch.

We see similar exponential curves for almost anything we look at – here it is books published per person, and crop yield per acre.

Predicting exponential growth is the easiest and safest prediction to make. The only way it goes wrong is if there is some physical limit that stops the growth. Malthus thought we were at the limit of how much an acre of land could produce in 1800, but you can see from the graph on the right that we weren’t, and this is one of the reasons he got things so wrong.

The main challenge isn’t predicting the exponential growth; it is guessing how much more growth is needed for things to work, that is, what the labels on the vertical axis are.

One property of exponential curves is that if you zoom in on any part of them, they look basically the same.

When autonomous vehicles started to work in 2005, if you thought we were on a curve like this and only needed to get about 100x better, you were at risk of making really bad predictions.

If you saw the rate of improvement in autonomous vehicles from 2004 to 2016 and extrapolated continued exponential growth, maybe it wasn't so unreasonable to predict in 2016 that we would have coast-to-coast full autonomy by 2017.

Of course, now we know it was a lot further off.

Even though it’s easy to predict exponential change, it is hard to predict, looking forward, how much more change is needed to get to the point where something actually works well enough.

Predicting dropping costs is easier. The red curve is also exponential, but here, it is decreasing by 7% a year instead of increasing.

For example, let’s look at the cost of communicating.

If you wanted to send a message from Washington to California in 1800, you couldn’t even imagine doing it.

By 1803, one person could – Thomas Jefferson envisioned the Lewis and Clark expedition.

It cost \$50,000 in 1803 dollars, which is hard to convert, but let’s guess about \$100M.

Congressional appropriation methods, however, haven't changed much - the initial request, shown in the letter, was for $2,500.

Let’s zoom in on the next 100 years.

By 1903, we had telephones. If you were really rich and important you could have one in your office.

A 3-minute call from NY to Chicago cost \$5.45 in 1903 dollars.

I couldn’t find out if you could call from Washington to California, and adjusting for inflation is tough, but somewhere around a few thousand of today’s dollars seems reasonable.

Why was it so expensive? Here’s the phone operators that were needed to make it happen. (Those jobs don’t exist today either.)

By 2003, we had early smartphones.

Today, no one thinks twice about the cost of sending messages to California.

Sending messages has gotten so ridiculously cheap, spam is profitable.

One way to think about the jobs of the future is to predict when the cost of automating the job will drop below the cost of hiring a human to do it today.
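
As a hypothetical back-of-envelope calculation (the numbers below are made up purely for illustration): if the machine solution for some job costs \$1M a year today and machine costs fall about 7% a year, while a human doing the same job costs a flat \$50K a year, the crossover comes in roughly four decades.

```python
def crossover_year(automation_cost, human_cost, decline=0.07):
    """Years until a declining automation cost drops below a flat
    human cost (toy model with made-up numbers)."""
    year = 0
    while automation_cost > human_cost:
        automation_cost *= (1 - decline)
        year += 1
    return year
```

Because the decline is exponential, even a large initial cost gap closes eventually; the uncertain part is the starting ratio and the rate, not the eventual crossover.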

Nearly all individual jobs could be automated today, if you invested enough in a special-purpose solution and in training the machine to do that specific job.

Jobs that Won’t be Replaced by AI

It is hard to predict when particular jobs will be automated out of existence, but it seems eventually most jobs people have today will be.

I think there are three types of jobs that won’t be. The first two are a bit depressing, but don’t give up before I get to the hopeful third one.

An example of a profession that is well organized to resist job losses due to technology is medicine. When Johnson & Johnson marketed a machine for performing anesthesia, the American Society of Anesthesiologists objected.

The medical professional organizations are so powerful, that nearly all medical technology is designed and marketed to assist doctors, not replace them.

Amazon Fulfillment Center, Richmond, VA
(picture by me)

It is well worth taking a tour of an Amazon Fulfillment Center. Other than conveyer belts, scanners, and forklifts, there are not many machines operating the warehouse; nearly all the work is done by humans, being told by software what to do.

It’s cheaper to get humans to do this work: it is physically complex, varied, and skilled, but the skills required to do it are readily available and not highly valued by the marketplace, so humans with those skills can be hired at low cost and easily replaced when necessary. It’ll be a long time before it is cheaper to do these jobs by machine.

We don’t watch sports to see the fastest, strongest, most agile objects in the universe - we want to be inspired by the individual achievements of other humans, and the synergy of a team of humans working together.

Unfortunately for me, and most of us in this room (unlike Dexter Manley, pictured, who was our keynote speaker), we don’t have the athletic talents to succeed as professional athletes.

But this intrinsic value in work being done by humans applies to many other endeavors.

Machines can already produce music that musicologists can’t distinguish from human compositions — but no one wants to listen to it.

We want to listen to music to connect with the humans (or at least animals!) who wrote and performed it, and it matters who they are.

Image Credit: hellotimothytyndale

Burger flippers can be replaced by machines once the costs get low enough, but not high-end chefs — food tastes more savory when we can imagine it being cooked by a perfectionist chef lording over a kitchen.

White House Photo (Public Domain)

Education works best when the teacher is a human who cares about what she is teaching, and best of all when the teacher also cares about the students as people.

Hopeful Conclusion

In the future, we should all have jobs like these!

Everyone should have a job that values their intrinsic humanity, and technology will soon advance to the point where we all can. In many ways (recall the diminishing percentage of our workforce employed in agriculture from earlier) we already have. We just have to face the challenge of restructuring society to make this work.

Research Symposium Posters

Five students from our group presented posters at the department’s Fall Research Symposium:

Anshuman Suri’s Overview Talk

Bargav Jayaraman, Evaluating Differentially Private Machine Learning In Practice [Poster]
[Paper (USENIX Security 2019)]

Hannah Chen [Poster]

Xiao Zhang [Poster]
[Paper (NeurIPS 2019)]

Mainuddin Jonas [Poster]

Fnu Suya [Poster]
[Paper (USENIX Security 2020)]