Computer Science Colloquia
Tuesday, April 17, 2012
Advisor: Westley Weimer
Attending Faculty: Jack Davidson, Chair; Paul Reynolds, Kevin Sullivan and Ellen Bass.
2:00 PM, Rice Hall, Rm. 242
Ph.D. Dissertation Presentation
Automatically Describing Program Structure and Behavior
Powerful development environments, data-rich project management systems, and ubiquitous software frameworks have fundamentally altered the way software is constructed and maintained. Today, professional developers spend less than 10% of their time actually writing new code and instead primarily try to understand existing software. Increasingly, programmers work by searching for examples or other documentation and then assembling pre-constructed components. Yet code comprehension is poorly understood, and documentation is often incomplete, incorrect, or unavailable. Moreover, few tools exist to support structured code search, making the process of finding useful examples ad hoc, slow, and error-prone.
The overarching goal of this research is to help humans better understand software at many levels and, in doing so, to improve development productivity and software quality. The contributions of this work fall into three areas: readability, runtime behavior, and documentation.
We introduce the first readability metric for program source code. Our model for readability, which is internally based on logistic regression, was learned from data gathered from 120 study participants. Our model agrees with human annotators as much as they agree with each other.
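The abstract says only that the readability model is based on logistic regression and learned from human judgments. As a rough illustration of that general technique (not the dissertation's actual model), the Python sketch below scores a snippet with a logistic function over a few surface features and fits the weights by stochastic gradient descent on labeled examples. The feature set (average line length, identifiers per line, blank-line ratio, comment-line ratio), the function names `features`, `predict`, and `fit`, and the learning-rate and epoch defaults are all hypothetical.

```python
import math
import re

# Matches identifier-like tokens (keywords included); a deliberately crude heuristic.
IDENT = re.compile(r"[A-Za-z_]\w*")


def features(source: str) -> list[float]:
    """Hypothetical surface features of a code snippet (not the dissertation's feature set)."""
    lines = source.splitlines() or [""]
    n = len(lines)
    avg_len = sum(len(line) for line in lines) / n
    avg_idents = sum(len(IDENT.findall(line)) for line in lines) / n
    blank_ratio = sum(1 for line in lines if not line.strip()) / n
    comment_ratio = sum(1 for line in lines if line.strip().startswith(("#", "//"))) / n
    return [avg_len, avg_idents, blank_ratio, comment_ratio]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x: list[float], w: list[float], b: float) -> float:
    """Estimated probability that a snippet with feature vector x is judged readable."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit(xs: list[list[float]], ys: list[int], lr: float = 0.1, epochs: int = 2000):
    """Fit logistic-regression weights by stochastic gradient descent on log-loss.

    xs are feature vectors; ys are human labels (1 = readable, 0 = not readable).
    In practice the features should be standardized first, since avg_len dwarfs the ratios.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - predict(x, w, b)  # derivative of the log-likelihood w.r.t. the logit
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b
```

Given snippets labeled by annotators, `w, b = fit([features(s) for s in snippets], labels)` followed by `predict(features(new_snippet), w, b)` yields a readability score between 0 and 1. A logistic model keeps each learned weight interpretable as the direction and strength of one feature's influence on perceived readability.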
Beyond surface-level readability, understanding software demands understanding the behavior of the running program. To that end, we contribute a model describing runtime behavior in terms of program paths. Across several benchmarks, the top 5% of paths as ranked by our algorithm account for over half of program runtime.
Finally, we describe automated approaches for improving the understandability of software through documentation synthesis. The key to our approach is adapting programming language techniques (e.g., symbolic execution) to create output that is directly comparable to existing human-written artifacts. We describe and evaluate documentation synthesis algorithms for exceptions, code changes, and APIs.
Software comprehension is increasingly a critical factor in modern development, yet software remains difficult to understand. This research seeks to improve that situation by automatically describing program structure and behavior.