Computer Science Colloquia
Tuesday, May 3, 2011
Advisor: Jason Lawrence
Attending Faculty: Andrew Grimshaw
Olsson Hall, Room 236D, 24:15:00
A Master's Project Presentation
Early Experiences in Building and Using a Database of One Trillion Natural Image Patches
Many example-based image processing algorithms operate on image patches (small windows of pixels). Common examples include texture synthesis, resolution enhancement, image denoising, colorization, and hole-filling. One barrier to the widespread adoption and performance of these techniques is the lack of access to a large and varied collection of image patches. We describe a database of one trillion image patches assembled from one million natural images downloaded from the Internet. We also describe and analyze two systems for performing nearest neighbor searches over this database that use the parallel programming frameworks Hadoop and MPI, respectively. We demonstrate the utility of this database as a research tool by using it to investigate the fundamental relationship between the chosen patch size, the amount of training data, and the expected accuracy of the closest matches. We report a closed-form analytic expression that relates these three quantities, allowing any one to be predicted from the other two. Our findings show that massive databases are indeed necessary to achieve reliable performance for even moderately sized patches, and they offer important and heretofore absent guidelines for practitioners and researchers interested in working with and improving these types of data-driven systems.
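For readers unfamiliar with patch-based search, the basic operation can be illustrated with a minimal single-machine sketch. It is not the Hadoop or MPI system described in the abstract. It assumes grayscale images stored as nested lists, a sum-of-squared-differences distance, and a brute-force scan. The function names `extract_patches`, `ssd`, and `nearest_patch` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

Patch = Tuple[float, ...]


def extract_patches(image: Sequence[Sequence[float]], size: int) -> List[Patch]:
    """Return every size x size window of a grayscale image, flattened row by row."""
    rows, cols = len(image), len(image[0])
    patches: List[Patch] = []
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            patches.append(
                tuple(image[r + dr][c + dc] for dr in range(size) for dc in range(size))
            )
    return patches


def ssd(a: Patch, b: Patch) -> float:
    """Sum of squared differences between two patches of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_patch(query: Patch, database: List[Patch]) -> Tuple[int, float]:
    """Brute-force scan: return (index, distance) of the closest database patch."""
    best_index = min(range(len(database)), key=lambda i: ssd(query, database[i]))
    return best_index, ssd(query, database[best_index])


if __name__ == "__main__":
    # Tiny 3 x 3 "image" used only to exercise the functions.
    image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    database = extract_patches(image, size=2)
    print(nearest_patch((1, 2, 4, 6), database))
```

A linear scan like this costs time proportional to the number of stored patches for every query, which is why a collection on the scale of a trillion patches calls for parallel frameworks such as those named above.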