Computer Science Colloquia
Tuesday, April 10, 2012
Advisor: Alf Weaver
Attending Faculty: Bill Wulf, Chair; Sang Son; Jim French; and Ron Williams, minor representative
10:00 AM, Rice Hall, Rm. 242
Ph.D. Dissertation Presentation
Creating Deployable Relational Keyword Search Systems
The amount of information in the world is increasing exponentially. Keyword search has proven to be an effective method to discover and retrieve information online, as evidenced by the success of Internet search engines. Unfortunately, many common information management systems do not support the familiar keyword search interface that people now expect. Web sites, corporations, and governments all use relational databases to manage information, but keyword search in relational databases is difficult because the data transformations that eliminate redundancy and ensure consistency (normalization) split related information across multiple tables. Relational keyword search enables users to retrieve information and to explore the relationships among that information, all via a familiar interface.
Although a decade has passed since keyword search in databases became a hot topic for academic researchers, little progress has been made in the interim. In particular, no systems have appeared outside the academic community despite a long-standing promise to revolutionize the way people interact with information. This thesis addresses the challenges inherent in transitioning relational keyword search techniques from the computer science community to practical systems that can be deployed against existing data repositories. A key contribution of this research is an extensive benchmark specifically designed to evaluate relational keyword search techniques. Empirical experiments on this benchmark both identify why existing search techniques cannot handle existing data repositories and point to areas for future research in this field. Improvements to relational keyword search come in the form of two novel ranking schemes that significantly improve search effectiveness. The first explicitly enforces users' preferences regarding the order of search results. The second uses machine learning to weight the various scoring factors that have been proposed to date in the literature; an analysis of the learned weights shows that a number of these factors can be excluded without sacrificing search effectiveness. This thesis also examines key issues in the evaluation of proposed search techniques that prevent many existing evaluations from accurately reflecting real-world retrieval tasks. This work bridges the gap between academic research and keyword search techniques that are ready to be deployed in real-world environments.