Computer Science Colloquia
Thursday, May 28, 2015
Advisor: Kevin Skadron
Attending Faculty: Worthy Martin (Chair); Gabriel Robins, Yanjun Qi
1:00 PM, Rice Hall, Rm. 242
PhD Qualifying Exam Presentation
Entity Resolution Acceleration using Automata Processor
Entity Resolution (ER), the task of finding records that refer to the same entity within one database or across several databases, is of great importance for merging data from different sources. Considerable research has been devoted to this problem; however, neither the performance nor the generality of current methods is satisfactory. As databases grow rapidly in the big-data era, it becomes more computationally expensive to determine whether two records represent the same entity, and the complexity increases with the amount of variation allowed between matching records. Micron's Automata Processor (AP), an efficient and scalable semiconductor architecture for parallel automata processing, provides a new opportunity for hardware acceleration of ER. We profile existing ER methods and find that matching is currently the primary bottleneck. Therefore, we propose using the AP to accelerate matching and use a real-world application to illustrate how the AP works. We present an AP-CPU heterogeneous computing framework that accelerates the performance bottleneck of fuzzy matching for similar but potentially inexactly matched names. The performance results show dramatic improvement over the existing CPU algorithm: speedups of 9.95x to 1333x are achieved for matching one name. We also use three different metrics to evaluate accuracy, and the AP shows better results on all three. Technology-scaling projections suggest that future generations of the AP will deliver even better performance. In summary, the AP shows great potential for accelerating Entity Resolution.