Computer Science Colloquia
Monday, April 18, 2011
Advisor: Paul Reynolds Jr.
Attending Faculty: Andrew Grimshaw; abhi shelat; Doug Lea; Steven Boker
Rodman Room (Thornton Hall), 13:00:00
A Ph.D. Defense
Cache-Conscious Concurrent Data Structures
The power wall, the instruction-level parallelism wall, and the memory wall are driving a shift in microprocessor design from implicitly parallel architectures towards explicitly parallel architectures. A necessary condition for peak scalability and performance on modern hardware is application execution that is aware of the memory hierarchy. Cache-conscious concurrent data structures for many-core systems will show significant performance improvements over the state of the art in concurrent data structure designs for those applications that must contend with the deleterious effects of the memory wall. Lock-free cache-conscious data structures that maintain the abstraction of a linearizable set have been studied previously in the context of unordered data structures. We explore novel alternatives, namely lock-free cache-conscious data structures that maintain the abstraction of a linearizable ordered set. The two primary design contributions of this dissertation are the lock-free skip tree and lock-free burst trie algorithms. In both algorithms, read-only operations are wait-free and modification operations are lock-free. The lock-free skip tree has relaxed structural properties that allow atomic operations to modify the tree without invalidating the consistency of the data structure. We define the dense skip tree as a variation of the skip tree data structure, and prove cache-conscious properties of the dense skip tree. The proof techniques represent a
significant departure from the methods outlined in the original skip tree paper.
We show that cache-conscious, linearizable concurrent data structures have advantageous performance that can be measured across multiple architecture platforms. The improved performance arises from better treatment of the memory wall phenomenon that is ubiquitous to current multi-core systems and almost certainly will continue to affect future many-core systems. Using a series of synthetic benchmarks we have
shown that our lock-free skip tree and burst trie implementations perform up to x2.3 and x3.5 faster in read-dominated workloads on SPARC and x86 architectures, respectively, compared to the state of the art lock-free skip list. The minimum performance of the skip tree across all workloads and architectures is x0.87 relative to the skip list performance. An analysis of heap utilization of the data structures in the synthetic benchmark reveals the lock-free skip tree to use 59% of the heap utilization of the skip list and the lock-free burst trie to use 140% of the skip list heap utilization. In a series of four parallel branch-and-bound applications, two of the applications are x2.3 and x3.1 faster when using the skip tree as a concurrent priority queue as compared to the lock-free skip list. In a shared-memory supercomputer architecture the two branch-and-bound applications are x1.6 and x2.1 faster with the skip tree versus the skip list running at 80 hardware threads.