Computer Science Colloquia
Thursday, March 28, 2013
Advisors: Mary Lou Soffa & Jack Davidson
Attending Faculty: Kevin Skadron (Chair), Bruce Childers (University of Pittsburgh), and Benton Calhoun (Minor representative)
9:00 AM, Rice Hall, Room 242
PhD Proposal Presentation
Addressing Processor Over-provisioning on Contemporary Multi-core Platforms
Contemporary multi-core machines offer large numbers of cores by integrating multiple chip multiprocessors (CMPs) into a single platform using non-uniform memory access (NUMA) architectures. These machines can greatly boost the performance of multi-threaded programs by allowing the simultaneous execution of a massive number of threads. When using these machines, users tend to allocate all cores to one program, assuming that more cores translate to better performance. However, because of various scalability-limiting issues, such as memory bandwidth, many multi-threaded programs achieve their best performance with fewer cores. As a result, over-allocating cores, or processor over-provisioning, may result in suboptimal performance, and may also lead to poor system utilization and high power consumption.
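As a rough illustration of why fewer cores can win, consider the toy model below. It is only a sketch under stated assumptions and is not the model proposed in this research: compute time is assumed to divide evenly across cores, memory traffic is assumed to be capped by one fixed bandwidth, and each extra core is assumed to add a small fixed coordination cost. All function names, parameters, and numbers are hypothetical.

```python
def predicted_time(cores, compute_seconds, bytes_moved,
                   bandwidth_bytes_per_s, per_core_overhead_s):
    """Toy estimate of run time on `cores` cores (hypothetical model)."""
    # Assumption: compute work parallelizes perfectly across cores.
    compute = compute_seconds / cores
    # Assumption: memory traffic cannot go faster than the shared bandwidth.
    memory = bytes_moved / bandwidth_bytes_per_s
    # Assumption: each core adds a fixed coordination/contention cost.
    return max(compute, memory) + per_core_overhead_s * cores


def best_core_count(max_cores, **workload):
    """Return the core count in 1..max_cores with the lowest toy estimate."""
    return min(range(1, max_cores + 1),
               key=lambda n: predicted_time(n, **workload))


# Hypothetical workload: 64 s of compute, 80 GB moved, 10 GB/s of bandwidth.
workload = dict(compute_seconds=64.0, bytes_moved=80e9,
                bandwidth_bytes_per_s=10e9, per_core_overhead_s=0.01)
print(best_core_count(48, **workload))  # prints 8
```

In this made-up workload the memory system saturates at 8 cores, so allocating all 48 cores only adds overhead without reducing run time.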
The proposed research addresses the processor over-provisioning problem by designing a run-time system that dynamically predicts the optimal core allocation of a multi-threaded program and automatically executes the program with the predicted allocation, achieving performance close to that of its optimal core allocation. Additionally, this run-time system will meet the following requirements: low overhead, no user involvement, no source code required, and no off-line profiling.
The proposed research consists of four parts. First, we will develop a model that predicts the optimal core allocation when scalability is limited by local memory bandwidth. This model will provide accurate predictions by precisely modeling the contention in DRAMs. Second, we will develop a model that predicts the optimal core allocation when scalability is limited by inter-node memory bandwidth. This model will provide accurate predictions by precisely modeling the contention among all inter-node and local memory accesses. Third, we will develop a dynamic reconfiguration solution that can efficiently adapt a multi-threaded program to any core allocation during execution. This solution will achieve high efficiency by providing mechanisms that allow online work repartitioning and low-overhead synchronization and context switches. Finally, we will develop a run-time system that integrates the previous three components to predict the optimal core allocations for multi-threaded programs and to efficiently adapt their execution to the predicted core allocations.
If successful, this research will improve the performance of multi-threaded programs by automatically adjusting their core allocations so that they perform close to their optimal core allocations. The local and inter-node memory bandwidth models can also help identify and reduce memory contention. In addition, the dynamic reconfiguration solution may be used to perform online program adaptation for other optimization purposes.