Computer Science Colloquia
Tuesday, March 18, 2014
John "Jack" Wadden
Advisor: Kevin Skadron
Attending Faculty: Sudhanva Gurumurthi (Chair), Jack Davidson, and Worthy Martin
11:00 AM, Rice Hall, Rm. 504
Ph.D. Qualifying Exam Presentation
Real-World Design and Evaluation of Compiler-Managed GPU Redundant Multithreading
GPGPU reliability is fast becoming a weak link in the construction of reliable supercomputer systems. Because hardware protection is expensive to develop, requires dedicated on-chip resources, and is not portable across different architectures, the efficiency of software solutions, such as redundant multithreading (RMT), must be explored.
We present a real-world design and evaluation of automatic software RMT on GPU hardware. We first describe an LLVM compiler pass that automatically converts GPGPU kernels into redundantly threaded versions. We then perform a detailed power and performance evaluation of three RMT algorithms, each of which provides fault coverage for a set of structures in the GPU. Using real hardware, we show that compiler-managed software RMT has highly variable costs. We further analyze the individual costs of redundant work scheduling, redundant computation, and inter-thread communication, showing that no single component is generally responsible for high overheads across all applications, but that certain workload properties tend to cause RMT to perform well or poorly. Finally, we demonstrate the benefit of architectural support for RMT, with a specific example of fast, register-level thread communication.
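As a rough illustration of the general idea behind redundant multithreading (not the LLVM pass or any of the three algorithms evaluated in the talk), the following minimal Python sketch simulates a kernel in which every logical thread is executed by two redundant threads whose results are compared before the output is committed. All names here (`run_kernel_rmt`, `FaultDetected`, `fault_in_thread`) are hypothetical and exist only for this example; a real GPU implementation would run the redundant threads in parallel and exchange values through memory or registers.

```python
from typing import Callable, Dict, List, Optional


class FaultDetected(Exception):
    """Raised when a redundant thread pair disagrees (hypothetical name)."""


def run_kernel_rmt(kernel: Callable[[int, List[float]], float],
                   data: List[float],
                   fault_in_thread: Optional[int] = None) -> List[float]:
    """Simulate a kernel launch with one redundant twin per logical thread."""
    n = len(data)
    results: Dict[int, float] = {}

    # Redundant work scheduling: launch 2*n physical threads.
    # Physical ids 2*i and 2*i + 1 both map to logical thread i.
    for phys_id in range(2 * n):
        logical_id = phys_id // 2
        # Redundant computation: both twins run the same kernel body.
        value = kernel(logical_id, data)
        if phys_id == fault_in_thread:
            value += 1.0  # illustrative injected fault, an assumption of this sketch
        results[phys_id] = value

    # Inter-thread communication: each pair compares results before the
    # output is committed.
    out = [0.0] * n
    for i in range(n):
        if results[2 * i] != results[2 * i + 1]:
            raise FaultDetected(f"mismatch in logical thread {i}")
        out[i] = results[2 * i]
    return out


def double(i: int, d: List[float]) -> float:
    """Example kernel body: double the element owned by logical thread i."""
    return d[i] * 2.0
```

Calling `run_kernel_rmt(double, [1.0, 2.0, 3.0])` returns the doubled list, while passing `fault_in_thread=3` corrupts one twin of logical thread 1 and raises `FaultDetected`. The three commented stages loosely correspond to the cost components named in the abstract: redundant work scheduling, redundant computation, and inter-thread communication.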