Computer Science Colloquia
Wednesday, May 1, 2013
Advisor: Malathi Veeraraghavan
Attending Faculty: Jack Davidson (Chair), Andrew Grimshaw, Gabriel Robins, and Alfred Weaver
1:00 PM, Rice Hall, Rm. 242
PhD Qualifying Exam Presentation
On causes of GridFTP transfer throughput variance
In prior work, we analyzed GridFTP usage logs from data transfer nodes (DTNs) at national scientific centers and found significant throughput variance. The goal of this work is to quantify the impact of various factors on that variance. Our methodology consisted of executing experiments on a high-speed research testbed, running large instrumented transfers between operational DTNs, and building statistical models from the collected measurements. We created a non-linear regression model of memory-to-memory transfer throughput as a function of CPU usage at the two DTNs and the packet loss rate. The model is useful for determining the concomitant resource allocations to use in scheduling requests. For example, if a whole NERSC DTN CPU can be assigned to the GridFTP process executing a large memory-to-memory transfer, then 32% of a CPU should be requested at the SLAC DTN for the corresponding GridFTP process, and a virtual circuit at 6.3 Gbps should be requested.