University of Virginia

Computer Science Colloquia

Wednesday, October 24th
Muhammad Nur Yanhaona
Advisor: Andrew Grimshaw
Attending Faculty: Worthy Martin, chair; Paul Reynolds, Alfred Weaver and Marty Humphrey

10:00 AM, Rice Hall, Rm. 242

Ph.D. Qualifying Exam Presentation
An Agent-Based Distributed Monitoring Framework

ABSTRACT

In compute clusters, monitoring of infrastructure and application components is essential for performance assessment, failure detection, problem forecasting, better resource allocation, and several other purposes. It is often said that a good monitoring solution can make or break an entire service [10]. Scalable monitoring is a non-trivial problem: a good monitoring solution should have a footprint small enough to be considered invisible while, at the same time, offering a good response time. The demand for flexibility, both in terms of fault tolerance and adaptability to configuration and environmental changes, puts an additional burden on a monitoring solution. Traditional monitoring solutions are not well equipped to deal with this multifaceted scalability and flexibility problem, which will become more prominent in the future as clusters grow larger and more diversified to accommodate the manifold demands of the era of data-intensive computation. Furthermore, efficient monitoring of virtual clusters, which are steadily gaining popularity, was never a concern for traditional monitoring solutions. Present-day trends towards larger and more heterogeneous clusters, the rise of virtual data centers, and greater variability of usage suggest that we have to rethink how we do monitoring. We need solutions that remain scalable in the face of unforeseen expansion, work in a wide range of environments, and adapt to changing requirements. We have developed an agent-based framework for constructing such monitoring solutions. Our framework deals with all scalability and flexibility issues associated with monitoring, and leaves only the use-case-specific task of data generation to the specific solution. This separation of concerns provides a highly versatile design that enables a single monitoring solution to meet a wide range of requirements in a wide range of environments. This paper presents the design, implementation, and evaluation of our novel framework.