Spatio-temporal Analysis of HPC I/O and Connection Data Conference Paper

abstract

  • © 2018 IEEE. An HPC system consists of multiple layers of software and hardware for I/O and networking. System logs are helpful resources for understanding what is happening in the system. However, analyzing the logs maintained at different levels of the stack is non-trivial, and analyzing each log independently might lead to incomplete conclusions because each log covers only part of the system. This work takes a comprehensive approach that incorporates logs from multiple layers and components in order to facilitate the detection of anomalous activities. This research aims to identify and predict potential performance bottlenecks in the HPC system by capturing temporal variation patterns in heterogeneous, high-dimensional, and non-linear log data. In this paper, we share our preliminary efforts toward spatio-temporal analysis of HPC I/O and connection data, along with our initial observations from analyzing one week of HPC log data collected from one of the NERSC systems.
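The abstract does not describe the analysis technique itself. For illustration only, the sketch below shows one generic way to put logs from different layers on a shared time axis and flag unusual time windows. The record format, the layer names ("io", "network"), the 60-second window, and the z-score threshold are all assumptions for this example, not the authors' method.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical record: (timestamp in seconds, layer name, measured value).
# Layer names such as "io" and "network" are illustrative assumptions.
Record = tuple[float, str, float]


def bin_by_window(records: list[Record], window_s: float = 60.0) -> dict:
    """Sum each layer's values into fixed-width time windows so that logs
    from different layers share one time axis: {window: {layer: total}}."""
    windows: dict = defaultdict(lambda: defaultdict(float))
    for ts, layer, value in records:
        windows[int(ts // window_s)][layer] += value
    return windows


def flag_anomalies(windows: dict, threshold: float = 2.0) -> list:
    """Return (window, layer, value) triples whose value deviates from that
    layer's mean by more than `threshold` population standard deviations."""
    layers = sorted({layer for per_layer in windows.values() for layer in per_layer})
    window_ids = sorted(windows)
    flagged = []
    for layer in layers:
        # A window with no record for this layer counts as 0 for that layer.
        series = [windows[w].get(layer, 0.0) for w in window_ids]
        mu, sigma = mean(series), pstdev(series)
        if sigma == 0:
            continue  # constant series: nothing stands out
        for w, v in zip(window_ids, series):
            if abs(v - mu) / sigma > threshold:
                flagged.append((w, layer, v))
    return sorted(flagged)
```

Binning every layer into the same windows is what allows I/O and connection records to be compared side by side. The per-layer z-score is a deliberately simple, linear baseline; it would not capture the non-linear, high-dimensional patterns the abstract refers to.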

name of conference

  • 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)

published proceedings

  • 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)

author list (cited authors)

  • Kim, J., Choi, J., & Sim, A.

citation count

  • 1

complete list of authors

  • Kim, Jinoh; Choi, Jinhwan; Sim, Alex

publication date

  • July 2018