Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/2818
Title: | Understanding Error Log Event Sequence for Failure Analysis |
Authors: | Gurumdimma, Nentawe; Bisandu, Desmond Bala |
Keywords: | Failure Sequences HPC Similarity Cluster |
Issue Date: | 2018 |
Publisher: | Science World Journal |
Series/Report no.: | Vol.13;No.4; Pp 8-15 |
Abstract: | As large-scale parallel systems have evolved, they have come to be employed mostly for mission-critical applications. Anticipating and accommodating failures is crucial to their design. Failure is a commonplace feature of these large-scale systems and cannot be treated as an exception. The system state is mostly captured through logs. A proper understanding of these error logs is therefore extremely important for failure analysis, because the logs contain the “health” information of the system. In this paper we design an approach that seeks to find similarities in the patterns of log events that lead to failures. Our experiment shows that several root causes of soft lockup failures can be traced through the logs. We capture the behavior of failure-inducing patterns and find that the log patterns of failure and non-failure sequences are dissimilar. |
URI: | http://hdl.handle.net/123456789/2818 |
ISSN: | 1597-6343 |
Appears in Collections: | Computer Science |