Unsupervised Anomaly Detection Based on Minimum Spanning Tree Approximated Distance Measures and its Application to Hydropower Turbines

abstract

  • © 2004-2012 IEEE. Anomalies are data points or clusters of data points that lie away from the neighboring points or clusters and are inconsistent with the overall pattern of the data. Anomaly detection techniques help distinguish the anomalous observations from the regular ones, and thus provide the basis for developing a standard performance guideline for process control. Identifying anomalies becomes complicated when labeled training data, as used in supervised learning, are not available. Moreover, the Euclidean distance between two points is unlikely to reflect the intrinsic structural distance imposed by the underlying manifold structure. In this paper, the authors propose a minimum spanning tree (MST)-based anomaly detection method. The merit of the method is that an MST provides a new distance measure, capable of capturing the relative connectedness of data points/clusters in a complicated manifold, and could be a better (dis)similarity metric than the simple Euclidean distance for identifying anomalies in unsupervised learning settings. The proposed method is compared with 13 popular anomaly detection methods on 20 benchmark data sets, demonstrating a considerable improvement in its ability to identify anomalies. Furthermore, the MST-based anomaly detection is applied to a data set from a hydropower turbine and demonstrates remarkable detection competence.
Note to Practitioners: This paper is motivated by the problem of unsupervised anomaly detection in a hydropower generation plant, which operates with turbine systems that are instrumented with dozens of sensors. Each turbine has subcomponents or functional areas such as several bearing systems, a generator, and so on. Sensors collect various types of data in real time, such as the temperature of the oil inside the bearing systems, the temperature of the bearings, the ambient temperature, vibrations in each functional area, a variety of harmonics in the functional areas, the temperature of the coil in the generator, and many more. In total, each turbine collects more than 200 attributes from its sensors. The sensor data are then stored in a control system and kept as time-stamped historical data points. When a service/maintenance engineer suspects that there is a malfunction in a turbine, she/he extracts a data set from the control system that contains the collected sensor data for that turbine for the selected period of time (a few weeks to a few months), and then stores these data in a relational database or simply in a comma-separated values (CSV) file for further analysis. The objective is to efficiently identify and isolate anomalies in the turbines. Toward this goal, we propose a new solution for tackling this challenging problem, which is an unsupervised method based on the concept of MST. The proposed method can be used as a competitive tool to aid practitioners in their search for anomalies and in making their systems better.
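The idea of using an MST as a distance measure can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It is one simple interpretation, under stated assumptions: the MST is built over the complete Euclidean graph with Prim's algorithm, the distance between two points is taken as the length of the unique path connecting them in the tree, and a point's anomaly score is its mean tree-path distance to its k closest points. The function names `build_mst`, `tree_distances`, and `mst_anomaly_scores`, and the parameter `k`, are hypothetical and chosen for illustration only.

```python
import math


def build_mst(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns an adjacency list {node: [(neighbor, weight), ...]} for the MST.
    Assumes at least two points.
    """
    n = len(points)
    adj = {i: [] for i in range(n)}
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        # Relax the edges from u to every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return adj


def tree_distances(adj, source):
    """Path length along the MST from `source` to every node.

    Paths in a tree are unique, so a plain traversal is enough.
    """
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def mst_anomaly_scores(points, k=3):
    """Illustrative score: mean MST-path distance to the k closest points."""
    adj = build_mst(points)
    scores = []
    for i in range(len(points)):
        d = sorted(v for j, v in tree_distances(adj, i).items() if j != i)
        nearest = d[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores
```

Calling `mst_anomaly_scores` on a small cluster plus one distant point should give the distant point the highest score, since every tree path leaving it must cross the long edge that attaches it to the cluster. The quadratic-time Prim's loop is adequate for a sketch; for turbine data with more than 200 attributes and many time stamps, a sparse-graph MST routine would be the more practical choice.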

author list (cited authors)

  • Ahmed, I., Dagnino, A., & Ding, Y.

citation count

  • 7

publication date

  • July 2018