Shin, Donghwa (2018-12). An Anomaly Detection Framework for Heterogeneous and Streaming Data. Master's Thesis.
Anomaly detection has become an important research area because of its wide range of applications, such as detecting abnormal behavior in network traffic, disease in MRI images, and fraud in credit card transactions. Many real-world anomaly detection problems involve heterogeneous data that mix categorical and continuous attributes, which makes it difficult to compare data instances. Furthermore, the behavior of the data may change over time in streaming environments. Finally, labels are hard to obtain because too much data arrives each day to classify manually. To tackle these challenges, this thesis proposes an anomaly detection framework for heterogeneous and streaming data. By introducing our own distance metric for categorical features and using an ensemble of two outlier detection methods, we handle both heterogeneous and streaming data. The ensemble model keeps updating its backend information during classification so that it adapts to changing data behavior. The framework also provides interpretations of detected outliers to reduce the effort human experts need to label data. Finally, we train a supervised machine learning algorithm on the feedback from human experts for anomaly detection tasks. Our experimental results show the efficacy of the proposed framework.
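To make the difficulty of comparing heterogeneous instances concrete, the sketch below computes a distance over records that mix categorical and continuous attributes. It is not the distance metric proposed in the thesis, which the abstract does not specify. It is a generic Gower-style stand-in: categorical attributes contribute a 0/1 mismatch and continuous attributes contribute a range-normalized absolute difference. The function name `mixed_distance` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Union

Value = Union[str, float]


def mixed_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    is_categorical: Sequence[bool],
    ranges: Sequence[float],
) -> float:
    """Gower-style distance between two mixed-type records (illustrative only).

    Assumption: this is a generic stand-in, not the thesis's own metric.
    `ranges[i]` is the observed value range of continuous attribute i;
    it is ignored for categorical attributes.
    """
    n = len(a)
    if n == 0 or not (n == len(b) == len(is_categorical) == len(ranges)):
        raise ValueError("all inputs must be non-empty and of equal length")

    total = 0.0
    for x, y, categorical, value_range in zip(a, b, is_categorical, ranges):
        if categorical:
            # Simple overlap measure: 0 if the categories match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            # Scale the difference by the attribute's range so that each
            # attribute contributes on a comparable scale.
            total += abs(float(x) - float(y)) / value_range
        # A continuous attribute with zero range carries no information.
    return total / n


# Hypothetical records: (protocol, packet size)
d = mixed_distance(("tcp", 10.0), ("udp", 20.0), (True, False), (0.0, 100.0))
```

In a streaming setting, the per-attribute ranges would themselves have to be maintained incrementally as new data arrives; the sketch treats them as given.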