Comparative study of sequential pattern mining models (Chapter)

abstract

  • The process of finding interesting, novel, and useful patterns from data is now commonly known as Knowledge Discovery and Data Mining (KDD). In this paper, we closely examine the problem of mining sequential patterns and propose a general evaluation method to assess the quality of the mined results. We propose four evaluation criteria, namely (1) recoverability, (2) the number of spurious patterns, (3) the number of redundant patterns, and (4) the degree of extraneous items in the patterns, to quantitatively assess the quality of the mined results from a wide variety of synthetic datasets with varying randomness and noise levels. Recoverability, a new metric, measures how much of the underlying trend has been detected. Such an evaluation method provides a basis for comparing different models for sequential pattern mining. Furthermore, such evaluation is essential to understanding the performance of approximate solutions. The method is employed to conduct a detailed comparison of the traditional frequent sequential pattern model with an alternative approximate pattern model based on sequence alignment. We demonstrate that the alternative approach better recovers the underlying patterns, with little confounding information, under all circumstances we examined, including those where the frequent sequential pattern model fails.

author list (cited authors)

  • Kum, H. C., Paulsen, S., & Wang, W.

editor list (cited editors)

  • Lin, T. Y., Ohsuga, S., Liau, C. J., Hu, X., & Tsumoto, S.

Book Title

  • Foundations of Data Mining and Knowledge Discovery

publication date

  • September 2005