Sequential pattern mining in multi-databases via multiple alignment

abstract

To efficiently find global patterns from a multi-database, information in each local database must first be mined and summarized at the local level. Then only the summarized information is forwarded to the global mining process. However, conventional sequential pattern mining methods based on support cannot summarize the local information and are ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summarize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, to mine approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequence databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost of global mining. Furthermore, we present an elegant and uniform model to identify both high vote sequential patterns and exceptional sequential patterns from the collection of these consensus patterns from each local database. © 2005 Springer Science+Business Media, Inc.
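The two-step idea in the abstract (cluster sequences by similarity, then derive a consensus pattern per cluster by alignment) can be illustrated with a minimal sketch. This is not the ApproxMAP algorithm itself. It assumes sequences of single items rather than itemsets, uses Python's `difflib.SequenceMatcher` as a stand-in for the paper's distance measure and multiple alignment, and aligns every sequence only against the first member of its cluster. The function names and the `min_similarity` and `min_strength` parameters are hypothetical, chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two item sequences (assumed measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(sequences, min_similarity=0.5):
    """Greedy clustering: a sequence joins the first cluster whose seed
    (its first member) is similar enough; otherwise it starts a new cluster."""
    clusters = []
    for seq in sequences:
        for members in clusters:
            if similarity(members[0], seq) >= min_similarity:
                members.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def consensus_pattern(members, min_strength=0.5):
    """Align every member against the cluster seed and keep the seed items
    matched by at least min_strength of the cluster's sequences."""
    seed = members[0]
    votes = [0] * len(seed)
    for seq in members:
        matcher = SequenceMatcher(None, seed, seq, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.a, block.a + block.size):
                votes[k] += 1
    needed = min_strength * len(members)
    return [item for item, count in zip(seed, votes) if count >= needed]


if __name__ == "__main__":
    # Toy local database: two groups of similar sequences.
    db = [list("ABCD"), list("ABD"), list("ABCD"), list("XYZ"), list("XYZW")]
    for members in cluster_sequences(db):
        print(len(members), consensus_pattern(members))
```

In this sketch each cluster is reduced to one short pattern, which is the kind of compact local summary the abstract describes forwarding to the global mining step. Raising `min_strength` drops items that only some sequences in the cluster share.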

authors

Kum, Hye Chung

published proceedings

DATA MINING AND KNOWLEDGE DISCOVERY

author list (cited authors)

Kum, H. C., Chang, J. H., & Wang, W.

citation count

30

complete list of authors

Kum, HC||Chang, JH||Wang, W

publication date

May 2006

publisher

Springer Nature Publisher

published in

Data Mining and Knowledge Discovery Journal

keywords

Approximate Sequential Pattern
Data Mining Algorithm
Global Sequential Pattern
Mining Local Pattern
Multiple Alignment
Sequential Patterns

Digital Object Identifier (DOI)

10.1007/s10618-005-0017-3

start page

151

end page

180

volume

12

issue

2-3

URL

http://dx.doi.org/10.1007/s10618-005-0017-3
