Stream Sequential Pattern Mining with Precise Error Bounds
Sequential pattern mining is an interesting data mining
problem with many real-world applications. This problem
has been studied extensively in static databases. However,
in recent years, emerging applications have introduced a
new form of data called data stream. In a data stream,
new elements are generated continuously. This poses additional
constraints on the methods used for mining such
data: memory usage is restricted, the infinitely flowing original
dataset cannot be scanned multiple times, and current
results should be available on demand.
This paper introduces two effective methods for mining
sequential patterns from data streams: the SS-BE method
and the SS-MB method. The proposed methods break the
stream into batches and only process each batch once. The
two methods use different pruning strategies that restrict
the memory usage but can still guarantee that all true sequential
patterns are output at the end of any batch. Both
algorithms scale linearly in execution time as the number
of sequences grows, making them effective methods for sequential
pattern mining in data streams. The experimental
results also show that our methods are very accurate in that
only a small fraction of the patterns that are output are false
positives. Even for these false positives, SS-BE guarantees
that their true support is above a pre-defined threshold
Date: December 15, 2008
Book Title: Int. Conf. on Data Mining (ICDM'08), Pisa, Italy, Dec. 2008
Type: Article
Downloads: 284
Has 1 soft copy
size 197171 bytesBibtex
@Article{Stream_Sequential_Pattern_Mining_with_Pr,
author = "Luiz F Mendes and Bolin Ding and Jiawei Han",
title = "{Stream Sequential Pattern Mining with Precise Error Bounds}",
month = "December",
year = "2008",
journal = "Int. Conf. on Data Mining (ICDM'08), Pisa, Italy, Dec. 2008",
}