Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database
Abstract—There is a huge wealth of sequence data available,
for example, customer purchase histories, program execution
traces, DNA, and protein sequences. Analyzing this wealth of
data to mine important knowledge is certainly a worthwhile goal.
In this paper, as a step toward analyzing patterns in
sequences, we introduce the problem of mining closed repetitive
gapped subsequences and propose efficient solutions. Given a
database of sequences where each sequence is an ordered list
of events, the pattern we would like to mine is called a repetitive
gapped subsequence, which is a subsequence (possibly with gaps
between two successive events within it) of some sequences in
the database. We introduce the concept of repetitive support
to measure how frequently a pattern repeats in the database.
Unlike the support used in sequential pattern mining, repetitive
support captures not only repetitions of a pattern in different
sequences but also the repetitions within a sequence. Given a
user-specified support threshold min_sup, we study finding the set of all
patterns with repetitive support no less than min_sup. To obtain
a compact yet complete result set and improve the efficiency, we
also study finding closed patterns. Efficient mining algorithms to
find the complete set of desired patterns are proposed based on
the idea of instance growth. Our performance study on various
datasets shows the efficiency of our approach. A case study is
also performed to show the utility of our approach.
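
The difference between per-sequence support and a support that also counts repetitions within a sequence can be illustrated with a small sketch. The abstract does not give the formal definition of repetitive support, so the code below rests on a stated assumption: two instances of a pattern in the same sequence count as distinct only if they never use the same position for the same event of the pattern. The function names, the greedy left-to-right counting strategy, and the toy database are all hypothetical and chosen for illustration; this is not presented as the paper's algorithm.

```python
from typing import Sequence


def sequence_support(db: Sequence[str], pattern: str) -> int:
    """Count the sequences that contain `pattern` as a gapped subsequence."""
    def contains(seq: str) -> bool:
        it = iter(seq)
        # `ev in it` advances the iterator, so events must appear in order.
        return all(ev in it for ev in pattern)
    return sum(1 for seq in db if contains(seq))


def greedy_repetitive_count(db: Sequence[str], pattern: str) -> int:
    """Greedily count instances of `pattern`, including repeats in a sequence.

    Assumption (not taken from the abstract): two instances in the same
    sequence may not use the same position for the same pattern event.
    """
    if not pattern:
        return 0
    total = 0
    for seq in db:
        # One partial instance per occurrence of the first event;
        # each entry is the position of the instance's last matched event.
        ends = [i for i, ev in enumerate(seq) if ev == pattern[0]]
        for ev in pattern[1:]:
            new_ends = []
            last = -1  # position already taken for this pattern event
            for end in ends:
                start = max(last, end) + 1
                try:
                    pos = seq.index(ev, start)
                except ValueError:
                    break  # later instances start even further right
                new_ends.append(pos)
                last = pos
            ends = new_ends
        total += len(ends)
    return total


# Toy database invented for illustration.
db = ["AABCDABB", "ABCD"]
print(sequence_support(db, "AB"))         # 2
print(greedy_repetitive_count(db, "AB"))  # 4
```

In this toy database the pattern AB occurs in both sequences, so its per-sequence support is 2, while the greedy count finds three instances in the first sequence and one in the second. A threshold such as min_sup = 3 would therefore reject AB under the first measure and accept it under the second.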
Date: March 01, 2009
Book Title: Proc. 2009 Int. Conf. on Data Engineering (ICDE'09), Shanghai, China, Mar. 2009
Type: Proceedings
Bibtex:
@Proceedings{Efficient_Mining_of_Closed_Repetitive_Ga,
author = "Bolin Ding and David Lo and Jiawei Han and Siau-Cheng Khoo",
title = "{Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database}",
month = "March",
year = "2009",
booktitle = "Proc. 2009 Int. Conf. on Data Engineering (ICDE'09), Shanghai, China, Mar. 2009",
}