Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree
Frequent patterns provide solutions to datasets that do not
have well-structured feature vectors. However, frequent pat-
tern mining is non-trivial since the number of unique pat-
terns is exponential but many are non-discriminative and
correlated. Currently, frequent pattern mining is performed
in two sequential steps: enumerating a set of frequent pat-
terns, followed by feature selection. Although many meth-
ods have been proposed in the past few years on how to
perform each separate step efficiently, there is still limited
success in eventually finding highly compact and discrimi-
native patterns. The culprit is due to the inherent nature of
this widely adopted two-step approach. This paper discusses
these problems and proposes a new and different method. It
builds a decision tree that partitions the data onto different
nodes. Then at each node, it directly discovers a discrimi-
native pattern to further divide its examples into purer sub-
sets. Since the number of examples towards leaf level is
relatively small, the new approach is able to examine pat-
terns with extremely low global support that could not be
enumerated on the whole dataset by the two-step method.
The discovered feature vectors are more accurate on some of
the most difficult graph as well as frequent itemset problems
than most recently proposed algorithms but the total size is
typically 50% or more smaller. Importantly, the minimum
support of some discriminative patterns can be extremely
low (e.g. 0.03%). In order to enumerate these low support
patterns, state-of-the-art frequent pattern algorithm either
cannot finish due to huge memory consumption or have to
enumerate 101 to 103 times more patterns before they can
even be found. Software and datasets are available by con-
tacting the author.
Date: November 30, 2008
Book Title: ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'08), Las Vegas, NV
Type: Article
Address: Las Vegas, Nevada, USA
Downloads: 218
Has 1 soft copy
remote linkBibtex
@Article{Direct_Mining_of_Discriminative_and_Esse,
author = "Wei Fan and Kun Zhang and Hong Cheng and Jing Gao and Xifeng Yan and Jiawei Han and Philip S Yu and Olivier Verscheure",
title = "{Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree}",
month = "November",
year = "2008",
address = ", Las Vegas, Nevada, USA",
journal = "ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'08), Las Vegas, NV",
}