Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree

Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pat- tern mining is non-trivial since the number of unique pat- terns is exponential but many are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent pat- terns, followed by feature selection. Although many meth- ods have been proposed in the past few years on how to perform each separate step efficiently, there is still limited success in eventually finding highly compact and discrimi- native patterns. The culprit is due to the inherent nature of this widely adopted two-step approach. This paper discusses these problems and proposes a new and different method. It builds a decision tree that partitions the data onto different nodes. Then at each node, it directly discovers a discrimi- native pattern to further divide its examples into purer sub- sets. Since the number of examples towards leaf level is relatively small, the new approach is able to examine pat- terns with extremely low global support that could not be enumerated on the whole dataset by the two-step method. The discovered feature vectors are more accurate on some of the most difficult graph as well as frequent itemset problems than most recently proposed algorithms but the total size is typically 50% or more smaller. Importantly, the minimum support of some discriminative patterns can be extremely low (e.g. 0.03%). In order to enumerate these low support patterns, state-of-the-art frequent pattern algorithm either cannot finish due to huge memory consumption or have to enumerate 101 to 103 times more patterns before they can even be found. Software and datasets are available by con- tacting the author.
Date: August 01, 2008
Book Title: Proc. 2008 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'08)
Type: Proceedings
Publisher: ACM
Downloads: 609

Has 1 soft copy


size 261519 bytes

Bibtex


@Proceedings{Direct_Mining_of_Discriminative_and_Esse,
  author = "Wei Fan and Kun Zhang and Jing Gao and Xifeng Yan and Jiawei Han and Philip S Yu and Olivier Verscheure",
  title = "{Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree}",
  month = "August",
  year = "2008",
  booktitle = "Proc. 2008 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'08)",
  publisher = "ACM",
}