RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis

As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well. In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Based on initial K clusters, ranking is applied separately, which serves as a good measure for each cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, which is measured by rank distribution. Objects then are reassigned to the nearest cluster under the new measure space to improve clustering. As a result, quality of clustering and ranking are mutually enhanced, which means that the clusters are getting more accurate and the ranking is getting more meaningful. Such a progressive refinement process iterates until little change can be made. Our experiment results show that RankClus can generate more accurate clusters and in a more efficient way than the state-of-the-art link-based clustering methods. Moreover, the clustering results with ranks can provide more informative views of data compared with traditional clustering.
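The loop described in the abstract (rank within each cluster, turn rank distributions into K-dimensional coefficient vectors, reassign to the nearest cluster, repeat) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the function names, the degree-style ranking, the use of cosine similarity to cluster centers as the "nearest cluster" rule, and especially the coefficient step, which replaces the paper's mixture-model estimation with a single prior-weighted normalization. `W[i][j]` is a hypothetical link weight between target object `i` (e.g., a conference) and attribute object `j` (e.g., an author).

```python
import math


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_within_cluster(W, members):
    """Rank every target object using only the links of one cluster."""
    n_attr = len(W[0])
    # Attribute objects are ranked by their link weight into this cluster.
    attr_rank = normalize([sum(W[i][j] for i in members) for j in range(n_attr)])
    # Target objects are ranked by how strongly they link to ranked attributes.
    return normalize([sum(w * r for w, r in zip(row, attr_rank)) for row in W])


def rankclus_sketch(W, labels, K, max_iter=20):
    """Simplified rank-then-reassign loop (illustrative assumption only)."""
    labels = list(labels)
    n = len(W)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == k] for k in range(K)]
        ranks = [rank_within_cluster(W, members) for members in clusters]
        prior = [len(members) / n for members in clusters]
        # K-dimensional coefficient vector per object: prior-weighted rank,
        # normalized. This stands in for the mixture-model estimation.
        coeffs = [
            normalize([ranks[k][i] * prior[k] for k in range(K)])
            for i in range(n)
        ]
        # Cluster center = mean coefficient vector of its current members.
        centers = [
            [sum(coeffs[i][d] for i in members) / len(members) for d in range(K)]
            if members else [0.0] * K
            for members in clusters
        ]
        new_labels = [
            max(range(K), key=lambda k: cosine(coeffs[i], centers[k]))
            for i in range(n)
        ]
        if new_labels == labels:  # little change can be made: stop
            break
        labels = new_labels
    return labels


# Toy network: objects 0-1 share attributes 0-1, objects 2-3 share 2-3.
W = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
print(rankclus_sketch(W, [0, 1, 1, 1], K=2))
```

In this toy run the initial labels deliberately place object 1 in the wrong cluster; the loop moves it next to object 0 and then stops because a further pass changes nothing.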
Date: March 02, 2009
Book Title: Int. Conf. on Extending DataBase Technology (EDBT'09)
Type: InProceedings
Edition: Proc. 2009
Address: Saint-Petersburg, Russia


  author = "Yizhou Sun and Jiawei Han and Peixiang Zhao and Zhijun Yin and Hong Cheng and Tianyi Wu",
  title = "{RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis}",
  month = "March",
  year = "2009",
  edition = "Proc. 2009",
  address = "Saint-Petersburg, Russia",
  booktitle = "Int. Conf. on Extending DataBase Technology (EDBT'09)",