RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis
As information networks become ubiquitous, extracting knowledge
from information networks has become an important task. Both
ranking and clustering can provide overall views on information
network data, and each has been a hot topic by itself. However,
ranking objects globally without considering which clusters they
belong to often leads to dumb results, e.g., ranking database and
computer architecture conferences together may not make much
sense. Similarly, clustering a huge number of objects (e.g.,
thousands of authors) in one huge cluster without distinction is
dull as well. In this paper, we address the problem of generating
clusters for a specified type of objects, as well as ranking
information for all types of objects based on these clusters in a
multi-typed (i.e., heterogeneous) information network. A novel
clustering framework called RankClus is proposed that directly
generates clusters integrated with ranking. Based on initial K
clusters, ranking is applied separately, which serves as a good
measure for each cluster. Then, we use a mixture model to
decompose each object into a K-dimensional vector, where each
dimension is a component coefficient with respect to a cluster,
which is measured by rank distribution. Objects then are
reassigned to the nearest cluster under the new measure space to
improve clustering. As a result, quality of clustering and ranking
are mutually enhanced, which means that the clusters are getting
more accurate and the ranking is getting more meaningful. Such a
progressive refinement process iterates until little change can be
made. Our experiment results show that RankClus can generate more
accurate clusters and in a more efficient way than the
state-of-the-art link-based clustering methods. Moreover, the
clustering results with ranks can provide more informative views
of data compared with traditional clustering.
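
The loop described in the abstract (rank within each cluster, decompose each object into K mixture coefficients, reassign to the nearest cluster, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a two-type network given as a weight matrix W (rows are the objects to cluster, columns are the linked attribute objects), a degree-based ranking, a small EM loop over fixed rank distributions, and cosine similarity as the closeness measure. All function names, parameters, and the toy data are hypothetical.

```python
import math


def rank_attributes(W, members, eps=1e-6):
    """Assumed degree-based ranking: share of link weight each attribute
    object receives from the cluster's members (eps smooths zero counts)."""
    n_attr = len(W[0])
    sums = [sum(W[i][j] for i in members) + eps for j in range(n_attr)]
    total = sum(sums)
    return [s / total for s in sums]


def mixture_coefficients(links, ranks, em_iters=20):
    """Estimate K mixture weights for one object with EM, treating its
    links as draws from a mixture of the (fixed) cluster rank distributions."""
    K = len(ranks)
    pi = [1.0 / K] * K
    total = sum(links)
    if total == 0:
        return pi  # no links: stay uniform
    for _ in range(em_iters):
        new_pi = [0.0] * K
        for j, w in enumerate(links):
            if w == 0:
                continue
            joint = [pi[k] * ranks[k][j] for k in range(K)]
            norm = sum(joint)
            for k in range(K):
                new_pi[k] += w * joint[k] / norm
        pi = [v / total for v in new_pi]
    return pi


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rankclus_sketch(W, assign, K, max_rounds=50):
    """Iterate rank -> decompose -> reassign until assignments stop changing."""
    assign = list(assign)
    n = len(W)
    for _ in range(max_rounds):
        clusters = [[i for i in range(n) if assign[i] == k] for k in range(K)]
        ranks = [rank_attributes(W, c) for c in clusters]
        coeffs = [mixture_coefficients(W[i], ranks) for i in range(n)]
        centers = []
        for k, c in enumerate(clusters):
            if c:
                centers.append(
                    [sum(coeffs[i][d] for i in c) / len(c) for d in range(K)]
                )
            else:
                # Assumption: an empty cluster falls back to its own axis.
                centers.append([1.0 if d == k else 0.0 for d in range(K)])
        new_assign = [
            max(range(K), key=lambda k: cosine(coeffs[i], centers[k]))
            for i in range(n)
        ]
        if new_assign == assign:
            break
        assign = new_assign
    # Recompute the per-cluster ranking for the final assignment.
    clusters = [[i for i in range(n) if assign[i] == k] for k in range(K)]
    return assign, [rank_attributes(W, c) for c in clusters]


# Hypothetical toy data: 4 conferences (rows) x 4 authors (columns),
# entries are paper counts; the initial assignment misplaces conference 3.
W = [[5, 4, 0, 0],
     [4, 5, 1, 0],
     [0, 1, 5, 4],
     [0, 0, 4, 5]]
labels, ranks = rankclus_sketch(W, [0, 0, 1, 0], K=2)
```

On this toy input the misplaced conference moves to the cluster of the conference it shares authors with, and each cluster carries its own author ranking in `ranks`. The ranking function and the EM step are deliberate simplifications chosen to keep the sketch short.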
Date: March 02, 2009
Book Title: Int. Conf. on Extending DataBase Technology (EDBT'09)
Type: InProceedings
Edition: Proc. 2009
Address: Saint-Petersburg, Russia
Bibtex:
@InProceedings{RankClus_Integrating_Clustering_with_Ran,
author = "Yizhou Sun and Jiawei Han and Peixiang Zhao and Zhijun Yin and Hong Cheng and Tianyi Wu",
title = "{RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis}",
month = "March",
year = "2009",
edition = "Proc. 2009",
address = "Saint-Petersburg, Russia",
booktitle = "Int. Conf. on Extending DataBase Technology (EDBT'09)",
}