Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network
Abstract—Inner product measures how closely two feature
vectors are related. It is an important primitive for many popular
data mining tasks, e.g., clustering, classification, correlation computation,
and decision tree construction. If the entire data set is
available at a single site, then computing the inner product matrix
and identifying the top (in terms of magnitude) entries is trivial.
However, in many real-world scenarios, data is distributed across
many locations and transmitting the data to a central server
would be quite communication-intensive and not scalable. This
paper presents an approximate local algorithm for identifying
top-l inner products among pairs of feature vectors in a large
asynchronous distributed environment such as a peer-to-peer
(P2P) network. We develop a probabilistic algorithm for this
purpose using order statistics and Hoeffding bound. We present
experimental results to show the effectiveness and scalability
of the algorithm. Finally, we demonstrate an application of
this technique for interest-based community formation in a P2P
environment.
Date: April 02, 2008
Book Title: IEEE Transactions on Knowledge and Data Engineering.
Type: Article
Series: 4
Volume: 20
Pages: 475-488
Downloads: 164
Has 1 soft copy
size 516373 bytesBibtex
@Article{Distributed_Identification_of_Top_l_Inne,
author = "Hillol Kargupta and Kun Liu and Kamalika Das and Kanishka Bhaduri",
title = "{Distributed Identification of Top-l Inner Product Elements and its Application in a Peer-to-Peer Network}",
month = "April",
year = "2008",
series = "4",
pages = "475-488",
volume = "20",
journal = "IEEE Transactions on Knowledge and Data Engineering.",
}