Characterizing the Splogosphere
Weblogs, or blogs, collectively constitute the blogosphere, forming
an influential and interesting subset of the Web. As with
most Internet-enabled applications, the ease of content creation
and distribution makes the blogosphere spam-prone.
Spam blogs, or splogs, are blogs hosting spam posts, created
using machine-generated or hijacked content for the sole purpose
of hosting ads or raising the PageRank of target sites.
These splogs make up the splogosphere, and are now inundating
blog search engines and update ping servers. In this
work we characterize splogs by comparing them against authentic
blogs. Our analysis is based on a dataset made publicly
available by BlogPulse, and employs a machine learning
model that detects splogs with an accuracy of 90%. To
round off this analysis and to better understand splogs, we
also present our study of a popular blog update ping server,
and show how it is overwhelmed by pings sent by splogs.
This overall study will facilitate finding effective new techniques
to detect and weed out splogs from the blogosphere.
Date: May 23, 2006
Book Title: Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference
Type: InProceedings
Publisher: University of Maryland, Baltimore County
Organization: Computer Science and Electrical Engineering
Downloads: 2944
Bibtex
@InProceedings{Characterizing_the_Splogosphere,
author = "Pranam Kolari and Akshay Java and Tim Finin",
title = "{Characterizing the Splogosphere}",
month = "May",
year = "2006",
organization = "Computer Science and Electrical Engineering",
booktitle = "Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference",
publisher = "University of Maryland, Baltimore County",
}