SVMs for the Blogosphere: Blog Identification and Splog Detection
Weblogs, or blogs, have become an important new way to publish
information, engage in discussions and form communities. The
increasing popularity of blogs has given rise to search and analysis
engines focusing on the 'blogosphere'. A key requirement of such
systems is to identify blogs as they crawl the Web.
While this ensures that only blogs are indexed, blog search engines
are also often overwhelmed by spam blogs (splogs). Splogs not only
incur computational overheads but also reduce user satisfaction. In
this paper we first describe our experiments on blog identification
using Support Vector Machines (SVM). We compare results of using
different feature sets and introduce new features for blog
identification. We then report preliminary results on splog detection
and identify future work.
Date: March 27, 2006
Book Title: AAAI Spring Symposium on Computational Approaches to Analysing Weblogs
Type: InProceedings
Publisher: University of Maryland, Baltimore County
Organization: Computer Science and Electrical Engineering
Note: Also available as technical report TR-CS-05-13
Bibtex
@InProceedings{SVMs_for_the_Blogosphere_Blog_Identifica,
author = "Pranam Kolari and Tim Finin and Anupam Joshi",
title = "{SVMs for the Blogosphere: Blog Identification and Splog Detection}",
month = "March",
year = "2006",
organization = "Computer Science and Electrical Engineering",
note = "Also available as technical report TR-CS-05-13",
booktitle = "AAAI Spring Symposium on Computational Approaches to Analysing Weblogs",
publisher = "University of Maryland, Baltimore County",
}