SVMs for the Blogosphere: Blog Identification and Splog Detection

Weblogs, or blogs, have become an important new way to publish information, engage in discussions and form communities. The increasing popularity of blogs has given rise to search and analysis engines focusing on the 'blogosphere'. A key requirement of such systems is to identify blogs as they crawl the Web. While this ensures that only blogs are indexed, blog search engines are also often overwhelmed by spam blogs (splogs). Splogs not only incur computational overheads but also reduce user satisfaction. In this paper we first describe our experiments on blog identification using Support Vector Machines (SVM). We compare results of using different feature sets and introduce new features for blog identification. We then report preliminary results on splog detection and identify future work.
Date: March 27, 2006
Book Title: AAAI Spring Symposium on Computational Approaches to Analysing Weblogs
Type: InProceedings
Publisher: University of Maryland, Baltimore County
Organization: Computer Science and Electrical Engineering
Note: Also available as technical report TR-CS-05-13


Bibtex


@InProceedings{SVMs_for_the_Blogosphere_Blog_Identifica,
  author = "Pranam Kolari and Tim Finin and Anupam Joshi",
  title = "{SVMs for the Blogosphere: Blog Identification and Splog Detection}",
  month = "March",
  year = "2006",
  organization = "Computer Science and Electrical Engineering",
  note = "Also available as technical report TR-CS-05-13",
  booktitle = "AAAI Spring Symposium on Computational Approaches to Analysing Weblogs",
  publisher = "University of Maryland, Baltimore County",
}