Blog Track Open Task: Spam Blog Classification
Spam blogs, or splogs, are blogs created for the sole purpose of hosting
ads, promoting affiliate sites, and getting new content indexed, using
content that is auto-generated or plagiarized from other sources. Spammers
equipped with readily available splog creation software inundate the
blogosphere, both at ping servers and at systems that index and analyze
blogs. Our own studies estimate the share of splogs to be around 75%
at ping servers and 20% at popular blog search engines. In this open
submission we therefore propose Spam Blog Classification as a new task in
the Blog Track. Splogs are a specific instance of the more general
spam web pages. While offline graph-based mechanisms like TrustRank
are quite effective and sufficient for the Web, the blogosphere
demands new techniques. The quality of blog search engines is judged
not just by their reach, but also by their ability to index recent
(non-spam) posts. This requires fast online splog detection and
filtering before new content is indexed, followed by offline
techniques that exploit link-graph anomalies. The nature of
this problem makes splog detection challenging. This open task
submission underscores the seriousness of the splog problem in the
TREC 2006 collection, details how it impacts the primary task of
Opinion Identification, and proposes multiple assessment and
evaluation approaches for a Spam Blog Classification task in Blog
Track 2007.
Date: November 14, 2006
Book Title: TREC 2006 Blog Track Notebook
Type: InCollection
BibTeX
@InCollection{Blog_Track_Open_Task_Spam_Blog_Classific,
author = "Pranam Kolari and Akshay Java and Tim Finin and James Mayfield and Anupam Joshi and Justin Martineau",
title = "{Blog Track Open Task: Spam Blog Classification}",
month = "November",
year = "2006",
booktitle = "TREC 2006 Blog Track Notebook",
}