Detecting Spam Blogs: A Machine Learning Approach
Weblogs, or blogs, are an important new way to publish
information, engage in discussions, and form communities
on the Internet. The Blogosphere has unfortunately
been infected by several varieties of spam-like
content. Blog search engines, for example, are inundated
by posts from splogs – false blogs with machine-generated
or hijacked content whose sole purpose is to
host ads or raise the PageRank of target sites. We discuss
how SVM models based on local and link-based
features can be used to detect splogs. We present an
evaluation of learned models and their utility to blog
search engines, systems that employ techniques differing
from those of conventional web search engines. We
evaluate the effectiveness of a combination of features,
and finally report our informal analysis of a blog search
engine index.
Date: July 16, 2006
Book Title: Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006)
Type: InProceedings
Publisher: University of Maryland, Baltimore County
Organization: Computer Science and Electrical Engineering
BibTeX
@InProceedings{Detecting_Spam_Blogs_A_Machine_Learning_,
author = "Pranam Kolari and Akshay Java and Tim Finin and Tim Oates and Anupam Joshi",
title = "{Detecting Spam Blogs: A Machine Learning Approach}",
month = "July",
year = "2006",
organization = "Computer Science and Electrical Engineering",
booktitle = "Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006)",
publisher = "University of Maryland, Baltimore County",
}