Experiments on using Yahoo! categories to describe documents
We suggest that the name of a Yahoo! category (or of any other WWW indexer's category), or a
collection of such names, can be used to describe the content of a document. Such categories offer
a standardized and universal way of referring to or describing the nature of real-world objects,
activities, documents, and so on, and we suggest that they may be used to characterize the content
of documents semantically. WWW indices such as Yahoo! provide a huge hierarchy of categories
(topics) that touches every aspect of human endeavor. These topics can serve as descriptors in the
same way that librarians use, for example, the Library of Congress cataloging system to annotate
and categorize books.
In the course of investigating this idea, we address the problem of automatically categorizing
webpages in the Yahoo! directory. We use Telltale as our classifier; Telltale uses n-grams to
compute the similarity between documents. We experiment with various types of descriptions
for the Yahoo! categories and for the webpages to be categorized. Our findings suggest that the
best results occur when using the very brief descriptions of the Yahoo!-categorized entries;
these brief descriptions, which are part of the Yahoo! index itself, accompany most entries. We
discuss further research and ways to improve the performance of our method.
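To make the n-gram idea concrete, here is a minimal sketch of n-gram based categorization, assuming overlapping character 5-grams, raw n-gram counts, and cosine similarity. Telltale's own weighting and normalization are not reproduced here, so this only illustrates the general technique. A webpage is assigned to the category whose description it most resembles. The helper names (`ngram_profile`, `cosine_similarity`, `categorize`) and the sample category paths and descriptions are hypothetical, chosen only for illustration.

```python
from collections import Counter
import math


def ngram_profile(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams (hypothetical helper; n=5 is an assumption).

    Text is lowercased and runs of whitespace are collapsed to single spaces.
    Texts shorter than n yield an empty profile.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(page_text: str, category_descriptions: dict, n: int = 5):
    """Return (best_category, scores): the category whose description is most
    similar to the page, plus the similarity score for every category."""
    page = ngram_profile(page_text, n)
    scores = {
        name: cosine_similarity(page, ngram_profile(desc, n))
        for name, desc in category_descriptions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical category paths with short descriptions, standing in for the
    # brief descriptions that accompany Yahoo! entries.
    categories = {
        "Science/Astronomy": "telescopes, planets, stars and galaxies observed by astronomers",
        "Recreation/Sports/Soccer": "soccer clubs, leagues, players and world cup football matches",
    }
    page = "Amateur astronomers share telescope images of planets and distant galaxies."
    best, scores = categorize(page, categories)
    print(best, scores)
```

Because short, dense descriptions share many distinctive n-grams with relevant pages, they can score well under this kind of measure. That is consistent in spirit with the finding above, though this sketch makes no claim to reproduce the paper's results.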
Date: July 31, 1999
Book Title: Proceedings of the IJCAI99 Workshop on Intelligent Information Integration
Type: InProceedings
Bibtex
@InProceedings{Experiments_on_using_Yahoo_categories_to,
author = "Yannis K Labrou and Tim Finin",
title = "{Experiments on using Yahoo! categories to describe documents}",
month = "July",
year = "1999",
booktitle = "Proceedings of the IJCAI99 Workshop on Intelligent Information Integration",
}