Title
Document clustering with universum
Abstract
Document clustering is a popular research topic that aims to partition documents into groups of similar objects (i.e., clusters), and it has been widely used in applications such as automatic topic extraction, document organization, and filtering. Universum, a recently proposed concept, is a collection of "non-examples" that do not belong to any concept/cluster of interest. This paper proposes a novel document clustering technique, Document Clustering with Universum, which uses Universum examples to improve clustering performance. The intuition is that Universum examples can serve as supervised information, since they are known not to belong to any meaningful concept/cluster in the target domain. In particular, a maximum margin clustering method is proposed that models both target examples and Universum examples. An extensive set of experiments is conducted to demonstrate the effectiveness and efficiency of the proposed algorithm.
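The abstract does not give the objective function, so the following is only an illustrative sketch of how Universum examples could enter a maximum margin clustering objective. It assumes a linear decision function, a hinge-style loss on |f(x)| for unlabeled target documents, and an epsilon-insensitive loss that keeps Universum examples near the decision boundary. The function names, the loss forms, and the parameters `c_target`, `c_universum`, and `eps` are assumptions made for illustration and are not taken from the paper; the class-balance constraint that maximum margin clustering normally needs is omitted.

```python
def decision(w, b, x):
    """Linear decision value f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def objective(w, b, targets, universum, c_target=1.0, c_universum=1.0, eps=0.1):
    """Hypothetical clustering objective with a Universum term (lower is better).

    - Regularizer: 0.5 * ||w||^2 (large margin).
    - Target loss: max(0, 1 - |f(x)|), pushing unlabeled target documents
      away from the boundary on either side.
    - Universum loss: max(0, |f(u)| - eps), pulling Universum examples
      toward the boundary, since they belong to neither cluster.
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    target_loss = sum(max(0.0, 1.0 - abs(decision(w, b, x))) for x in targets)
    universum_loss = sum(max(0.0, abs(decision(w, b, u)) - eps) for u in universum)
    return reg + c_target * target_loss + c_universum * universum_loss


# Toy data: two target groups separated along the first axis,
# Universum points lying between them along the second axis.
targets = [(2.0, 0.0), (3.0, 0.0), (-2.0, 0.0), (-3.0, 0.0)]
universum = [(0.0, 1.0), (0.0, -1.0)]

# A boundary that separates the groups and passes through the Universum points...
good = objective((1.0, 0.0), 0.0, targets, universum)
# ...versus one that cuts through the target groups and away from the Universum.
bad = objective((0.0, 1.0), 0.0, targets, universum)
print(good < bad)
```

On this toy data the first direction scores lower because every target point lies outside the margin and every Universum point sits on the boundary; the second direction is penalized by both loss terms.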
Year: 2011
DOI: 10.1145/2009916.2010033
Venue: SIGIR
Keywords: automatic topic extraction, document clustering, target domain, meaningful concept, document organization, universum example, clustering performance, novel document, popular research topic, proposed algorithm, clustering, algorithms
Field: Fuzzy clustering, Data mining, Data stream clustering, Information retrieval, Correlation clustering, Document clustering, Computer science, Constrained clustering, Brown clustering, Cluster analysis, Single-linkage clustering
DocType: Conference
Citations: 10
PageRank: 0.50
References: 26
Authors: 3
Name            Order   Citations   PageRank
Dan Zhang       1       461         22.17
Jingdong Wang   2       4198        156.76
Luo Si          3       2498        169.52