Title
Co-Training Based Semi-Supervised Web Spam Detection
Abstract
Traditional Web spam classifiers are trained only on labeled data (feature/label pairs). Labeled spam instances, however, are difficult, expensive, and time-consuming to obtain, because they require the effort of experienced human annotators. Meanwhile, unlabeled samples are relatively easy to collect. Semi-supervised learning addresses the classification problem by using a large amount of unlabeled data, together with the labeled data, to build better classifiers. This paper proposes two new semi-supervised learning algorithms to boost the performance of Web spam classifiers. The algorithms integrate traditional co-training with topological-dependency-based hyperlink learning. The proposed methods extend our previous work on self-training-based semi-supervised Web spam detection. Experimental results with 100 and 200 labeled samples on the standard WEBSPAM-UK2006 benchmark show that the algorithms are effective.
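For orientation, the generic co-training loop that the abstract builds on can be sketched as follows. This is a minimal illustration of standard two-view co-training, not the paper's two proposed algorithms, and it omits the hyperlink-based learning step entirely. The two views (here imagined as content features and link features), the nearest-centroid base learner, all names (`centroid_fit`, `centroid_predict`, `co_train`, `rounds`, `per_round`), and the toy values are assumptions made for the example.

```python
import math

def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: return {label: mean feature vector}."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def centroid_predict(model, x):
    """Return (label, confidence), where confidence is the distance margin
    between the nearest and the second-nearest class centroid."""
    dists = sorted((math.dist(x, c), lab) for lab, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def co_train(view_a, view_b, labels, rounds=5, per_round=1):
    """Generic co-training sketch (hypothetical helper, not the paper's method).

    view_a, view_b: two feature representations of the same samples.
    labels: 0/1 for labeled samples, None for unlabeled ones.
    In each round, each view trains on the shared labeled pool and
    pseudo-labels its `per_round` most confident unlabeled samples.
    """
    labels = list(labels)  # work on a copy; the caller's list is untouched
    for _ in range(rounds):
        for view in (view_a, view_b):
            known = [i for i, lab in enumerate(labels) if lab is not None]
            unknown = [i for i, lab in enumerate(labels) if lab is None]
            if not unknown:
                return labels
            model = centroid_fit([view[i] for i in known],
                                 [labels[i] for i in known])
            scored = sorted(((centroid_predict(model, view[i]), i) for i in unknown),
                            key=lambda item: -item[0][1])
            for (label, _), i in scored[:per_round]:
                labels[i] = label  # the pseudo-label joins the shared pool
    return labels

# Toy data (assumed): 1 = spam, 0 = non-spam, None = unlabeled.
view_a = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1], [0.7, 0.7], [0.3, 0.2]]
view_b = [[0.9], [0.1], [0.8], [0.2], [0.9], [0.1]]
initial = [1, 0, None, None, None, None]
result = co_train(view_a, view_b, initial)
```

With these well-separated toy values, `result` assigns a label to every sample. Real Web spam features overlap far more, so the confidence threshold and the number of samples added per round would need tuning.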
Year
2013
DOI
10.1109/FSKD.2013.6816301
Venue
2013 10TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD)
Keywords
prediction algorithms, internet, feature extraction, information retrieval
Field
Data mining, Computer science, Co-training, Feature extraction, Prediction algorithms, Artificial intelligence, Hyperlink, Labeled data, Machine learning, Spamdexing
DocType
Conference
Citations
1
PageRank
0.35
References
9
Authors
4
Name              Order   Citations   PageRank
Wei Wang          1       1           1.36
Xiaodong Lee      2       45          10.43
An-Lei Hu         3       9           2.18
Guanggang Geng    4       141         20.78