Title
Feature extraction for the author name disambiguation problem in a bibliographic database.
Abstract
Author name disambiguation in bibliographic databases remains a challenging research task because of the high uncertainty involved in matching a publication's author to a specific researcher. Common approaches either use clustering to group an author's publications or use a binary classifier to decide whether a given publication was written by a specific author. Both approaches benefit from authors publishing similar works (e.g., in the same subject areas and venues), from an author's previous publication history (the longer, the better), and from validated publication-author associations for model creation. However, when such an algorithm is confronted with dissimilar works by the same author, or with an author who has no publication history, it often makes wrong identifications. In this paper, we describe a feature extraction method that aims to avoid these problems. Instead of characterizing an author in general, it selectively uses features that associate the author with a particular publication. We build a Random Forest model to assess the quality of our feature set. Its goal is to predict whether a given author is the true author of a given publication. We use a bibliographic database named Authenticus, with more than 250,000 validated author-publication associations, to test model quality. Our model achieved a top result of 95.37% accuracy in predicting matches and 91.92% in a real test scenario. Furthermore, in the latter case the model correctly predicted 61.86% of the cases in which authors had no previous publication history.
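The idea of describing an author-publication pair, rather than the author alone, can be illustrated with a minimal Python sketch. The abstract does not list the paper's actual features, so everything here is an assumption made for illustration: the `Author` and `Publication` records, the helper names, and the four features (name compatibility, co-author overlap, venue previously seen, and a flag for existing history). Vectors of this kind could then be passed to a binary classifier such as a Random Forest.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    # Hypothetical record; field names are illustrative only.
    title: str
    venue: str
    author_names: list


@dataclass
class Author:
    # Hypothetical record; the history sets may be empty for new authors.
    name: str
    known_coauthors: set = field(default_factory=set)
    known_venues: set = field(default_factory=set)


def name_tokens(name):
    """Lowercase a name and split it into tokens, ignoring periods."""
    return name.lower().replace(".", " ").split()


def names_compatible(a, b):
    """True if surnames match and the first given-name initials agree."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False
    return ta[-1] == tb[-1] and ta[0][0] == tb[0][0]


def pair_features(author, pub):
    """Build features for one (author, publication) pair (assumed feature set)."""
    # Names on the publication that could not be the candidate author.
    others = [n for n in pub.author_names if not names_compatible(author.name, n)]
    known = {" ".join(name_tokens(n)) for n in author.known_coauthors}
    shared = sum(1 for n in others if " ".join(name_tokens(n)) in known)
    venues = {v.lower() for v in author.known_venues}
    return {
        "name_match": float(len(others) < len(pub.author_names)),
        "coauthor_overlap": shared / len(others) if others else 0.0,
        "venue_seen": float(pub.venue.lower() in venues),
        "has_history": float(bool(author.known_coauthors or author.known_venues)),
    }


pub = Publication("Feature extraction", "SAC", ["J C Silva", "F. Silva"])
with_history = Author("Fernando Silva", {"j c silva"}, {"sac"})
no_history = Author("Fernando Silva")
print(pair_features(with_history, pub))
print(pair_features(no_history, pub))
```

For the author without history, only `name_match` carries signal, which mirrors the hard case the abstract singles out.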
Year: 2017
DOI: 10.1145/3019612.3019663
Venue: SAC
Field: Bibliographic database, Information retrieval, Binary classification, Computer science, Supervised learning, Feature extraction, Scenario testing, Publishing, Cluster analysis, Random forest
DocType: Conference
Citations: 1
PageRank: 0.36
References: 14
Authors: 2
Name, Order, Citations, PageRank
J. C. Silva, 1, 6, 2.15
Fernando Silva, 2, 365, 38.27