Title | ||
---|---|---|
Incorporating Word Embeddings into Open Directory Project based Large-scale Classification. |
Abstract | ||
---|---|---|
Recently, implicit representation models, such as embeddings or deep learning, have been successfully adopted for text classification tasks due to their outstanding performance. However, these approaches are limited to small- or moderate-scale text classification. Explicit representation models are often used in large-scale text classification, such as Open Directory Project (ODP)-based text classification. However, the performance of these models is limited by the associated knowledge bases. In this paper, we incorporate word embeddings into ODP-based large-scale classification. To this end, we first generate category vectors, which represent the semantics of ODP categories, by jointly modeling word embeddings and ODP-based text classification. We then propose a novel semantic similarity measure that utilizes the category and word vectors obtained from the joint model. The evaluation results clearly show the efficacy of our methodology in large-scale text classification. The proposed scheme exhibits significant improvements of 10% and 28% in terms of macro-averaged F1-score and precision at k, respectively, over state-of-the-art techniques. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/978-3-319-93037-4_30 | ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II |
Keywords | DocType | Volume |
Text classification, Word embeddings | Conference | 10938 |
ISSN | Citations | PageRank |
0302-9743 | 0 | 0.34 |
References | Authors | |
14 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kangmin Kim | 1 | 13 | 8.69 |
Aliyeva Dinara | 2 | 0 | 0.34 |
Byung-Ju Choi | 3 | 0 | 1.69 |
Sangkeun Lee | 4 | 14 | 5.09 |