Title
Vocabulary-Based Language Similarity using Web Corpora
Abstract
This paper focuses on automatic methods for quantifying language similarity. The approach equates the similarity of languages with the similarity of their text corpora. Corpus similarity is first determined by how closely the vocabularies of the languages resemble each other. To this end, words or parts of words, such as letter n-grams, are examined. Extensions such as transliteration of the text data make the methods independent of text characteristics like the writing system used. Further analyses show to what extent knowledge about the distribution of words in parallel text can be used in the context of language similarity.
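To make the idea of vocabulary-based corpus similarity concrete, here is a minimal sketch. It assumes that each corpus is reduced to counts of letter trigrams, that two corpora are compared by the cosine similarity of those counts, and that transliteration happens first so that corpora written in different scripts can still be compared. The measure, the trigram size, and the `TRANSLIT` table are illustrative assumptions, not the authors' published method. The table is a hypothetical, partial Cyrillic-to-Latin mapping.

```python
from collections import Counter
import math

# Hypothetical, partial Cyrillic-to-Latin table for illustration only;
# a real system would use a complete, standardized transliteration scheme.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "ch",
    "ш": "sh", "щ": "shch", "ы": "y", "э": "e", "ю": "ju", "я": "ja",
}


def transliterate(text: str) -> str:
    """Lowercase the text and map known Cyrillic letters to Latin; keep all else."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text.lower())


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count letter n-grams per word, padding each word with '_' boundary markers."""
    profile = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for i in range(len(padded) - n + 1):
            profile[padded[i:i + n]] += 1
    return profile


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def corpus_similarity(a: str, b: str, n: int = 3) -> float:
    """Compare two corpora via the cosine of their transliterated n-gram profiles."""
    return cosine(ngram_profile(transliterate(a), n), ngram_profile(transliterate(b), n))


if __name__ == "__main__":
    # The same phrase in Cyrillic and Latin script becomes comparable after transliteration.
    print(corpus_similarity("мама мыла раму", "mama myla ramu"))
    # Closely related strings share many trigrams; unrelated ones share none.
    print(corpus_similarity("the house is red", "the horse is red"))
    print(corpus_similarity("the house is red", "xyz qwv"))
```

Letter n-grams are used here instead of whole words because they still register partial overlap between cognates that differ in inflection or spelling, such as "house" and "horse" above. Whole-word overlap would treat those as entirely different.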
Year
2014
Venue
LREC 2014 - Ninth International Conference on Language Resources and Evaluation
Keywords
language similarity, corpora, vocabulary
Field
Computer science, Writing system, Text corpus, Artificial intelligence, Natural language processing, Vocabulary, Transliteration
DocType
Conference
Citations
0
PageRank
0.34
References
3
Authors
2
Name
Order
Citations
PageRank
Dirk Goldhahn1115.22
Uwe Quasthoff219526.62