Title
Compilation of a Spanish Representative Corpus
Abstract
Due to Zipf's law, even a very large corpus contains very few occurrences (tokens) of the majority of its different words (types). Only a corpus that contains enough occurrences of even rare words can provide the statistical information needed to study the contextual usage of words. We call such a corpus representative and suggest using the Internet to compile it. The corresponding algorithm and its application to Spanish are described. Different concepts of a representative corpus are discussed.
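The abstract does not spell out the algorithm, so the following is only an illustrative Python sketch under stated assumptions, not the authors' method. It assumes Zipf's law with exponent s = 1 to estimate how many types fall below a minimum number of occurrences, and it models compilation as collecting, for each lexicon word, snippets until a threshold is met. All names (`zipf_expected_counts`, `share_below`, `compile_representative`, `min_occ`) are hypothetical, and `fetch_contexts` is a caller-supplied stand-in for querying the Internet (for example, a search engine); no real search API is used.

```python
import re
from collections.abc import Callable, Iterable, Iterator


def zipf_expected_counts(n_tokens: int, n_types: int, s: float = 1.0) -> list[float]:
    """Expected occurrences of the word at each rank r = 1..n_types,
    assuming Zipf's law: frequency proportional to 1 / r**s."""
    weights = [1.0 / r ** s for r in range(1, n_types + 1)]
    norm = sum(weights)  # generalized harmonic number H(n_types, s)
    return [n_tokens * w / norm for w in weights]


def share_below(counts: list[float], min_occ: int) -> float:
    """Fraction of types whose expected count is below min_occ."""
    return sum(1 for c in counts if c < min_occ) / len(counts)


def compile_representative(
    lexicon: Iterable[str],
    fetch_contexts: Callable[[str], Iterator[str]],
    min_occ: int,
) -> dict[str, list[str]]:
    """For every lexicon word, collect up to min_occ snippets containing it.
    fetch_contexts is a hypothetical stand-in for an Internet search that
    yields text snippets for a query word."""
    corpus: dict[str, list[str]] = {}
    for word in lexicon:
        target = word.lower()
        found: list[str] = []
        for snippet in fetch_contexts(word):
            # Naive tokenization; \w is Unicode-aware, so accented Spanish letters match.
            if target in re.findall(r"\w+", snippet.lower()):
                found.append(snippet)
                if len(found) >= min_occ:
                    break  # enough occurrences collected for this word
        corpus[word] = found  # fewer than min_occ: word is still under-represented
    return corpus


if __name__ == "__main__":
    counts = zipf_expected_counts(n_tokens=1_000_000, n_types=100_000)
    print(f"{share_below(counts, 10):.1%} of types expected to occur fewer than 10 times")

    # Toy in-memory "web" replacing real Internet queries (hypothetical data).
    fake_web = {"ojalá": ["Ojalá llueva mañana.", "Dijo: ojalá que sí", "sin contexto"]}
    corpus = compile_representative(
        ["ojalá", "quizás"], lambda w: iter(fake_web.get(w, [])), min_occ=2
    )
    for word, snippets in corpus.items():
        print(word, len(snippets))
```

Under these assumed parameters (one million tokens, 100,000 types), roughly nine in ten types are expected to occur fewer than 10 times, which is the gap the abstract proposes to close with per-word retrieval. Stopping at `min_occ` snippets per word keeps the collected material balanced across frequent and rare words rather than mirroring their raw text frequency.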
Year
2002
Venue
CICLing
Keywords
different concept, different word, spanish representative corpus, necessary statistical information, corpus representative, representative corpus, enough occurrence, zipf law, large corpus, contextual usage, corresponding algorithm
Field
Zipf's law, Computer science, Lexicon, Artificial intelligence, Corpus linguistics, Natural language processing, Lexico, The Internet
DocType
Conference
ISBN
3-540-43219-1
Citations
7
PageRank
0.90
References
1
Authors
3
Name | Order | Citations | PageRank
Alexander Gelbukh | 1 | 2843 | 269.19
Grigori Sidorov | 2 | 398 | 60.51
Liliana Chanona-Hernández | 3 | 112 | 10.00