Abstract | ||
---|---|---|
Tibetan word segmentation is a key technology for Tibetan speech synthesis and Tibetan speech recognition. Traditional Tibetan word segmentation methods mainly rely on a combination of rules and statistics, whereas deep learning allows a model to learn features automatically. This paper proposes a Tibetan word segmentation method based on a bidirectional long short-term memory neural network with a conditional random field model (BiLSTM_CRF). Tibetan sentences are first manually divided into clauses, words, and abbreviated words, and low-frequency words are removed to form a Tibetan dictionary. Text features are then extracted with the dictionary by using Word2vec to embed the words of the corpus as word vectors. The word vectors are fed to the BiLSTM model, and the BiLSTM output is finally passed as features to the CRF model, which performs four-tag labeling to obtain the Tibetan word segmentation results. The experimental results show that the proposed method achieves good segmentation performance: precision reaches 94.33%, recall 93.89%, and the F value 94.11%. |
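
The four-way labeling step and the reported F value in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the tag names B/M/E/S (begin, middle, end, single), the choice of syllables as the labeled units, and the function names `words_to_tags`, `tags_to_words`, and `f_measure` are all assumptions made for illustration. The syllables are Latin placeholders rather than real Tibetan text.

```python
from typing import List


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn a segmented sentence (each word is a list of syllables)
    into one position tag per syllable. Tag set B/M/E/S is an assumption."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-syllable word
        else:
            # first syllable, zero or more middle syllables, last syllable
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from syllables and predicted tags.
    A word is closed whenever an E or S tag is seen."""
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that does not end in E or S
        words.append(current)
    return words


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F value)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder syllables, not real Tibetan.
    gold = [["s1", "s2", "s3"], ["s4"], ["s5", "s6"]]
    flat = [syllable for word in gold for syllable in word]
    tags = words_to_tags(gold)
    print(tags)
    print(tags_to_words(flat, tags) == gold)
    # Figures taken from the abstract.
    print(round(f_measure(94.33, 93.89), 2))
```

In a BiLSTM_CRF tagger, a sequence like the one produced by `words_to_tags` would serve as the training target, and decoding with `tags_to_words` would recover word boundaries from the predicted tags. The harmonic mean of 94.33 and 93.89 rounds to 94.11, which matches the F value the abstract reports.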
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/IALP.2018.8629257 | 2018 International Conference on Asian Language Processing (IALP) |
Keywords | Field | DocType |
Word2vec,BiLSTM_CRF model,Tibetan word segmentation | Conditional random field,Speech synthesis,Computer science,Text segmentation,Context model,Artificial intelligence,Natural language processing,Deep learning,Word2vec,Hidden Markov model,Sentence | Conference |
ISSN | ISBN | Citations |
2159-1962 | 978-1-5386-8298-2 | 0 |
PageRank | References | Authors |
0.34 | 0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Lili Wang | 1 | 172 | 45.30 |
Yang Hongwu | 2 | 16 | 14.32 |