Title
Tibetan Word Segmentation Method Based on BiLSTM_CRF Model
Abstract
Tibetan word segmentation is a key technology for Tibetan speech synthesis and Tibetan speech recognition. Traditional Tibetan word segmentation methods relied mainly on a combination of rules and statistics; deep learning makes it possible for a model to learn features automatically. This paper proposes a Tibetan word segmentation method based on a bidirectional long short-term memory network combined with a conditional random field model (BiLSTM_CRF). Tibetan sentences are first manually divided into clauses, words, and abbreviated words, and low-frequency words are removed to form a Tibetan dictionary. Text features are then extracted with the dictionary by training Word2vec embeddings on the corpus to obtain word vectors. The word vectors are fed to the BiLSTM model, and the BiLSTM output is passed as features to the CRF model, which performs four-tag labeling to produce the word segmentation. The experimental results show that the proposed method achieves good segmentation performance, with a precision of 94.33%, a recall of 93.89%, and an F value of 94.11%.
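The labeling step and the reported scores can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: that the four labels are the common B/M/E/S scheme (begin, middle, end, single), that labels are assigned per syllable, and that the reported 94.33% is precision, so that the F value is the harmonic mean of precision and recall. The function names and the placeholder syllables `s1`..`s6` are hypothetical, and no BiLSTM or CRF is implemented here; the sketch only shows how a segmentation maps to a tag sequence and back.

```python
from typing import List


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Map each word (a list of syllables) to B/M/E/S tags, one tag per syllable.

    Assumption: B/M/E/S is the four-tag scheme; the abstract does not name it.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty words rather than emit tags for nothing
        if len(word) == 1:
            tags.append("S")  # single-syllable word
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from syllables and their predicted tags."""
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical placeholder syllables, not real Tibetan text.
gold = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
tags = words_to_tags(gold)
syllables = [s for word in gold for s in word]
restored = tags_to_words(syllables, tags)
f_value = round(f_measure(94.33, 93.89), 2)
```

Under these assumptions, `tags` is `["B", "E", "S", "B", "M", "E"]`, `restored` equals `gold`, and `f_value` is 94.11, which matches the reported F value. In the method described above, the tag sequence would be predicted by the CRF layer from the BiLSTM features rather than derived from a gold segmentation.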
Year: 2018
DOI: 10.1109/IALP.2018.8629257
Venue: 2018 International Conference on Asian Language Processing (IALP)
Keywords: Word2vec, BiLSTM_CRF model, Tibetan word segmentation
Field: Conditional random field, Speech synthesis, Computer science, Text segmentation, Context model, Artificial intelligence, Natural language processing, Deep learning, Word2vec, Hidden Markov model, Sentence
DocType: Conference
ISSN: 2159-1962
ISBN: 978-1-5386-8298-2
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name
Order
Citations
PageRank
Lili Wang117245.30
Yang Hongwu21614.32