Abstract | ||
---|---|---|
While text-to-speech synthesis with non-autoregressive Transformers has achieved state-of-the-art quality for many languages, the methodology of Estonian text-to-speech synthesis has not been revised for neural methods. This paper evaluates the quality of Estonian text-to-speech with Transformer-based models using different language-specific data processing steps. Additionally, we conduct a human evaluation to show how well these models can learn the patterns of Estonian pronunciation, given different amounts of training data and varying degrees of phonetic information. Our error analysis shows that using a simple multi-speaker approach can significantly decrease the number of pronunciation errors, while some information can also be helpful to a smaller extent. |
Year | DOI | Venue |
---|---|---|
2022 | 10.22364/bjmc.2022.10.3.17 | BALTIC JOURNAL OF MODERN COMPUTING |
Keywords | DocType | Volume |
speech technology, text-to-speech synthesis, Estonian | Journal | 10 |
Issue | ISSN | Citations |
3 | 2255-8942 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Liisa Ratsep | 1 | 0 | 0.34 |
Rasmus Lellep | 2 | 0 | 0.34 |
Mark Fishel | 3 | 0 | 0.68 |