Title
Phoneme-guided dysarthric speech conversion with non-parallel data by joint training
Abstract
The phonetic structures of dysarthric speech are more difficult to discriminate than those of normal speech. In this paper, we therefore propose a novel voice conversion framework for dysarthric speech that learns disentangled audio-transcription representations. The novelty of this method is that it takes both audio and its corresponding transcription as training inputs simultaneously. We constrain the linguistic representation extracted from the audio input to be close to the linguistic representation extracted from the transcription input, forcing the two to share the same distribution. As a result, the proposed model can generate appropriate linguistic representations without any transcripts during the testing stage. Objective and subjective evaluations show that the proposed method achieves higher intelligibility and better speaker similarity in the converted speech than the baseline approaches.
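The abstract describes a joint-training scheme in which an audio encoder and a transcription encoder produce linguistic representations that are pulled together by a matching constraint, so that at test time the audio encoder alone suffices. Below is a minimal sketch of that idea, assuming a PyTorch implementation; the GRU encoders, module sizes, frame-aligned phoneme inputs, and L1 matching loss are all illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of the joint-training idea from the abstract.
# An audio encoder and a text (transcription) encoder each produce a
# linguistic representation; a distance loss pulls the two together,
# while a decoder reconstructs speech features from the linguistic
# representation plus a speaker embedding.
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    def __init__(self, n_mels=80, dim=128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, dim, batch_first=True)

    def forward(self, mel):                 # mel: (B, T, n_mels)
        out, _ = self.rnn(mel)
        return out                          # (B, T, dim) linguistic embedding

class TextEncoder(nn.Module):
    def __init__(self, n_phonemes=64, dim=128):
        super().__init__()
        self.emb = nn.Embedding(n_phonemes, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, phonemes):            # phonemes: (B, T) frame-aligned ids
        out, _ = self.rnn(self.emb(phonemes))
        return out                          # (B, T, dim) linguistic embedding

class Decoder(nn.Module):
    def __init__(self, dim=128, spk_dim=16, n_mels=80):
        super().__init__()
        self.rnn = nn.GRU(dim + spk_dim, dim, batch_first=True)
        self.proj = nn.Linear(dim, n_mels)

    def forward(self, ling, spk):           # spk: (B, spk_dim)
        spk = spk.unsqueeze(1).expand(-1, ling.size(1), -1)
        out, _ = self.rnn(torch.cat([ling, spk], dim=-1))
        return self.proj(out)               # reconstructed mel spectrogram

def joint_loss(mel, phonemes, spk, audio_enc, text_enc, dec):
    z_audio = audio_enc(mel)
    z_text = text_enc(phonemes)
    recon = dec(z_audio, spk)
    # Reconstruction term plus the constraint that audio- and
    # text-derived linguistic representations stay close.
    return (nn.functional.l1_loss(recon, mel)
            + nn.functional.l1_loss(z_audio, z_text))

# Smoke test: a random batch of 2 utterances, 100 frames each.
ae, te, de = AudioEncoder(), TextEncoder(), Decoder()
mel = torch.randn(2, 100, 80)
ph = torch.randint(0, 64, (2, 100))
spk = torch.randn(2, 16)
loss = joint_loss(mel, ph, spk, ae, te, de)
```

Under these assumptions, only the audio encoder and decoder would be run at test time, which is why no transcript is needed for conversion.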
Year
2022
DOI
10.1007/s11760-021-02119-6
Venue
Signal, Image and Video Processing
Keywords
Voice conversion, Dysarthric speech, Autoencoder, Non-parallel data
DocType
Journal
Volume
16
Issue
6
ISSN
1863-1703
Citations
0
PageRank
0.34
References
10
Authors
5
Name                 Order  Citations  PageRank
Chen, Xunquan        1      0          0.34
Oshiro, Atsuki       2      0          0.34
J. Chen              3      112        23.18
Takashima, Ryoichi   4      0          0.34
Tetsuya Takiguchi    5      85         8.77