Abstract
Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNNs) are coupled with self-attention (SA) networks. However, most DCNN-based studies on speech dereverberation that employ self-attention do not explicitly account for the inter-dependencies between real and imaginary features when computing attention. In this study, we propose a complex-valued time-frequency attention (TFA) module that models spectral and temporal dependencies by computing two-dimensional attention maps across the time and frequency dimensions. We validate the effectiveness of the proposed complex-valued TFA module with the deep complex convolutional recurrent network (DCCRN) on the REVERB challenge corpus. Experimental findings indicate that integrating the complex-TFA module with DCCRN improves both overall speech quality and the performance of back-end speech applications, such as automatic speech recognition, compared to prior self-attention approaches.
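The abstract characterizes the complex-valued TFA module only at a high level. The PyTorch sketch below illustrates one plausible reading, assuming the common TFA design of separate time- and frequency-attention branches (average pooling followed by 1-D convolutions) whose outputs combine into a two-dimensional attention map, applied jointly to concatenated real and imaginary features so their inter-dependencies are captured. The class name `ComplexTFA`, the pooling and convolution choices, and all layer sizes are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn


class ComplexTFA(nn.Module):
    """Hypothetical complex-valued time-frequency attention (TFA) module.

    Real and imaginary feature maps are concatenated along the channel axis,
    so a single 2-D attention map is estimated from (and applied to) both
    components jointly, rather than attending to each part independently.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(1, 2 * channels // reduction)
        # Time-attention branch: operates on frequency-pooled features.
        self.time_att = nn.Sequential(
            nn.Conv1d(2 * channels, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(hidden, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Frequency-attention branch: operates on time-pooled features.
        self.freq_att = nn.Sequential(
            nn.Conv1d(2 * channels, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(hidden, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, real: torch.Tensor, imag: torch.Tensor):
        # real, imag: (batch, channels, freq, time)
        x = torch.cat([real, imag], dim=1)        # joint real/imag features
        a_t = self.time_att(x.mean(dim=2))        # pool over freq -> (B, 2C, T)
        a_f = self.freq_att(x.mean(dim=3))        # pool over time -> (B, 2C, F)
        # Outer product of the two 1-D maps forms the 2-D T-F attention map.
        att = a_f.unsqueeze(3) * a_t.unsqueeze(2)  # (B, 2C, F, T)
        y = x * att
        real_out, imag_out = torch.chunk(y, 2, dim=1)
        return real_out, imag_out


if __name__ == "__main__":
    tfa = ComplexTFA(channels=16)
    r = torch.randn(2, 16, 257, 100)  # e.g. 257 STFT bins, 100 frames
    i = torch.randn(2, 16, 257, 100)
    r_out, i_out = tfa(r, i)
    print(r_out.shape, i_out.shape)   # each: torch.Size([2, 16, 257, 100])
```

The key design point suggested by the abstract is the joint treatment of the complex components: here a single set of attention weights is inferred from, and applied to, both the real and imaginary parts together. In a DCCRN-style encoder-decoder, a module like this would presumably reweight the complex feature maps between convolutional blocks, though the exact placement is not specified in this record.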
Year | DOI | Venue
---|---|---
2022 | 10.21437/INTERSPEECH.2022-11277 | Conference of the International Speech Communication Association (INTERSPEECH)

DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34

References | Authors
---|---
0 | 2
Name | Order | Citations | PageRank |
---|---|---|---
Vinay Kothapally | 1 | 0 | 0.68 |
John H. L. Hansen | 2 | 3215 | 365.75 |