Abstract |
---|
Voice biometrics has been applied to enhance the security of spoken language proficiency tests and to ensure valid test scores by detecting fraudulent activity. These methods can, however, be triggered by certain distortions, including background noise and adjacent test-takers, resulting in false positive alarms. In this paper, a two-layer bi-directional LSTM RNN model is employed to detect these distorted (unusable) responses, and a sub-sampling method is applied to reduce the difficulties of model training caused by very long input sequences and imbalanced training data. The system is evaluated on a corpus collected from an assessment of English language proficiency administered around the world. Results show that our approach significantly outperforms two baselines: a Gaussian mixture model (GMM) classifying frame-level features and an AdaBoost classifier operating on i-vectors. Our system’s F-score in unusable response detection is 0.60, compared to 0.43 and 0.49 for the two baseline systems. |
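The abstract does not say how the sub-sampling is carried out. As a minimal sketch of one plausible reading, the snippet below splits a long frame sequence into several shorter, interleaved sequences by taking every `stride`-th frame at each offset. The function name `strided_subsample` and the `stride` parameter are hypothetical and are not taken from the paper; the actual method may differ.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def strided_subsample(frames: Sequence[Frame], stride: int) -> List[List[Frame]]:
    """Split one long frame sequence into `stride` shorter, interleaved sequences.

    Assumption (not confirmed by the abstract): sub-sequence `k` keeps frames
    k, k + stride, k + 2 * stride, ... so every frame is used exactly once.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return [list(frames[offset::stride]) for offset in range(stride)]


# Toy example: integers stand in for per-frame acoustic feature vectors.
response_frames = list(range(10))
sub_sequences = strided_subsample(response_frames, stride=3)
print(sub_sequences)
```

Under this assumption, each sub-sequence is roughly `stride` times shorter than the original response, which eases recurrent training on long inputs, and one response yields `stride` training examples, which could be used to oversample the rarer class.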
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ISCSLP.2018.8706635 | 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) |
Keywords | Field | DocType |
---|---|---|
Feature extraction, Task analysis, Training, Speech recognition, Neural networks, Data models, Training data | Data modeling, Background noise, Task analysis, Pattern recognition, Computer science, Speech recognition, Feature extraction, Speaker recognition, Artificial intelligence, Artificial neural network, Spoken language, Mixture model | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-5386-5627-3 | 0 | 0.34 |
References | Authors |
---|---|
0 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhaoheng Ni | 1 | 5 | 1.82 |
Rutuja Ubale | 2 | 2 | 3.17 |
Qian Yao | 3 | 527 | 51.55 |
Michael I. Mandel | 4 | 569 | 44.30 |
Su-Youn Yoon | 5 | 107 | 12.48 |
Abhinav Misra | 6 | 0 | 0.34 |
David Suendermann-Oeft | 7 | 3 | 2.17 |