Seminar: Active and Semi-Supervised Learning in Automatic Speech Recognition

Abstract:

This presentation focuses on Automatic Speech Recognition (ASR), as used in various Amazon products such as Alexa (Amazon Echo) and Fire TV. For such applications, a lot of data is available, but only a small portion of it can be labeled. Because labeling speech data is time-consuming and hence costly, it is crucial to find an optimal strategy for selecting the data to be transcribed, which is the role of Active Learning (AL). In addition, the unselected data can still help improve the performance of the ASR system through Semi-Supervised Training (SST). After an overview of ASR technology, we will investigate the benefits of applying AL and SST jointly. Our data selection approach relies on confidence filtering, and we will study its impact on the two main ASR modules, the acoustic model and the language model. Our results indicate that, while SST is crucial at the beginning of the labeling process, its gains diminish rapidly once AL is in place. The final simulation shows that AL reduces the transcription cost by about 70% compared with random selection. Alternatively, for a fixed transcription budget, the proposed approach reduces the word error rate by about 12.5% relative.
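
As a rough illustration of what confidence filtering can look like, the sketch below splits a pool of untranscribed utterances into a batch sent for manual transcription (AL) and a pool reused with its automatic transcripts (SST). It is not the method presented in the seminar: the `Utterance` type, the `split_by_confidence` and `relative_wer_reduction` functions, the 0.9 threshold, and the rule "lowest confidence goes to transcription, high confidence goes to SST" are all assumptions made for the example. The second helper shows how a "relative" word error rate reduction is computed, using made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    hypothesis: str    # 1-best transcript produced by the current ASR system
    confidence: float  # utterance-level confidence score in [0, 1]


def split_by_confidence(utterances, budget, sst_threshold=0.9):
    """Split untranscribed utterances into an AL batch and an SST pool.

    Assumption: the `budget` least confident utterances are sent to human
    transcribers, and the remaining utterances whose confidence is at
    least `sst_threshold` are kept with their automatic transcripts.
    """
    ranked = sorted(utterances, key=lambda u: u.confidence)
    to_transcribe = ranked[:budget]
    remaining = ranked[budget:]
    sst_pool = [u for u in remaining if u.confidence >= sst_threshold]
    return to_transcribe, sst_pool


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction: the absolute WER gain divided by the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical pool of four utterances with invented confidence scores.
pool = [
    Utterance("u1", "play some music", 0.35),
    Utterance("u2", "what time is it", 0.95),
    Utterance("u3", "turn off the light", 0.60),
    Utterance("u4", "set a timer", 0.92),
]

al_batch, sst_pool = split_by_confidence(pool, budget=1)
print([u.utt_id for u in al_batch])   # utterances sent for transcription
print([u.utt_id for u in sst_pool])   # utterances kept for SST

# Invented figures: going from 16.0% to 14.0% WER is a 12.5% relative gain.
print(relative_wer_reduction(16.0, 14.0))
```

In this toy run, utterance `u3` is neither transcribed nor reused: its confidence is too high to make the transcription batch and too low to trust its automatic transcript.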