Teacher-Student Training Approach Using an Adaptive Gain Mask for LSTM-Based Speech Enhancement in the Airborne Noise Environment
Abstract
Research on speech enhancement algorithms for the airborne environment is of great significance to the security of airborne systems. Recently, the research focus of speech enhancement has shifted from conventional unsupervised algorithms, such as the log minimum mean square error estimator (log-MMSE), to state-of-the-art masking-based long short-term memory (LSTM) methods. However, each method has its own characteristics and limitations, so neither handles noise well in all conditions. Moreover, training a supervised speech enhancement model requires clean speech and noise data, which are difficult to obtain in the real-world airborne environment. Therefore, to fully exploit the advantages of these two methods without such data restrictions, we propose a novel adaptive gain mask (AGM) based teacher-student training approach for speech enhancement. In our method, the AGM, which serves as a robust learning target for the student model, is devised by incorporating the ideal ratio mask estimated by the teacher model into the log-MMSE procedure. To obtain an appropriate trade-off between the two methods, we adaptively update the AGM using a recursive weighting coefficient. Experiments on real airborne data show that the proposed AGM-based method outperforms the other baselines on all of the essential objective metrics evaluated in this paper.