Enhanced Speech Based Jointly Statistical Probability Distribution Function for Voice Activity Detection
Abstract
Most voice activity detection (VAD) methods are based on statistical models. These methods typically assume that the noise signal follows a Gaussian distribution. This assumption does not always hold in practice, so such methods often fail to distinguish speech from noise at low signal-to-noise ratio (SNR) levels under non-stationary noise conditions. To further improve the robustness of VAD, an enhanced-speech-based method is proposed. In the proposed method, the Laplacian distribution is used to model the residual noise, since we find that the residual noise in enhanced speech follows a Laplacian distribution; in addition, a Gaussian mixture model is used to characterize the discrete Fourier transform (DFT) coefficients of the reconstructed speech in the enhanced speech. Experimental results show that the proposed method performs better than the baseline method, especially in low-SNR and non-stationary noise conditions.
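To make the modelling idea concrete, the following is a minimal sketch of a per-frame likelihood-ratio decision that contrasts a Laplacian model for noise-only frames with a zero-mean Gaussian mixture model for speech frames. It is an illustration under simplifying assumptions and not the joint probability distribution function of the proposed method: the coefficients are treated as real-valued (for example, the real parts of the DFT coefficients of the enhanced speech), the speech hypothesis ignores the residual noise, and all function names, parameter values, and the threshold are hypothetical.

```python
import numpy as np


def laplace_logpdf(x, scale):
    """Log-density of a zero-mean Laplacian with scale parameter `scale`."""
    x = np.asarray(x, dtype=float)
    return -np.log(2.0 * scale) - np.abs(x) / scale


def gmm_logpdf(x, weights, sigmas):
    """Log-density of a zero-mean Gaussian mixture (one sigma per component)."""
    x = np.asarray(x, dtype=float)[..., None]
    weights = np.asarray(weights, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Log of each weighted component density, shape (..., n_components).
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi * sigmas**2)
        - 0.5 * (x / sigmas) ** 2
    )
    # Log-sum-exp over components for numerical stability.
    m = log_comp.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=-1, keepdims=True)))[..., 0]


def frame_llr(coeffs, noise_scale, weights, sigmas):
    """Mean log-likelihood ratio over the frequency bins of one frame.

    H1 (speech): coefficients follow the Gaussian mixture.
    H0 (noise only): coefficients follow the Laplacian.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    llr = gmm_logpdf(coeffs, weights, sigmas) - laplace_logpdf(coeffs, noise_scale)
    return float(np.mean(llr))


def is_speech(coeffs, noise_scale, weights, sigmas, threshold=0.0):
    """Declare speech when the mean log-likelihood ratio exceeds the threshold."""
    return frame_llr(coeffs, noise_scale, weights, sigmas) > threshold


# Hypothetical parameters chosen only for illustration.
NOISE_SCALE = 0.1
WEIGHTS = [0.6, 0.4]
SIGMAS = [0.5, 2.0]

loud_frame = [1.0, -2.0, 1.5]     # large coefficients, speech-like
quiet_frame = [0.01, -0.02, 0.0]  # small coefficients, residual-noise-like
print(is_speech(loud_frame, NOISE_SCALE, WEIGHTS, SIGMAS))
print(is_speech(quiet_frame, NOISE_SCALE, WEIGHTS, SIGMAS))
```

In practice the Laplacian scale and the mixture parameters would have to be estimated from data, and the decision threshold would be tuned to trade false alarms against missed speech.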