CHENG Gaofeng, LI Xin, YAN Yonghong. Using Highway Connections to Enable Deep Small-footprint LSTM-RNNs for Speech Recognition[J]. Chinese Journal of Electronics, 2019, 28(1): 107-112. DOI: 10.1049/cje.2018.11.008

Using Highway Connections to Enable Deep Small-footprint LSTM-RNNs for Speech Recognition

  • Long short-term memory RNNs (LSTM-RNNs) have shown great success in the Automatic speech recognition (ASR) field and have become the state-of-the-art acoustic model for time-sequence modeling tasks. However, it is still difficult to train deep LSTM-RNNs while keeping the parameter count small. We use highway connections between memory cells in adjacent layers to train small-footprint highway LSTM-RNNs (HLSTM-RNNs), which are deeper and thinner than conventional LSTM-RNNs. Experiments on the Switchboard (SWBD) corpus show that we can train thinner and deeper HLSTM-RNNs that have fewer parameters than conventional 3-layer LSTM-RNNs while achieving a lower Word error rate (WER). Compared with their small-footprint LSTM-RNN counterparts, the small-footprint HLSTM-RNNs show a greater reduction in WER.
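The highway connection described in the abstract links the memory cell of one layer directly to the cell of the layer above it through a learned carry gate. As a rough illustration of the general idea (not necessarily the paper's exact formulation), the sketch below implements one time step of such a layer in NumPy: the usual LSTM gates update the cell, and an extra gate `d` decides how much of the lower layer's cell state flows straight into this layer's cell. All names, dimensions, and the gating inputs here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hlstm_step(x, h_prev, c_prev, c_lower, W, b, Wd, bd):
    """One time step of a highway-LSTM layer (illustrative sketch).

    x       : input from the layer below at time t
    h_prev  : this layer's hidden state at t-1
    c_prev  : this layer's cell state at t-1
    c_lower : the lower layer's cell state at time t (highway input)
    W, b    : stacked weights/bias for the i, f, o, g transforms
    Wd, bd  : weights/bias for the highway carry gate d (assumed form)
    """
    z = W @ np.concatenate([x, h_prev]) + b
    n = len(b) // 4
    i = sigmoid(z[:n])          # input gate
    f = sigmoid(z[n:2 * n])     # forget gate
    o = sigmoid(z[2 * n:3 * n]) # output gate
    g = np.tanh(z[3 * n:])      # candidate cell update
    # Carry gate: controls how much of the lower layer's cell state
    # is copied directly into this layer's cell (the highway path).
    d = sigmoid(Wd @ np.concatenate([x, c_prev, c_lower]) + bd)
    c = f * c_prev + i * g + d * c_lower
    h = o * np.tanh(c)
    return h, c
```

Because gradients can flow through the `d * c_lower` term without passing through the squashing nonlinearities, stacking many such layers remains trainable even when each layer is kept thin, which is what enables the deep small-footprint configurations the abstract reports.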
