Using Highway Connections to Enable Deep Small-footprint LSTM-RNNs for Speech Recognition
Graphical Abstract
Abstract
Long short-term memory RNNs (LSTM-RNNs) have shown great success in the automatic speech recognition (ASR) field and have become the state-of-the-art acoustic model for time-sequence modeling tasks. However, it is still difficult to train deep LSTM-RNNs while keeping the number of parameters small. We use highway connections between memory cells in adjacent layers to train small-footprint highway LSTM-RNNs (HLSTM-RNNs), which are deeper and thinner than conventional LSTM-RNNs. Experiments on the Switchboard (SWBD) corpus indicate that we can train thinner and deeper HLSTM-RNNs that have fewer parameters and a lower word error rate (WER) than conventional 3-layer LSTM-RNNs. Compared with their small-footprint LSTM-RNN counterparts, the small-footprint HLSTM-RNNs achieve a greater reduction in WER.
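To make the highway connection concrete, the following is a minimal NumPy sketch of one time step of an HLSTM cell: a standard LSTM step extended with a depth gate that mixes the memory cell of the layer below directly into the current layer's memory cell. The parameter names (W, b, w_d, b_d) and the simplified per-unit depth-gate form are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hlstm_cell_step(x, h_prev, c_prev, c_lower, W, b, w_d, b_d):
    """One time step of a highway LSTM (HLSTM) cell (illustrative sketch).

    Beyond the standard input/forget/output gates, a depth gate d controls
    how much of the lower layer's memory cell (c_lower) flows directly into
    this layer's memory cell, forming a highway connection between memory
    cells in adjacent layers.
    """
    # Stacked pre-activations for the four standard LSTM gates.
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)

    # Depth (highway) gate, computed here from the lower layer's cell only.
    d = sigmoid(w_d * c_lower + b_d)

    # Standard LSTM cell update plus the highway term d * c_lower.
    c = f * c_prev + i * g + d * c_lower
    h = o * np.tanh(c)
    return h, c

# Example usage with hidden size H and input size D (shapes are assumptions):
H, D = 4, 3
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(4 * H, D + H))   # gate weights
b = np.zeros(4 * H)                          # gate biases
w_d, b_d = 0.1 * rng.normal(size=H), np.zeros(H)  # depth-gate parameters
h, c = hlstm_cell_step(rng.normal(size=D), np.zeros(H), np.zeros(H),
                       rng.normal(size=H), W, b, w_d, b_d)

Because the depth gate gives gradients a direct path through the cell states of stacked layers, such networks can be made deeper (and thinner) without the training difficulties that plain stacked LSTMs exhibit, which is the property the abstract exploits.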