Resource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion

13 Mar 2018 · Matthijs Van keirsbilck, Bert Moons, Marian Verhelst

Today's Automatic Speech Recognition systems rely only on acoustic signals and often do not perform well under noisy conditions. Performing multi-modal speech recognition (processing acoustic speech signals and lip-reading video simultaneously) significantly enhances the performance of such systems, especially in noisy environments...


