From Audio to Semantics: Approaches to end-to-end spoken language understanding

24 Sep 2018 · Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules are typically optimized independently...
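The conventional two-module pipeline described above can be sketched as follows. This is a toy illustration only, not the systems studied in the paper. The names `Hypothesis`, `Frame`, `toy_asr`, `toy_nlu`, and `pipeline`, the canned N-best list, and the keyword rule are all hypothetical. Real ASR and NLU modules are learned models. The sketch shows the interface: ASR emits ranked transcript hypotheses, and NLU maps them to a domain, an intent, and arguments.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One ASR hypothesis: a transcript plus a (hypothetical) confidence score."""
    text: str
    score: float


@dataclass
class Frame:
    """NLU output: a domain, an intent, and its arguments."""
    domain: str
    intent: str
    arguments: dict = field(default_factory=dict)


def toy_asr(audio_id: str, n: int = 3) -> list[Hypothesis]:
    """Hypothetical ASR stand-in: returns a canned top-N list instead of decoding audio."""
    canned = {
        "clip_001": [
            Hypothesis("set a lamp for 7 am", 0.48),     # top-1 is a misrecognition
            Hypothesis("set an alarm for 7 am", 0.41),
            Hypothesis("set an alarm for 11 am", 0.11),
        ],
    }
    ranked = sorted(canned.get(audio_id, []), key=lambda h: h.score, reverse=True)
    return ranked[:n]


def toy_nlu(hypotheses: list[Hypothesis]) -> Frame | None:
    """Hypothetical keyword-rule NLU: parses the best-ranked hypothesis that matches."""
    for hyp in hypotheses:  # hypotheses arrive sorted best-first
        words = hyp.text.split()
        if "alarm" in words and "for" in words:
            time = " ".join(words[words.index("for") + 1:])
            return Frame(domain="clock", intent="set_alarm", arguments={"time": time})
    return None  # no hypothesis could be interpreted


def pipeline(audio_id: str, n: int = 3) -> Frame | None:
    """Two-stage SLU: ASR followed by NLU, each module defined (and tunable) separately."""
    return toy_nlu(toy_asr(audio_id, n))


print(pipeline("clip_001", n=1))  # None: the top-1 transcript does not parse
print(pipeline("clip_001", n=3))  # Frame(domain='clock', intent='set_alarm', arguments={'time': '7 am'})
```

With only the top hypothesis, the recognition error propagates and NLU fails. Passing the top N hypotheses lets NLU recover. Because the two functions share nothing but the hypothesis list, each can be changed without touching the other. This mirrors the independent optimization of the modules that the abstract notes.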
