Multimodal and Multiresolution Speech Recognition with Transformers

ACL 2020 · Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

This paper presents an audio-visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information, to ground the ASR...



