no code implementations • 17 Aug 2020 • Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda
To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data.
no code implementations • 23 Oct 2019 • Mattia Antonino Di Gangi, Viet-Nhat Nguyen, Matteo Negri, Marco Turchi
Despite recent technology advancements, the effectiveness of neural approaches to end-to-end speech-to-text translation is still limited by the paucity of publicly available training corpora.