no code implementations • 13 Jun 2024 • Shruti Palaskar, Oggi Rudovic, Sameer Dharur, Florian Pesce, Gautam Krishna, Aswin Sivaraman, Jack Berkowitz, Ahmed Hussen Abdelaziz, Saurabh Adya, Ahmed Tewfik
Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data.
no code implementations • 24 May 2022 • Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasović
Combining the visual modality with pretrained language models has been surprisingly effective for simple descriptive tasks such as image captioning.
no code implementations • 12 Oct 2021 • Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze
End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences.
1 code implementation • CVPR 2021 • Amanda Duarte, Shruti Palaskar, Lucas Ventura, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, Xavier Giro-i-Nieto
Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.
no code implementations • WS 2020 • Anirudh Mani, Shruti Palaskar, Sandeep Konam
Domain Adaptation for Automatic Speech Recognition (ASR) error correction via machine translation is a useful technique for improving out-of-domain outputs of pre-trained ASR systems to obtain optimal results for specific in-domain tasks.
Automatic Speech Recognition (ASR)
no code implementations • 13 Mar 2020 • Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze
We propose a simple technique to perform domain adaptation for ASR error correction via machine translation.
Automatic Speech Recognition (ASR)
no code implementations • ACL 2019 • Shruti Palaskar, Jindrich Libovický, Spandana Gella, Florian Metze
In this paper, we study abstractive summarization for open-domain videos.
Ranked #1 on Text Summarization on How2
no code implementations • 18 Feb 2019 • Shruti Palaskar, Vikas Raunak, Florian Metze
End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon.
no code implementations • 21 Nov 2018 • Nils Holzenberger, Shruti Palaskar, Pranava Madhyastha, Florian Metze, Raman Arora
This shows it is possible to learn reliable representations across disparate, unaligned and noisy modalities, and encourages using the proposed approach on larger datasets.
1 code implementation • 9 Nov 2018 • Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze
Specifically, in our previous work, we proposed a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system.
Automatic Speech Recognition (ASR)
2 code implementations • 1 Nov 2018 • Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.
Automatic Speech Recognition (ASR)
no code implementations • 23 Jul 2018 • Shruti Palaskar, Florian Metze
We present effective methods to train Sequence-to-Sequence models for direct word-level recognition (and character-level recognition) and show an absolute improvement of 4.4-5.0% in Word Error Rate on the Switchboard corpus compared to prior work.
no code implementations • 25 Apr 2018 • Shruti Palaskar, Ramon Sanabria, Florian Metze
Transcription or sub-titling of open-domain videos is still a challenging domain for Automatic Speech Recognition (ASR) due to the data's challenging acoustics, variable signal processing and the essentially unrestricted domain of the data.
Automatic Speech Recognition (ASR)
no code implementations • 14 Feb 2018 • Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography.
no code implementations • 8 Sep 2017 • Yohan Jo, Lisa Lee, Shruti Palaskar
There is a great need for technologies that can predict the mortality of patients in intensive care units with both high accuracy and accountability.