Search Results for author: Shruti Palaskar

Found 14 papers, 3 papers with code

On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization

No code implementations · 24 May 2022 · Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasović

Combining the visual modality with pretrained language models has been surprisingly effective for simple descriptive tasks such as image captioning.

Descriptive Image Captioning +5

Speech Summarization using Restricted Self-Attention

No code implementations · 12 Oct 2021 · Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze

End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences.

Document Summarization speech-recognition +2
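The snippet above attributes the difficulty to long audio inputs: full self-attention over T frames builds a T × T score matrix, so memory grows quadratically with input length. The following is a minimal NumPy sketch, assuming "restricted" means each frame attends only to a fixed window of neighbouring frames. The function name, the `window` parameter, and the use of the raw input as queries, keys and values (no learned projections, single head) are simplifications for illustration, not the paper's implementation.

```python
import numpy as np


def restricted_self_attention(x: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical single-head self-attention in which frame t attends only to
    frames in [t - window, t + window]. Queries, keys and values are x itself
    (no learned projections) to keep the sketch short."""
    T, d = x.shape
    out = np.empty_like(x, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        keys = x[lo:hi]                        # at most 2 * window + 1 frames
        scores = keys @ x[t] / np.sqrt(d)      # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over the local window only
        out[t] = weights @ keys                # weighted sum of the windowed frames
    return out


# Example: 1,000 frames of 80-dim features; each frame scores at most 2 * 16 + 1 others
frames = np.random.default_rng(0).standard_normal((1000, 80))
context = restricted_self_attention(frames, window=16)
```

Each output frame touches at most 2·window + 1 scores, so the per-step cost is O(window) instead of O(T); with `window >= T` the sketch reduces to ordinary full self-attention.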

How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

1 code implementation · CVPR 2021 · Amanda Duarte, Shruti Palaskar, Lucas Ventura, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, Xavier Giro-i-Nieto

Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.

Sign Language Production Sign Language Translation +1

Towards Understanding ASR Error Correction for Medical Conversations

No code implementations · WS 2020 · Anirudh Mani, Shruti Palaskar, Sandeep Konam

Domain Adaptation for Automatic Speech Recognition (ASR) error correction via machine translation is a useful technique for improving out-of-domain outputs of pre-trained ASR systems to obtain optimal results for specific in-domain tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Learned In Speech Recognition: Contextual Acoustic Word Embeddings

No code implementations · 18 Feb 2019 · Shruti Palaskar, Vikas Raunak, Florian Metze

End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon.

speech-recognition Speech Recognition +2

Learning from Multiview Correlations in Open-Domain Videos

No code implementations · 21 Nov 2018 · Nils Holzenberger, Shruti Palaskar, Pranava Madhyastha, Florian Metze, Raman Arora

This shows it is possible to learn reliable representations across disparate, unaligned and noisy modalities, and encourages using the proposed approach on larger datasets.

Representation Learning Retrieval

Multimodal Grounding for Sequence-to-Sequence Speech Recognition

1 code implementation · 9 Nov 2018 · Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

Specifically, in our previous work, we proposed a multistep visual adaptive training approach which improves the accuracy of an audio-based Automatic Speech Recognition (ASR) system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

How2: A Large-scale Dataset for Multimodal Language Understanding

2 code implementations · 1 Nov 2018 · Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze

In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Acoustic-to-Word Recognition with Sequence-to-Sequence Models

No code implementations · 23 Jul 2018 · Shruti Palaskar, Florian Metze

We present effective methods to train Sequence-to-Sequence models for direct word-level recognition (and character-level recognition) and show an absolute improvement of 4.4-5.0% in Word Error Rate on the Switchboard corpus compared to prior work.

Language Modelling speech-recognition +1
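The snippet above reports an absolute improvement in Word Error Rate (WER). WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words: (substitutions + deletions + insertions) / N. An absolute improvement is the difference between two WERs in percentage points, as opposed to a relative improvement, which divides that difference by the baseline WER. The minimal sketch below illustrates both; the function names and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as the word-level Levenshtein distance divided by len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution (or match if cost == 0)
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference between two WERs (given as fractions), in percentage points."""
    return (baseline_wer - new_wer) * 100


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Improvement as a percentage of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer * 100


# One substitution (sat -> sit) and one deletion (the) over 6 reference words: WER = 2/6
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
# Hypothetical WERs: a drop from 20.0% to 15.6% is 4.4 points absolute, 22% relative
print(absolute_improvement(0.200, 0.156), relative_improvement(0.200, 0.156))
```

The same 4.4-point absolute gain corresponds to a larger or smaller relative gain depending on the baseline, which is why the two are reported separately.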

End-to-End Multimodal Speech Recognition

No code implementations · 25 Apr 2018 · Shruti Palaskar, Ramon Sanabria, Florian Metze

Transcription or sub-titling of open-domain videos is still a challenging domain for Automatic Speech Recognition (ASR) due to the data's challenging acoustics, variable signal processing and the essentially unrestricted domain of the data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Combining LSTM and Latent Topic Modeling for Mortality Prediction

No code implementations · 8 Sep 2017 · Yohan Jo, Lisa Lee, Shruti Palaskar

There is a great need for technologies that can predict the mortality of patients in intensive care units with both high accuracy and accountability.

Mortality Prediction
