no code implementations • 23 Feb 2024 • Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu
Prior work on selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain.
no code implementations • 21 Feb 2024 • Woojeong Jin, Tejas Srinivasan, Jesse Thomason, Xiang Ren
We present WinoViz, a text-only evaluation dataset, consisting of 1,380 examples that probe the reasoning abilities of language models regarding variant visual properties of objects under different contexts or states.
1 code implementation • 30 Sep 2023 • Lee Kezar, Riley Carlin, Tejas Srinivasan, Zed Sehyr, Naomi Caselli, Jesse Thomason
Specifically, we explore how learning strategies like multi-task and curriculum learning can leverage mutually useful information between phoneme types to facilitate better modeling of sign language phonemes.
1 code implementation • 4 Apr 2023 • Tejas Srinivasan, Furong Jia, Mohammad Rostami, Jesse Thomason
We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters.
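As a rough illustration only (not the authors' implementation of I2I), initializing a new task's adapter by distilling from earlier adapters can be sketched as regressing a fresh adapter onto the averaged outputs of the previously learned ones on the new task's inputs. The plain linear "adapter", the function name, and all hyperparameters below are hypothetical simplifications:

```python
import numpy as np

def distill_init(prev_adapters, new_task_inputs, lr=0.1, steps=300):
    """Initialize a new linear 'adapter' W by regressing onto the mean
    output of previously learned adapters (simplified distillation sketch)."""
    d = new_task_inputs.shape[1]
    # Teacher signal: average of previous adapters applied to the new inputs.
    teacher = np.mean([new_task_inputs @ W for W in prev_adapters], axis=0)
    W_new = np.zeros((d, d))
    for _ in range(steps):
        pred = new_task_inputs @ W_new
        # Gradient of mean squared error between student and teacher outputs.
        grad = 2 * new_task_inputs.T @ (pred - teacher) / len(new_task_inputs)
        W_new -= lr * grad
    return W_new
```

In the actual method, the distilled weights serve only as an initialization; the adapter is then fine-tuned on the incoming task's data.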
1 code implementation • 27 Feb 2023 • Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason
Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions.
Automatic Speech Recognition (ASR) +1
1 code implementation • 18 Aug 2022 • Georgios Chochlakis, Tejas Srinivasan, Jesse Thomason, Shrikanth Narayanan
VAuLT is an extension of the popular Vision-and-Language Transformer (ViLT), and improves performance on vision-and-language (VL) tasks that involve more complex text inputs than image captions while having minimal impact on training and inference efficiency.
no code implementations • 29 Jul 2022 • Tejas Srinivasan, Xiang Ren, Jesse Thomason
Aligning image and text encoders from scratch using contrastive learning requires large amounts of paired image-text data.
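The contrastive alignment objective referred to here can be sketched as a CLIP-style symmetric InfoNCE loss, where matched image-text pairs lie on the diagonal of a similarity matrix. This is a minimal generic sketch, not the paper's code; the function name and temperature value are assumptions:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # L2-normalize embeddings so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); matched pairs on the diagonal
    idx = np.arange(len(logits))

    def xent(l):
        # Cross-entropy of each row against its diagonal (matched) entry.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # Symmetric: image-to-text (rows) plus text-to-image (columns).
    return 0.5 * (xent(logits) + xent(logits.T))
```

Training both encoders from scratch with this loss is what demands large paired datasets; the paper's point is about reducing that requirement.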
1 code implementation • 18 Jun 2022 • Tejas Srinivasan, Ting-Yun Chang, Leticia Leonor Pinto Alva, Georgios Chochlakis, Mohammad Rostami, Jesse Thomason
Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks.
no code implementations • NAACL (GeBNLP) 2022 • Tejas Srinivasan, Yonatan Bisk
Numerous works have analyzed biases in vision and pre-trained language models individually; however, less attention has been paid to how these biases interact in multimodal settings.
no code implementations • EMNLP (nlpbt) 2020 • Muhammad A. Shah, Shikib Mehri, Tejas Srinivasan
While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge.
no code implementations • EMNLP (nlpbt) 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
Our experiments on the Flickr 8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.
Automatic Speech Recognition (ASR) +1
no code implementations • 13 Feb 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze
Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems.
Automatic Speech Recognition (ASR) +1
no code implementations • EMNLP (IWSLT) 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze
In Neural Machine Translation (NMT), the use of subwords and characters as source and target units offers a simple and flexible solution for translating rare and unseen words.
1 code implementation • WS 2019 • Shikib Mehri, Tejas Srinivasan, Maxine Eskenazi
Neural dialog models have exhibited strong performance; however, their end-to-end nature lacks a representation of the explicit structure of dialog.
no code implementations • 30 Jun 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze
Multimodal learning allows us to leverage information from multiple sources (visual, acoustic and text), similar to our experience of the real world.
Automatic Speech Recognition (ASR) +1