Search Results for author: Tejas Srinivasan

Found 17 papers, 7 papers with code

CMU’s Machine Translation System for IWSLT 2019

no code implementations • EMNLP (IWSLT) 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

In Neural Machine Translation (NMT), using sub-words and characters as source and target units offers a simple and flexible solution for translating rare and unseen words.
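
As a toy illustration of why sub-word units help with rare words, the sketch below greedily segments an unseen word against a small hand-picked subword vocabulary. This is illustrative only: real systems learn the vocabulary with BPE or a unigram language model, and the vocabulary here is invented.

```python
def segment(word: str, vocab: set[str]) -> list[str]:
    """Greedily split `word` into the longest subwords found in `vocab`."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):      # try the longest candidate first
            piece = word[i:j]
            if piece in vocab or j == i + 1:   # single characters are always allowed
                pieces.append(piece)
                i = j
                break
    return pieces

# Invented vocabulary; an unseen word still decomposes into known units.
vocab = {"un", "seen", "translat", "able", "ion"}
print(segment("untranslatable", vocab))  # ['un', 'translat', 'able']
```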

Machine Translation • NMT +1

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

no code implementations • 23 Feb 2024 • Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu

Prior work on selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain.
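
The standard recipe the abstract alludes to thresholds the model's confidence and abstains below it. Here is a minimal sketch of that baseline (not the paper's code; the logits and threshold value are made up):

```python
import torch
import torch.nn.functional as F

def predict_or_abstain(logits: torch.Tensor, threshold: float = 0.7):
    """Return the argmax answer, or None (abstain) if confidence is too low."""
    probs = F.softmax(logits, dim=-1)
    conf, answer = probs.max(dim=-1)
    return answer.item() if conf.item() >= threshold else None

print(predict_or_abstain(torch.tensor([2.0, 0.5, 0.1])))  # 0 (confident)
print(predict_or_abstain(torch.zeros(3)))                 # None: uniform, so abstain
```

The paper's argument, per its title, is that this kind of blanket thresholding abstains more often than necessary.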

WinoViz: Probing Visual Properties of Objects Under Different States

no code implementations • 21 Feb 2024 • Woojeong Jin, Tejas Srinivasan, Jesse Thomason, Xiang Ren

We present WinoViz, a text-only evaluation dataset consisting of 1,380 examples that probe language models' reasoning about the variant visual properties of objects under different contexts or states.

Language Modelling

Exploring Strategies for Modeling Sign Language Phonology

1 code implementation • 30 Sep 2023 • Lee Kezar, Riley Carlin, Tejas Srinivasan, Zed Sehyr, Naomi Caselli, Jesse Thomason

Specifically, we explore how learning strategies like multi-task and curriculum learning can leverage mutually useful information between phoneme types to facilitate better modeling of sign language phonemes.
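
A minimal sketch of the multi-task half of that idea, with invented phoneme-type names and label counts: a shared encoder feeds one classification head per phoneme type, so each type's supervision shapes the shared representation. (Curriculum learning would additionally order which types or examples are trained when.)

```python
import torch
import torch.nn as nn

# Invented phoneme types and class counts, for illustration only.
PHONEME_TYPES = {"handshape": 50, "location": 30, "movement": 40}

class MultiTaskPhonemeModel(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleDict(
            {t: nn.Linear(hidden, n) for t, n in PHONEME_TYPES.items()}
        )

    def forward(self, x):
        h = self.encoder(x)                       # shared sign representation
        return {t: head(h) for t, head in self.heads.items()}

model = MultiTaskPhonemeModel()
logits = model(torch.randn(8, 128))               # batch of 8 sign features
loss = sum(                                       # joint multi-task loss
    nn.functional.cross_entropy(logits[t], torch.randint(0, n, (8,)))
    for t, n in PHONEME_TYPES.items()
)
loss.backward()
```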

I2I: Initializing Adapters with Improvised Knowledge

1 code implementation • 4 Apr 2023 • Tejas Srinivasan, Furong Jia, Mohammad Rostami, Jesse Thomason

We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters.
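
A rough sketch of that initialization idea, under invented dimensions and a plain MSE distillation loss (not the authors' code): fit the new task's Adapter to mimic the averaged outputs of the earlier tasks' Adapters on the new task's inputs, then fine-tune as usual.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """A standard bottleneck adapter with a residual connection."""
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

def improvised_init(new_adapter, old_adapters, inputs, steps=100):
    with torch.no_grad():  # teacher: averaged outputs of the frozen prior adapters
        target = torch.stack([a(inputs) for a in old_adapters]).mean(0)
    opt = torch.optim.Adam(new_adapter.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(new_adapter(inputs), target)
        opt.zero_grad()
        loss.backward()
        opt.step()

old = [Adapter() for _ in range(3)]       # adapters from earlier tasks
new = Adapter()
improvised_init(new, old, torch.randn(16, 768))  # distill, then fine-tune on the new task
```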

Continual Learning • Question Answering +2

Multimodal Speech Recognition for Language-Guided Embodied Agents

1 code implementation • 27 Feb 2023 • Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason

Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

VAuLT: Augmenting the Vision-and-Language Transformer for Sentiment Classification on Social Media

1 code implementation • 18 Aug 2022 • Georgios Chochlakis, Tejas Srinivasan, Jesse Thomason, Shrikanth Narayanan

VAuLT is an extension of the popular Vision-and-Language Transformer (ViLT) that improves performance on vision-and-language (VL) tasks involving more complex text inputs than image captions, while having minimal impact on training and inference efficiency.
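
The published mechanism feeds a large language model's output representations into ViLT's language input; below is a toy sketch of that general pattern with stand-in modules and invented sizes, not VAuLT's actual code:

```python
import torch
import torch.nn as nn

class ToyVaultStyleModel(nn.Module):
    """Contextual LM states replace shallow word embeddings in a VL trunk."""
    def __init__(self, vocab: int = 30522, dim: int = 768):
        super().__init__()
        self.language_model = nn.Sequential(   # stand-in for a BERT-style LM
            nn.Embedding(vocab, dim),
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, 8, batch_first=True), 2),
        )
        self.vl_encoder = nn.TransformerEncoder(  # stand-in for ViLT's trunk
            nn.TransformerEncoderLayer(dim, 8, batch_first=True), 2)

    def forward(self, token_ids, image_patches):
        text = self.language_model(token_ids)     # contextual text states
        return self.vl_encoder(torch.cat([text, image_patches], dim=1))

model = ToyVaultStyleModel()
out = model(torch.randint(0, 30522, (2, 12)), torch.randn(2, 196, 768))
print(out.shape)  # torch.Size([2, 208, 768])
```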

Descriptive • Image Captioning +4

Curriculum Learning for Data-Efficient Vision-Language Alignment

no code implementations • 29 Jul 2022 • Tejas Srinivasan, Xiang Ren, Jesse Thomason

Aligning image and text encoders from scratch using contrastive learning requires large amounts of paired image-text data.
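
The contrastive alignment the abstract refers to is the CLIP-style symmetric InfoNCE objective: matched image-text pairs are pulled together and all other in-batch pairs pushed apart. A minimal sketch with random stand-in embeddings:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature: float = 0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature      # pairwise cosine similarities
    labels = torch.arange(len(logits))        # i-th image matches i-th text
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

# Stand-ins for encoder outputs on a batch of 32 image-text pairs.
img_emb = torch.randn(32, 512, requires_grad=True)
txt_emb = torch.randn(32, 512, requires_grad=True)
loss = contrastive_loss(img_emb, txt_emb)
loss.backward()
```

The paper's point is that ordering the image-text pairs with a curriculum reduces how much paired data this objective needs.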

Contrastive Learning • Image Retrieval +3

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

1 code implementation • 18 Jun 2022 • Tejas Srinivasan, Ting-Yun Chang, Leticia Leonor Pinto Alva, Georgios Chochlakis, Mohammad Rostami, Jesse Thomason

Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks.

Continual Learning • Transfer Learning

Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models

no code implementations • NAACL (GeBNLP) 2022 • Tejas Srinivasan, Yonatan Bisk

Numerous works have analyzed biases in vision models and pre-trained language models individually; however, less attention has been paid to how these biases interact in multimodal settings.

Reasoning Over History: Context Aware Visual Dialog

no code implementations • EMNLP (nlpbt) 2020 • Muhammad A. Shah, Shikib Mehri, Tejas Srinivasan

While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge.

coreference-resolution • Question Answering +2

Multimodal Speech Recognition with Unstructured Audio Masking

no code implementations • EMNLP (nlpbt) 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

Our experiments on the Flickr 8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting.

8k • Automatic Speech Recognition +2

Fine-Grained Grounding for Multimodal Speech Recognition

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.
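
The proposal-based grounding can be pictured as cross-attention from the recognizer's audio states over per-object proposal features, rather than over a single global image vector. A sketch with invented sizes, not the paper's exact model:

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 256-d states, 4 attention heads.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

audio_states = torch.randn(2, 100, 256)   # recognizer states over 100 frames
proposals = torch.randn(2, 36, 256)       # e.g. 36 object-proposal features

# Each audio frame attends over the proposals; the weights show which
# objects ground which stretches of speech.
grounded, weights = attn(query=audio_states, key=proposals, value=proposals)
print(grounded.shape, weights.shape)      # (2, 100, 256) (2, 100, 36)
```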

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Looking Enhances Listening: Recovering Missing Speech Using Images

no code implementations • 13 Feb 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Multitask Learning For Different Subword Segmentations In Neural Machine Translation

no code implementations • EMNLP (IWSLT) 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

In Neural Machine Translation (NMT), using subwords and characters as source and target units offers a simple and flexible solution for translating rare and unseen words.
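
To make the multitask setup concrete, here is the same toy target sentence rendered at three unit granularities that a decoder could be jointly supervised on (the subword split is illustrative, BPE-style):

```python
sentence = "unseen words"

# Three segmentations of one target; a multitask decoder gets a loss on each.
targets = {
    "word": sentence.split(),
    "char": list(sentence.replace(" ", "_")),   # '_' marks word boundaries
    "subword": ["un@@", "seen", "words"],       # '@@' marks a BPE continuation
}
for granularity, units in targets.items():
    print(f"{granularity:>7}: {units}")
```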

Machine Translation • NMT +2

Structured Fusion Networks for Dialog

1 code implementation • WS 2019 • Shikib Mehri, Tejas Srinivasan, Maxine Eskenazi

Neural dialog models have exhibited strong performance; however, their end-to-end nature lacks a representation of the explicit structure of dialog.

reinforcement-learning • Reinforcement Learning (RL)

Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions

no code implementations • 30 Jun 2019 • Tejas Srinivasan, Ramon Sanabria, Florian Metze

Multimodal learning allows us to leverage information from multiple sources (visual, acoustic and text), similar to our experience of the real world.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1
