Search Results for author: Vedanuj Goswami

Found 17 papers, 10 papers with code

Multilingual Speech-to-Speech Translation into Multiple Target Languages

no code implementations · 17 Jul 2023 · Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino

Although a few studies have addressed multilingual S2ST, they focus on multilinguality on the source side, i.e., translation from multiple source languages into a single target language.

Language Identification · Speech-to-Speech Translation · +1

Revisiting Machine Translation for Cross-lingual Classification

no code implementations · 23 May 2023 · Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, Luke Zettlemoyer

Machine Translation (MT) has been widely used for cross-lingual classification, either by translating the test set into English and running inference with a monolingual model (translate-test), or translating the training set into the target languages and finetuning a multilingual model (translate-train).

Classification · Cross-Lingual Transfer · +2

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

1 code implementation · 3 May 2023 · Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami

Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token.

Machine Translation · Translation

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

1 code implementation · 1 Mar 2023 · Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang

We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages.

Audio-Visual Speech Recognition · Robust Speech Recognition · +4

Language-Aware Multilingual Machine Translation with Self-Supervised Learning

1 code implementation · 10 Feb 2023 · Haoran Xu, Jean Maillard, Vedanuj Goswami

In this work, we first investigate how to utilize intra-distillation to learn more *language-specific* parameters and then show the importance of these language-specific parameters.

Cross-Lingual Transfer · Denoising · +3

Causes and Cures for Interference in Multilingual Translation

no code implementations · 14 Dec 2022 · Uri Shaham, Maha Elbayad, Vedanuj Goswami, Omer Levy, Shruti Bhosale

Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference.

Machine Translation · Translation

FLAVA: A Foundational Language And Vision Alignment Model

3 code implementations · CVPR 2022 · Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela

State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks.

Image Retrieval · Image-to-Text Retrieval · +3

Tricks for Training Sparse Translation Models

no code implementations · NAACL 2022 · Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan

Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks.

Machine Translation · Multi-Task Learning · +1

Human-Adversarial Visual Question Answering

no code implementations · NeurIPS 2021 · Sasha Sheng, Amanpreet Singh, Vedanuj Goswami, Jose Alberto Lopez Magana, Wojciech Galuba, Devi Parikh, Douwe Kiela

Human subjects interact with a state-of-the-art VQA model, and for each image in the dataset, attempt to find a question where the model's predicted answer is incorrect.

Question Answering · Visual Question Answering

MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond

1 code implementation · ICLR 2021 · Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen

This paper focuses on visual counting, which aims to predict the number of occurrences given a natural image and a query (e.g., a question or a category).

Object Counting · Question Answering · +1

Are we pretraining it right? Digging deeper into visio-linguistic pretraining

no code implementations · 19 Apr 2020 · Amanpreet Singh, Vedanuj Goswami, Devi Parikh

Surprisingly, we show that automatically generated data in a domain closer to the downstream task (e.g., VQA v2) is a better choice for pretraining than "natural" data from a slightly different domain (e.g., Conceptual Captions).

Visual Question Answering (VQA)

12-in-1: Multi-Task Vision and Language Representation Learning

5 code implementations · CVPR 2020 · Jiasen Lu, Vedanuj Goswami, Marcus Rohrbach, Devi Parikh, Stefan Lee

Much of vision-and-language research focuses on a small but diverse set of independent tasks and supporting datasets often studied in isolation; however, the visually-grounded language understanding skills required for success at these tasks overlap significantly.

Image Retrieval · Question Answering · +3

Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

no code implementations · 19 Jul 2019 · Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

However, it has been observed that in current video datasets, action classes can often be recognized from a single frame of video, without any temporal information.

Benchmarking · Motion Estimation · +1
