2 code implementations • 9 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
Webpages have been a rich resource for language and vision-language tasks.
1 code implementation • 5 May 2023 • Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo
Webpages have been a rich, scalable resource for vision-language and language-only tasks.
no code implementations • CVPR 2023 • Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data.
1 code implementation • 4 Feb 2022 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
To study VLN with unknown command feasibility, we introduce a new dataset, Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app.
1 code implementation • 14 Aug 2021 • Andrea Burns, Aaron Sarna, Dilip Krishnan, Aaron Maschinot
Disentangled visual representations have largely been studied with generative models such as Variational AutoEncoders (VAEs).
1 code implementation • 17 Apr 2021 • Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
In recent years, vision-language research has shifted to study tasks which require more complex reasoning, such as interactive question answering, visual common sense reasoning, and question-answer plausibility prediction.
no code implementations • ECCV 2020 • Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer
Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added.
no code implementations • ICCV 2019 • Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
Shouldn't language and vision features be treated equally in vision-language (VL) tasks?