1 code implementation • 17 Jul 2024 • Angie Boggust, Hyemin Bang, Hendrik Strobelt, Arvind Satyanarayan
While interpretability methods identify a model's learned concepts, they overlook the relationships between concepts that make up its abstractions and inform its ability to generalize to new data.
1 code implementation • 4 Apr 2024 • Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne
Vision Transformers (ViTs), with their ability to model long-range dependencies through self-attention mechanisms, have become a standard architecture in computer vision.
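As a minimal illustration of the mechanism the snippet above refers to (a generic sketch, not this paper's model): single-head self-attention lets every patch token attend to every other token, which is what gives ViTs their long-range dependencies. All weights here are random placeholders.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a token sequence x: (seq_len, d).
    Each output row is a weighted mix of *all* value rows, so distant
    tokens can influence each other directly."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))                  # 4 "patch" tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)                             # (4, 8)
```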
1 code implementation • 18 Sep 2023 • Zoe De Simone, Angie Boggust, Arvind Satyanarayan, Ashia Wilson
Generative text-to-image (TTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains.
1 code implementation • 28 Jun 2023 • Benny J. Tang, Angie Boggust, Arvind Satyanarayan
Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities.
1 code implementation • 7 Jun 2022 • Angie Boggust, Harini Suresh, Hendrik Strobelt, John V. Guttag, Arvind Satyanarayan
With saliency cards, we are able to analyze the research landscape in a more structured fashion, identifying opportunities for new methods and evaluation metrics that address unmet user needs.
1 code implementation • 8 Nov 2021 • Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass
In this paper, we explore self-supervised audio-visual models that learn from instructional videos.
1 code implementation • 20 Jul 2021 • Angie Boggust, Benjamin Hoover, Arvind Satyanarayan, Hendrik Strobelt
Saliency methods -- techniques to identify the importance of input features on a model's output -- are a common step in understanding neural network behavior.
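To make the definition concrete (a generic sketch of one common saliency flavor, not a method from this paper): occlusion-based saliency scores each input feature by how much masking it changes the model's output. The toy linear "model" below is an assumption for illustration.

```python
import numpy as np

def occlusion_saliency(model, x, baseline=0.0):
    """Score each feature by the output change when it is masked --
    the core idea behind occlusion-style saliency methods."""
    base_out = model(x)
    scores = np.empty_like(x)
    for i in range(x.size):
        perturbed = x.copy()
        perturbed[i] = baseline       # mask one feature at a time
        scores[i] = abs(base_out - model(perturbed))
    return scores

# Toy linear model: feature i's saliency works out to |w_i * x_i|.
w = np.array([3.0, -1.0, 0.0, 2.0])
model = lambda x: float(w @ x)
sal = occlusion_saliency(model, np.ones(4))
print(sal)                            # [3. 1. 0. 2.]
```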
1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang
Multimodal self-supervised learning is attracting increasing attention because it not only enables training large networks without human supervision but also supports searching and retrieving data across modalities.
1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass
We propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.
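A hedged sketch of the retrieval side only (assuming some encoders have already projected text and video into the shared embedding space, which is the part the model learns): text-video retrieval then reduces to ranking by cosine similarity. The vectors below are made up for illustration.

```python
import numpy as np

def retrieve(query_emb, video_embs):
    """Rank videos by cosine similarity to a text query, assuming both
    modalities live in the same semantic embedding space."""
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = v @ q                      # one similarity per video
    return np.argsort(-sims)          # best match first

videos = np.array([[1.0, 0.0],        # hypothetical video embeddings
                   [0.0, 1.0],
                   [0.7, 0.7]])
query = np.array([0.9, 0.1])          # hypothetical text-query embedding
ranking = retrieve(query, videos)
print(ranking)                        # [0 2 1]
```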
1 code implementation • 10 Dec 2019 • Angie Boggust, Brandon Carter, Arvind Satyanarayan
Embeddings mapping high-dimensional discrete input to lower-dimensional continuous vector spaces have been widely adopted in machine learning applications as a way to capture domain semantics.
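As a minimal, generic illustration of the idea in the snippet above (not this paper's system): an embedding table maps discrete token ids to dense vectors, and the geometry of those vectors can capture domain semantics. The tokens and vectors here are invented for the example.

```python
import numpy as np

# Toy embedding table: each discrete token id maps to a dense 2-D vector.
vocab = {"king": 0, "queen": 1, "apple": 2}
emb = np.array([[0.90, 0.80],   # "king"
                [0.85, 0.82],   # "queen"  -- placed near "king"
                [0.10, 0.20]])  # "apple"  -- placed far from both

def cos(a, b):
    """Cosine similarity: the standard closeness measure in embedding spaces."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_kq = cos(emb[vocab["king"]], emb[vocab["queen"]])
sim_ka = cos(emb[vocab["king"]], emb[vocab["apple"]])
print(sim_kq > sim_ka)          # True: related words sit closer together
```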