no code implementations • 29 May 2025 • Declan Kutscher, David M. Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta
REOrder improves top-1 accuracy over row-major ordering on ImageNet-1K by up to 3.01% and Functional Map of the World by 13.35%.
1 code implementation • 29 May 2025 • HeeKyung Lee, Jiaxin Ge, Tsung-Han Wu, Minwoo Kang, Trevor Darrell, David M. Chan
Rebus puzzles, visual riddles that encode language through imagery, spatial arrangement, and symbolic substitution, pose a unique challenge to current vision-language models (VLMs).
no code implementations • 5 May 2025 • Jerome Quenum, Wen-Han Hsieh, Tsung-Han Wu, Ritwik Gupta, Trevor Darrell, David M. Chan
Segmentation models can recognize a pre-defined set of objects in images.
1 code implementation • 17 Apr 2025 • Tsung-Han Wu, HeeKyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan
Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects, actions, or concepts, posing significant risks in safety-critical applications.
no code implementations • 16 Apr 2025 • Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, David M. Chan
While previous studies have examined whether models can reflect individual opinions or attitudes, we argue that a higher-order binding of virtual personas requires successfully approximating not only the opinions of a user as an identified member of a group, but also the nuanced ways in which that user perceives and evaluates those outside the group.
no code implementations • 19 Mar 2025 • Zineng Tang, Long Lian, Seun Eisape, Xudong Wang, Roei Herzig, Adam Yala, Alane Suhr, Trevor Darrell, David M. Chan
These models, by performing language alignment, tend to prioritize high-level semantics over visual understanding, weakening their image understanding.
no code implementations • 7 Nov 2024 • David M. Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell
Through these experiments, we demonstrate how understanding the statistical properties of discrete visual languages can inform the design of more effective computer vision models.
1 code implementation • 19 Sep 2024 • Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan
The Automated Audio Captioning (AAC) task asks models to generate natural language descriptions of an audio input.
no code implementations • 16 Sep 2024 • Joseph Suh, Suhong Moon, Minwoo Kang, David M. Chan
Assessing personality traits using large language models (LLMs) has emerged as an interesting and challenging area of research.
no code implementations • 16 Sep 2024 • Hitesh Tulsiani, David M. Chan, Shalini Ghosh, Garima Lalwani, Prabhat Pandey, Ankish Bansal, Sri Garimella, Ariya Rastrow, Björn Hoffmeister
Dialog systems, such as voice assistants, are expected to engage with users in complex, evolving conversations.
Automatic Speech Recognition
1 code implementation • 18 Jul 2024 • Tsung-Han Wu, Giscard Biamby, Jerome Quenum, Ritwik Gupta, Joseph E. Gonzalez, Trevor Darrell, David M. Chan
MIRAGE demonstrates up to 13% performance improvement over existing open-source LMMs on VHs, sets a new state-of-the-art on the RetVQA multi-image QA benchmark, and achieves competitive performance on single-image QA with state-of-the-art LMMs.
1 code implementation • 9 Jul 2024 • Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David M. Chan
Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits.
no code implementations • 3 Apr 2024 • Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell
Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene.
1 code implementation • 10 Jan 2024 • Kevin Cai, Chonghua Liu, David M. Chan
The Internet's wealth of content, with up to 60% published in English, starkly contrasts the global population, where only 18.8% are English speakers, and just 5.1% consider it their native language, leading to disparities in online information access.
1 code implementation • 4 Jan 2024 • David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister
We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19.2%.
no code implementations • 22 Dec 2023 • Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.
Automatic Speech Recognition
1 code implementation • 2 Feb 2023 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny
If you ask a human to describe an image, they might do so in a thousand different ways.
no code implementations • 6 Jan 2023 • David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister
Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning).
1 code implementation • 16 Jun 2022 • Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik
Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence.
no code implementations • 19 May 2022 • David M. Chan, Shalini Ghosh
Deep neural networks have largely demonstrated their ability to perform automated speech recognition (ASR) by extracting meaningful features from input audio frames.
1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny
While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.
no code implementations • 12 Oct 2021 • David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister
Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.
1 code implementation • 2 Apr 2021 • Aatif Jiwani, Shubhrakanti Ganguly, Chao Ding, Nan Zhou, David M. Chan
Urban areas consume over two-thirds of the world's energy and account for more than 70 percent of global CO2 emissions.
no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny
Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.
1 code implementation • 6 May 2020 • Eliza Kosoy, Jasmine Collins, David M. Chan, Sandy Huang, Deepak Pathak, Pulkit Agrawal, John Canny, Alison Gopnik, Jessica B. Hamrick
Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn.
1 code implementation • 11 Dec 2018 • Biye Jiang, David M. Chan, Tianhao Zhang, John F. Canny
Finally, we show that diagnostic visualization using LDAM leads to a novel insight into the parameter averaging method for deep net training.
1 code implementation • 31 Jul 2018 • David M. Chan, Roshan Rao, Forrest Huang, John F. Canny
Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples.