Search Results for author: David M. Chan

Found 27 papers, 14 papers with code

REOrdering Patches Improves Vision Models

no code implementations • 29 May 2025 • Declan Kutscher, David M. Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta

REOrder improves top-1 accuracy over row-major ordering by up to 3.01% on ImageNet-1K and up to 13.35% on Functional Map of the World.
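
The snippet above reports accuracy gains from changing the order in which image patches are fed to the model. As a rough, hypothetical illustration of the general idea only (this listing does not describe REOrder's actual ordering policy), the sketch below converts a row-major patch sequence into column-major order; all function names and shapes are assumptions.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into non-overlapping patches in row-major order."""
    H, W, C = image.shape
    gh, gw = H // patch, W // patch
    patches = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    return patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch, patch, C), (gh, gw)

def reorder_column_major(patches, grid):
    """Permute a row-major patch sequence into column-major order.

    Only one simple alternative ordering; not the paper's learned or
    task-specific orderings.
    """
    gh, gw = grid
    idx = np.arange(gh * gw).reshape(gh, gw).T.reshape(-1)
    return patches[idx]

# Toy usage: a 224x224 RGB image becomes 196 patches of 16x16.
img = np.random.rand(224, 224, 3).astype(np.float32)
seq, grid = patchify(img)
seq_cm = reorder_column_major(seq, grid)
assert seq_cm.shape == seq.shape
```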

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint

1 code implementation • 29 May 2025 • HeeKyung Lee, Jiaxin Ge, Tsung-Han Wu, Minwoo Kang, Trevor Darrell, David M. Chan

Rebus puzzles, visual riddles that encode language through imagery, spatial arrangement, and symbolic substitution, pose a unique challenge to current vision-language models (VLMs).

Image Captioning · Question Answering

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

1 code implementation • 17 Apr 2025 • Tsung-Han Wu, HeeKyung Lee, Jiaxin Ge, Joseph E. Gonzalez, Trevor Darrell, David M. Chan

Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects, actions, or concepts, posing significant risks in safety-critical applications.

Hallucination

Higher-Order Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions

no code implementations • 16 Apr 2025 • Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, David M. Chan

While previous studies have examined whether models can reflect individual opinions or attitudes, we argue that a higher-order binding of virtual personas requires successfully approximating not only the opinions of a user as an identified member of a group, but also the nuanced ways in which that user perceives and evaluates those outside the group.

Language Modeling

TULIP: Towards Unified Language-Image Pretraining

no code implementations • 19 Mar 2025 • Zineng Tang, Long Lian, Seun Eisape, Xudong Wang, Roei Herzig, Adam Yala, Alane Suhr, Trevor Darrell, David M. Chan

These models, by performing language alignment, tend to prioritize high-level semantics over visual understanding, weakening their image understanding.

Contrastive Learning · Data Augmentation (+2 more)

Analyzing The Language of Visual Tokens

no code implementations • 7 Nov 2024 • David M. Chan, Rodolfo Corona, Joonyong Park, Cheol Jun Cho, Yutong Bai, Trevor Darrell

Through these experiments, we demonstrate how understanding the statistical properties of discrete visual languages can inform the design of more effective computer vision models.

CLAIR-A: Leveraging Large Language Models to Judge Audio Captions

1 code implementation • 19 Sep 2024 • Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan

The Automated Audio Captioning (AAC) task asks models to generate natural language descriptions of an audio input.

Audio Captioning · Language Modeling (+2 more)
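
The title of the entry above points to an LLM-as-judge approach to scoring audio captions. Since the scoring procedure itself is not spelled out in this listing, the following is only a hedged sketch of that general idea: prompt a language model to compare a candidate caption against references and return a score. The llm_complete helper is a hypothetical placeholder, not part of the CLAIR-A codebase, and the prompt wording is an assumption.

```python
import json

# Hypothetical stand-in for an LLM call; swap in whatever client you use.
def llm_complete(prompt: str) -> str:
    raise NotImplementedError("plug in your preferred LLM API here")

def judge_audio_caption(candidate: str, references: list[str]) -> dict:
    """Ask an LLM to grade a candidate audio caption against references.

    A rough sketch of the LLM-as-judge idea; the actual CLAIR-A prompt and
    scoring scale are defined in the paper and repository, not reproduced here.
    """
    prompt = (
        "You are evaluating captions for an audio clip.\n"
        f"Reference captions: {json.dumps(references)}\n"
        f"Candidate caption: {json.dumps(candidate)}\n"
        "On a scale of 0 to 100, how likely is it that the candidate describes "
        "the same audio as the references? Reply as JSON: "
        '{"score": <int>, "reason": "<short explanation>"}'
    )
    return json.loads(llm_complete(prompt))
```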

Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors

no code implementations • 16 Sep 2024 • Joseph Suh, Suhong Moon, Minwoo Kang, David M. Chan

Assessing personality traits using large language models (LLMs) has emerged as an interesting and challenging area of research.

Descriptive

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

1 code implementation • 18 Jul 2024 • Tsung-Han Wu, Giscard Biamby, Jerome Quenum, Ritwik Gupta, Joseph E. Gonzalez, Trevor Darrell, David M. Chan

MIRAGE demonstrates up to 13% performance improvement over existing open-source LMMs on VHs, sets a new state-of-the-art on the RetVQA multi-image QA benchmark, and achieves competitive performance on single-image QA with state-of-the-art LMMs.

Image Retrieval · Question Answering (+4 more)

Virtual Personas for Language Models via an Anthology of Backstories

1 code implementation • 9 Jul 2024 • Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David M. Chan

Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits.

Diversity

ALOHa: A New Measure for Hallucination in Captioning Models

no code implementations • 3 Apr 2024 • Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell

Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene.

Hallucination · Object (+3 more)

ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

1 code implementation • 10 Jan 2024 • Kevin Cai, Chonghua Liu, David M. Chan

The Internet's wealth of content, with up to 60% published in English, starkly contrasts with the global population, of which only 18.8% are English speakers and just 5.1% consider it their native language, leading to disparities in online information access.

Video Summarization

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

1 code implementation • 4 Jan 2024 • David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19.2%.

Attribute · Automatic Speech Recognition (+4 more)

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

no code implementations • 22 Dec 2023 • Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.

Automatic Speech Recognition (ASR) (+2 more)
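
The snippet above mentions a learnable, data-driven merge of attention matrices across modalities. The module below is only a loose sketch of that general idea, a per-head convex combination with a learnable weight, not the Learnable-MAM formulation itself; the class name, shapes, and parameterization are assumptions.

```python
import torch
import torch.nn as nn

class LearnableAttentionMerge(nn.Module):
    """Hypothetical sketch: blend two attention maps with a learnable weight.

    Not the Learnable-MAM method from the paper, just a minimal illustration
    of data-driven attention merging.
    """
    def __init__(self, num_heads):
        super().__init__()
        # One mixing logit per head; the sigmoid keeps the blend weight in (0, 1).
        self.mix_logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, attn_a, attn_b):
        # attn_*: (batch, heads, query_len, key_len), already softmax-normalized.
        alpha = torch.sigmoid(self.mix_logits).view(1, -1, 1, 1)
        return alpha * attn_a + (1.0 - alpha) * attn_b

# Toy usage: rows of the merged map still sum to 1 because the blend is convex.
merge = LearnableAttentionMerge(num_heads=8)
a = torch.softmax(torch.randn(2, 8, 10, 10), dim=-1)
b = torch.softmax(torch.randn(2, 8, 10, 10), dim=-1)
merged = merge(a, b)
```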

IC3: Image Captioning by Committee Consensus

1 code implementation • 2 Feb 2023 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

If you ask a human to describe an image, they might do so in a thousand different ways.

Image Captioning

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

no code implementations • 6 Jan 2023 • David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains challenging, primarily due to reduced data availability (necessitating increased data collection) and rapidly shifting data distributions (requiring more frequent model fine-tuning).

Domain Adaptation · Speech Recognition (+4 more)

Towards Understanding How Machines Can Learn Causal Overhypotheses

1 code implementation • 16 Jun 2022 • Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik

Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence.

BIG-bench Machine Learning · Causal Inference

Content-Context Factorized Representations for Automated Speech Recognition

no code implementations • 19 May 2022 • David M. Chan, Shalini Ghosh

Deep neural networks have largely demonstrated their ability to perform automated speech recognition (ASR) by extracting meaningful features from input audio frames.

Speech Recognition

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

Diversity · Video Description

Multi-Modal Pre-Training for Automated Speech Recognition

no code implementations • 12 Oct 2021 • David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.

Language Modeling (+4 more)

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.

Active Learning · Video Captioning (+1 more)

Exploring Exploration: Comparing Children with RL Agents in Unified Environments

1 code implementation • 6 May 2020 • Eliza Kosoy, Jasmine Collins, David M. Chan, Sandy Huang, Deepak Pathak, Pulkit Agrawal, John Canny, Alison Gopnik, Jessica B. Hamrick

Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn.

Diagnostic Visualization for Deep Neural Networks Using Stochastic Gradient Langevin Dynamics

1 code implementation • 11 Dec 2018 • Biye Jiang, David M. Chan, Tianhao Zhang, John F. Canny

Finally, we show that diagnostic visualization using LDAM leads to a novel insight into the parameter averaging method for deep net training.

Diagnostic
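
The title of the entry above references Stochastic Gradient Langevin Dynamics; the listing does not show the LDAM visualization pipeline itself, so the snippet below only sketches the standard SGLD update (Welling & Teh, 2011) that such sampling-based diagnostics build on. The function name and signature are assumptions.

```python
import torch

def sgld_step(params, grad_log_posterior, step_size):
    """One standard SGLD update: gradient ascent on the log posterior plus
    Gaussian noise whose variance matches the step size."""
    noise = torch.randn_like(params) * step_size ** 0.5
    return params + 0.5 * step_size * grad_log_posterior + noise

# Toy usage with a simple stochastic gradient estimate: for a standard normal
# log density, the gradient at theta is -theta.
theta = torch.zeros(10)
theta = sgld_step(theta, -theta, step_size=1e-3)
```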

t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data

1 code implementation • 31 Jul 2018 • David M. Chan, Roshan Rao, Forrest Huang, John F. Canny

Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples.

Dimensionality Reduction
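
The repository accompanying the entry above provides a GPU t-SNE implementation with a scikit-learn-style interface. The sketch below assumes that interface (the exact tsnecuda keyword arguments are an assumption) and falls back to scikit-learn's CPU implementation when the package is unavailable.

```python
import numpy as np

X = np.random.rand(10_000, 512).astype(np.float32)  # e.g. 10k feature vectors

try:
    # GPU path: the tsnecuda package released with this paper; the keyword
    # arguments here are an assumption based on its sklearn-like API.
    from tsnecuda import TSNE
except ImportError:
    # CPU fallback: scikit-learn's reference t-SNE implementation.
    from sklearn.manifold import TSNE

embedding = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(embedding.shape)  # (10000, 2)
```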
