Search Results for author: David M. Chan

Found 15 papers, 9 papers with code

ALOHa: A New Measure for Hallucination in Captioning Models

no code implementations • 3 Apr 2024 • Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell

Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene.

Hallucination, Object, +2

ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

1 code implementation • 10 Jan 2024 • Kevin Cai, Chonghua Liu, David M. Chan

The Internet's wealth of content, with up to 60% published in English, starkly contrasts the global population, where only 18.8% are English speakers, and just 5.1% consider it their native language, leading to disparities in online information access.

Video Summarization

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

1 code implementation • 4 Jan 2024 • David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19.2%.

Attribute, Automatic Speech Recognition, +4

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

no code implementations • 22 Dec 2023 • Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

IC3: Image Captioning by Committee Consensus

1 code implementation • 2 Feb 2023 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

If you ask a human to describe an image, they might do so in a thousand different ways.

Image Captioning

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

no code implementations • 6 Jan 2023 • David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning).

Domain Adaptation, speech-recognition, +1

Towards Understanding How Machines Can Learn Causal Overhypotheses

1 code implementation • 16 Jun 2022 • Eliza Kosoy, David M. Chan, Adrian Liu, Jasmine Collins, Bryanna Kaufmann, Sandy Han Huang, Jessica B. Hamrick, John Canny, Nan Rosemary Ke, Alison Gopnik

Recent work in machine learning and cognitive science has suggested that understanding causal information is essential to the development of intelligence.

BIG-bench Machine Learning, Causal Inference

Content-Context Factorized Representations for Automated Speech Recognition

no code implementations • 19 May 2022 • David M. Chan, Shalini Ghosh

Deep neural networks have largely demonstrated their ability to perform automated speech recognition (ASR) by extracting meaningful features from input audio frames.

speech-recognition, Speech Recognition

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

Video Description

Multi-Modal Pre-Training for Automated Speech Recognition

no code implementations • 12 Oct 2021 • David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.

Language Modelling, Masked Language Modeling, +3

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

no code implementations • 27 Jul 2020 • David M. Chan, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.

Active Learning, Video Captioning, +1

Exploring Exploration: Comparing Children with RL Agents in Unified Environments

1 code implementation • 6 May 2020 • Eliza Kosoy, Jasmine Collins, David M. Chan, Sandy Huang, Deepak Pathak, Pulkit Agrawal, John Canny, Alison Gopnik, Jessica B. Hamrick

Research in developmental psychology consistently shows that children explore the world thoroughly and efficiently and that this exploration allows them to learn.

Diagnostic Visualization for Deep Neural Networks Using Stochastic Gradient Langevin Dynamics

1 code implementation • 11 Dec 2018 • Biye Jiang, David M. Chan, Tianhao Zhang, John F. Canny

Finally, we show that diagnostic visualization using LDAM leads to a novel insight into the parameter averaging method for deep net training.

t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data

1 code implementation • 31 Jul 2018 • David M. Chan, Roshan Rao, Forrest Huang, John F. Canny

Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples.

Dimensionality Reduction
