Search Results for author: Candace Ross

Found 24 papers, 10 papers with code

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

no code implementations • 11 Jun 2025 • Benno Krojer, Mojtaba Komeili, Candace Ross, Quentin Garrido, Koustuv Sinha, Nicolas Ballas, Mahmoud Assran

This paper mitigates the challenges in accurately assessing model performance by introducing the Minimal Video Pairs (MVP) benchmark, a simple shortcut-aware video QA benchmark for assessing the physical understanding of video language models.

Multiple-choice

Multi-Modal Language Models as Text-to-Image Model Evaluators

no code implementations • 1 May 2025 • Jiahui Chen, Candace Ross, Reyhane Askari-Hemmat, Koustuv Sinha, Melissa Hall, Michal Drozdzal, Adriana Romero-Soriano

The steady improvements of text-to-image (T2I) generative models lead to slow deprecation of automatic evaluation benchmarks that rely on static datasets, motivating researchers to seek alternative ways to evaluate the T2I progress.

What makes a good metric? Evaluating automatic metrics for text-to-image consistency

no code implementations • 18 Dec 2024 • Candace Ross, Melissa Hall, Adriana Romero Soriano, Adina Williams

We also ablate different aspects of the text-image consistency metrics and find that not all model components are strictly necessary, also a symptom of insufficient sensitivity to visual information.

Visual Question Answering (VQA)

Improving Model Evaluation using SMART Filtering of Benchmark Datasets

1 code implementation • 26 Oct 2024 • Vipul Gupta, Candace Ross, David Pantoja, Rebecca J. Passonneau, Megan Ung, Adina Williams

To address these concerns, we propose Selection Methodology for Accurate, Reduced, and Targeted (SMART) filtering, a novel approach to select a high-quality subset of examples from existing benchmark datasets by systematically removing less informative and less challenging examples.

Chatbot Diversity +1

Changing Answer Order Can Decrease MMLU Accuracy

no code implementations • 27 Jun 2024 • Vipul Gupta, David Pantoja, Candace Ross, Adina Williams, Megan Ung

As large language models (LLMs) have grown in prevalence, particular benchmarks have become essential for the evaluation of these models and for understanding model capabilities.

MMLU Multiple-choice +1

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

1 code implementation • 6 Jun 2024 • Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano

With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases.

Diversity

Towards Geographic Inclusion in the Evaluation of Text-to-Image Models

1 code implementation • 7 May 2024 • Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano

We contrast human annotations with common automated metrics, finding that human preferences vary notably across geographic location and that current metrics do not fully account for this diversity.

Diversity

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

1 code implementation • 9 Apr 2024 • Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques.

Improving Text-to-Image Consistency via Automatic Prompt Optimization

no code implementations • 26 Mar 2024 • Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.

Language Modelling Large Language Model

Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision

no code implementations • 25 Nov 2023 • Nicholas Lui, Bryan Chia, William Berrios, Candace Ross, Douwe Kiela

In this work, we demonstrate that diffusion models can be leveraged to create such a dataset.

Fairness

FACET: Fairness in Computer Vision Evaluation Benchmark

no code implementations • ICCV 2023 • Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, Candace Ross

We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks - image classification, object detection and segmentation.

Fairness image-classification +4

DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity

2 code implementations • 11 Aug 2023 • Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero Soriano

The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases.

Benchmarking Diversity +1

Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers

no code implementations • 16 Feb 2023 • Melissa Hall, Bobbie Chern, Laura Gustafson, Denisse Ventura, Harshad Kulkarni, Candace Ross, Nicolas Usunier

These metrics successfully incentivized performance improvements on person-centric tasks such as face analysis and are used to understand risks of modern models.

Fairness image-classification +2

Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities

no code implementations • 26 Jan 2023 • Melissa Hall, Laura Gustafson, Aaron Adcock, Ishan Misra, Candace Ross

With these capabilities in mind, we ask: Do vision-language models exhibit gender bias when performing zero-shot image classification, object detection and semantic segmentation?

image-classification Image Classification +5

Perturbation Augmentation for Fairer NLP

1 code implementation • 25 May 2022 • Rebecca Qian, Candace Ross, Jude Fernandes, Eric Smith, Douwe Kiela, Adina Williams

Unwanted and often harmful social biases are becoming ever more salient in NLP research, affecting both models and datasets.

Fairness

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

2 code implementations • CVPR 2022 • Tristan Thrush, Ryan Jiang, Max Bartolo, Amanpreet Singh, Adina Williams, Douwe Kiela, Candace Ross

We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground.

Visual Reasoning

CM3: A Causal Masked Multimodal Model of the Internet

no code implementations • 19 Jan 2022 • Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer

We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens.

Articles Entity Disambiguation +1

Learning a natural-language to LTL executable semantic parser for grounded robotics

no code implementations • 7 Aug 2020 • Christopher Wang, Candace Ross, Yen-Ling Kuo, Boris Katz, Andrei Barbu

We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execution of natural-language commands.

Sentence

Measuring Social Biases in Grounded Vision and Language Embeddings

1 code implementation • NAACL 2021 • Candace Ross, Boris Katz, Andrei Barbu

We generalize the notion of social biases from language embeddings to grounded vision and language embeddings.

Word Embeddings
