Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
Despite exciting recent results showing vision-language systems' capacity to reason about images using natural language, their capacity for video reasoning remains under-explored.
Voice conversion (VC) models have demonstrated impressive few-shot conversion quality on the clean, native speech populations on which they are trained.
We apply VCoT to the Visual Storytelling and WikiHow summarization datasets and demonstrate through human evaluation that VCoT offers novel and consistent synthetic data augmentation that beats chain-of-thought baselines and can be used to enhance downstream performance.
We conduct a broad literature survey, identifying many clusters of similar conceptions of transparency and tying each back to our north star, with analysis of how it furthers or hinders our ideal AI transparency goals.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans.
As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging.
Is it possible to build a general and automatic natural language generation (NLG) evaluation metric?
While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations.
Building natural language inference (NLI) benchmarks that are both challenging for modern techniques and free from shortcut biases is difficult.
This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online.
End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model.
Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to spurious correlations that should not be learned as predictive clues.
Broader disclosive transparency (truth and clarity in communication regarding the function of AI systems) is widely considered desirable.
We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease.
To demonstrate that the features derived from these acoustic models are specific to hypernasal speech, we evaluate them across different dysarthria corpora.