Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
Despite exciting recent results showing vision-language systems' capacity to reason about images using natural language, their capacity for video reasoning remains under-explored.
Voice conversion (VC) models have demonstrated impressive few-shot conversion quality on the clean, native speech populations on which they are trained.
We apply VCoT to the Visual Storytelling and WikiHow summarization datasets and demonstrate through human evaluation that VCoT offers novel and consistent synthetic data augmentation that beats chain-of-thought baselines and can be used to enhance downstream performance.
We conduct a broad literature survey, identifying many clusters of similar conceptions of transparency and tying each back to our north star, with analysis of how it furthers or hinders our ideal AI transparency goals.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans.
As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging.
Is it possible to build a general and automatic natural language generation (NLG) evaluation metric?
While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations.
Building natural language inference (NLI) benchmarks that are both challenging for modern techniques and free from shortcut biases is difficult.
This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online.
End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model.
Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to spurious correlations that should not be learned as predictive clues.
Broader disclosive transparency (truth and clarity in communication regarding the function of AI systems) is widely considered desirable.
We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease.
To demonstrate that the features derived from these acoustic models are specific to hypernasal speech, we evaluate them across different dysarthria corpora.