1 code implementation • 24 Jun 2024 • Marzena Karpinska, Katherine Thai, Kyle Lo, Tanya Goyal, Mohit Iyyer
Synthetic long-context LLM benchmarks (e.g., "needle-in-the-haystack") test only surface-level retrieval capabilities, but how well can long-context LLMs retrieve, synthesize, and reason over information across book-length inputs?
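To make the contrast concrete, a minimal sketch of how a synthetic "needle-in-the-haystack" test is typically constructed (illustrative only; the filler text, needle, and question here are hypothetical and not from the paper):

```python
# Sketch of a "needle-in-the-haystack" probe: hide one retrievable fact
# inside long filler text and ask the model to recover it.
def build_haystack_prompt(needle: str, depth: float, target_words: int = 8000) -> str:
    filler = "The quick brown fox jumps over the lazy dog. "
    n_repeats = target_words // len(filler.split())  # approximate context length
    sentences = [filler] * n_repeats
    # Insert the needle at a relative depth in [0, 1] of the context.
    sentences.insert(int(depth * len(sentences)), needle + " ")
    context = "".join(sentences)
    return (
        f"{context}\n\n"
        "Question: What is the secret passphrase mentioned above?\n"
        "Answer:"
    )

prompt = build_haystack_prompt(
    needle="The secret passphrase is 'blueberry pancake'.", depth=0.5
)
# A model that can only copy surface strings passes this test, which is why
# such probes say little about synthesis or reasoning over full books.
```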
no code implementations • 21 Apr 2024 • Ali Naseh, Katherine Thai, Mohit Iyyer, Amir Houmansadr
With the digital imagery landscape rapidly evolving, image stocks and AI-generated image marketplaces have become central to visual media.
1 code implementation • 25 Oct 2022 • Katherine Thai, Marzena Karpinska, Kalpesh Krishna, Bill Ray, Moira Inghilleri, John Wieting, Mohit Iyyer
Using Par3, we discover that expert literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%, while state-of-the-art automatic MT metrics do not correlate with those preferences.
1 code implementation • 25 Oct 2022 • Marzena Karpinska, Nishant Raj, Katherine Thai, Yixiao Song, Ankita Gupta, Mohit Iyyer
While machine translation evaluation metrics based on string overlap (e.g., BLEU) have their limitations, their computations are transparent: the BLEU score assigned to a particular candidate translation can be traced back to the presence or absence of certain words.
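A toy illustration of that transparency (a sketch of the modified unigram precision underlying BLEU-1, not the full metric, which also uses higher-order n-grams, a brevity penalty, and corpus-level aggregation, e.g., via the sacrebleu package):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Modified unigram precision: every point of the score is attributable
    to a specific candidate word being present or absent in the reference."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Clip each candidate word's count by its count in the reference.
    matched = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return matched / sum(cand_counts.values())

# "sat" does not appear in the reference, so the score drops by exactly 1/6:
print(unigram_precision("the cat sat on the mat", "the cat is on the mat"))  # 0.833...
```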
2 code implementations • NAACL 2022 • Simeng Sun, Katherine Thai, Mohit Iyyer
While numerous architectures for long-range language models (LRLMs) have recently been proposed, a meaningful evaluation of their discourse-level language understanding capabilities has not yet followed.
1 code implementation • ACL 2022 • Katherine Thai, Yapei Chang, Kalpesh Krishna, Mohit Iyyer
Humanities scholars commonly provide evidence for claims that they make about a work of literature (e.g., a novel) in the form of quotations from the work.