1 code implementation • 4 Jul 2024 • Zhengping Jiang, Jingyu Zhang, Nathaniel Weir, Seth Ebner, Miriam Wanner, Kate Sanders, Daniel Khashabi, Anqi Liu, Benjamin Van Durme
Hallucinations -- the generation of untrue claims -- pose a challenge to the application of large language models (LLMs) [1], thereby motivating the development of metrics to evaluate factual precision.
1 code implementation • 14 Jun 2024 • Kate Sanders, Benjamin Van Durme
In this paper, we survey 105 video datasets that require event understanding capability, consider how they contribute to the study of robust event understanding in video, and assess proposed video event extraction tasks in the context of this body of research.
no code implementations • 2 May 2024 • James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users.
no code implementations • 18 Mar 2024 • Kevin Xu, Yeganeh Kordi, Tanay Nayak, Ado Asija, Yizhong Wang, Kate Sanders, Adam Byerly, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi
To support the evaluation of TurkingBench, we have developed a framework that links chatbot responses to actions on web pages (e.g., modifying a text box, selecting a radio button).
no code implementations • 29 Feb 2024 • Kate Sanders, Nathaniel Weir, Benjamin Van Durme
It is challenging to perform question-answering over complex, multimodal content such as television clips.
no code implementations • 22 Feb 2024 • Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme
Recent language models enable new opportunities for structured reasoning with text, such as the construction of intuitive, proof-like textual entailment trees without relying on brittle formal logic.
no code implementations • 6 Jul 2023 • Kate Sanders, David Etter, Reno Kriz, Benjamin Van Durme
Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage.
no code implementations • 6 Oct 2022 • Kate Sanders, Reno Kriz, Anqi Liu, Benjamin Van Durme
However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data.
no code implementations • 20 Jul 2020 • Kate Sanders, Michael Danielczuk, Jeffrey Mahler, Ajay Tanwani, Ken Goldberg
A new generation of automated bin picking systems using deep learning is evolving to support increasing demand for e-commerce.