no code implementations • 16 Nov 2023 • Yilun Zhao, Yitao Long, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan
This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning and problem-solving capabilities of LLMs in the context of understanding and analyzing financial documents containing both text and tables.
1 code implementation • 14 Nov 2023 • Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang
We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people and propose four reference-free automatic metrics measuring the differences between target and source perspectives.
1 code implementation • 2 Mar 2023 • Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett
Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation.
1 code implementation • 13 Oct 2022 • Ryo Kamoi, Tanya Goyal, Greg Durrett
Despite recent progress in abstractive summarization, models often generate summaries with factual errors.
no code implementations • 1 Mar 2020 • Ryo Kamoi, Kei Kobayashi
This suggests that the presumed reason the Mahalanobis confidence score works so well is mistaken, and that the score makes use of different information from ODIN, another popular out-of-distribution (OoD) detection method based on prediction confidence.
no code implementations • 15 Nov 2019 • Ryo Kamoi, Kei Kobayashi
This paper focuses on the relationship between the choice of a prior distribution and the likelihoods assigned to out-of-distribution inputs.