no code implementations • FEVER (ACL) 2022 • Ian Kelk, Benjamin Basseri, Wee Lee, Richard Qiu, Chris Tanner
Automatic fake news detection models are ostensibly based on logic, where the truth of a claim made in a headline can be determined by supporting or refuting evidence found in a resulting web query.
no code implementations • *SEM (NAACL) 2022 • Alessandro Stolfo, Chris Tanner, Vikram Gupta, Mrinmaya Sachan
Labeled data for the task of Coreference Resolution is a scarce resource, requiring significant human effort.
1 code implementation • 2 Mar 2024 • Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter
While subword tokenizers such as BPE and WordPiece are typically used to build vocabularies for NLP models, the method of decoding text into a sequence of tokens from these vocabularies is often left unspecified, or ill-suited to the method in which they were constructed.
no code implementations • 28 Feb 2024 • Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner
Tokenization is a foundational step in Natural Language Processing (NLP) tasks, bridging raw text and language models.
no code implementations • 12 Jan 2024 • Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner
For large language models (LLMs) to be effective in the financial domain -- where each decision can have a significant impact -- it is necessary to investigate realistic tasks and data.
no code implementations • 11 Nov 2023 • Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, Chris Tanner
We demonstrate that the current bottleneck in performance is due to LLMs' limited business and financial understanding, highlighting the value of a challenging benchmark for quantitative reasoning within this domain.
1 code implementation • 3 Aug 2023 • Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, Chris Tanner
Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure).
Ranked #13 on Document Layout Analysis on PubLayNet val
1 code implementation • 20 Feb 2023 • Sahithya Ravi, Chris Tanner, Raymond Ng, Vered Shwartz
Event coreference models cluster event mentions pertaining to the same real-world event.
1 code implementation • 15 Jul 2022 • Anita Mahinpei, Zona Kostic, Chris Tanner
Data visualization captions help readers understand the purpose of a visualization and are crucial for individuals with visual impairments.
1 code implementation • insights (ACL) 2022 • Xiaohan Yang, Eduardo Peynetti, Vasco Meerman, Chris Tanner
Coreference resolution -- which is a crucial task for understanding discourse and language at large -- has yet to witness widespread benefits from large language models (LLMs).
no code implementations • 9 May 2022 • Blake Bullwinkel, Kristen Grabarz, Lily Ke, Scarlett Gong, Chris Tanner, Joshua Allen
Differentially private (DP) synthetic data is a promising approach to maximizing the utility of data containing sensitive information.
no code implementations • 14 Apr 2022 • Ian Kelk, Benjamin Basseri, Wee Yi Lee, Richard Qiu, Chris Tanner
Automatic fake news detection models are ostensibly based on logic, where the truth of a claim made in a headline can be determined by supporting or refuting evidence found in a resulting web query.
no code implementations • 1 Dec 2021 • Felix Grezes, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Shinyi Chen, Chris Tanner, Pavlos Protopapas
The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search.