Search Results for author: Chris Tanner

Found 14 papers, 5 papers with code

Automatic Fake News Detection: Are current models “fact-checking” or “gut-checking”?

no code implementations FEVER (ACL) 2022 Ian Kelk, Benjamin Basseri, Wee Lee, Richard Qiu, Chris Tanner

Automatic fake news detection models are ostensibly based on logic, where the truth of a claim made in a headline can be determined by supporting or refuting evidence found in a resulting web query.

Fact Checking Fake News Detection +2

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

1 code implementation 2 Mar 2024 Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter

While subword tokenizers such as BPE and WordPiece are typically used to build vocabularies for NLP models, the method of decoding text into a sequence of tokens from these vocabularies is often left unspecified, or ill-suited to the method in which they were constructed.

Tokenization Is More Than Compression

no code implementations 28 Feb 2024 Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner

Tokenization is a foundational step in Natural Language Processing (NLP) tasks, bridging raw text and language models.

Data Compression

DocFinQA: A Long-Context Financial Reasoning Dataset

no code implementations 12 Jan 2024 Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner

For large language models (LLMs) to be effective in the financial domain -- where each decision can have a significant impact -- it is necessary to investigate realistic tasks and data.

Retrieval Specificity

BizBench: A Quantitative Reasoning Benchmark for Business and Finance

no code implementations 11 Nov 2023 Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, Chris Tanner

We demonstrate that the current bottleneck in performance is due to LLMs' limited business and financial understanding, highlighting the value of a challenging benchmark for quantitative reasoning within this domain.

Code Generation Program Synthesis +2

A Graphical Approach to Document Layout Analysis

1 code implementation 3 Aug 2023 Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, Chris Tanner

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure).

Document Layout Analysis

LineCap: Line Charts for Data Visualization Captioning Models

1 code implementation 15 Jul 2022 Anita Mahinpei, Zona Kostic, Chris Tanner

Data visualization captions help readers understand the purpose of a visualization and are crucial for individuals with visual impairments.

Data Visualization Image Captioning

What GPT Knows About Who is Who

1 code implementation insights (ACL) 2022 Xiaohan Yang, Eduardo Peynetti, Vasco Meerman, Chris Tanner

Coreference resolution -- which is a crucial task for understanding discourse and language at large -- has yet to witness widespread benefits from large language models (LLMs).

coreference-resolution Prompt Engineering +1

Evaluating the Fairness Impact of Differentially Private Synthetic Data

no code implementations 9 May 2022 Blake Bullwinkel, Kristen Grabarz, Lily Ke, Scarlett Gong, Chris Tanner, Joshua Allen

Differentially private (DP) synthetic data is a promising approach to maximizing the utility of data containing sensitive information.

Binary Classification Fairness

Automatic Fake News Detection: Are current models "fact-checking" or "gut-checking"?

no code implementations 14 Apr 2022 Ian Kelk, Benjamin Basseri, Wee Yi Lee, Richard Qiu, Chris Tanner

Automatic fake news detection models are ostensibly based on logic, where the truth of a claim made in a headline can be determined by supporting or refuting evidence found in a resulting web query.

Fact Checking Fake News Detection +2

Building astroBERT, a language model for Astronomy & Astrophysics

no code implementations 1 Dec 2021 Felix Grezes, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Shinyi Chen, Chris Tanner, Pavlos Protopapas

The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search.

Astronomy Language Modelling +3
