Search Results for author: Benjamin Newman

Found 13 papers, 6 papers with code

ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models

1 code implementation · 25 Oct 2024 · Benjamin Newman, Yoonjoo Lee, Aakanksha Naik, Pao Siangliulue, Raymond Fok, Juho Kim, Daniel S. Weld, Joseph Chee Chang, Kyle Lo

When conducting literature reviews, scientists often create literature review tables: tables whose rows are publications and whose columns constitute a schema, a set of aspects used to compare and contrast the papers.
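As a minimal sketch of this data structure (not from the paper; the paper names and schema aspects below are hypothetical), such a table maps each publication to a value for every aspect in the schema:

```python
# Illustrative only: papers and schema aspects here are made up.
papers = ["Paper A", "Paper B"]
schema = ["task", "dataset", "evaluation"]  # the schema: aspects to compare on

# A literature review table: one row per publication, one column per aspect.
table = {paper: {aspect: None for aspect in schema} for paper in papers}
table["Paper A"]["task"] = "summarization"

# Rows can then be compared aspect-by-aspect across papers.
print(table["Paper A"]["task"])  # summarization
```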

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

no code implementations · 24 Jul 2024 · Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi

While hallucinations of large language models (LLMs) remain a major challenge, existing factuality benchmarks do not cover the diverse domains of knowledge that real-world users of LLMs seek information about.

Tasks: Chatbot, Form, +2

Assessment of Sports Concussion in Female Athletes: A Role for Neuroinformatics?

no code implementations · 23 Jan 2024 · Rachel Edelstein, Sterling Gutterman, Benjamin Newman, John Darrell Van Horn

Advanced neuroinformatics techniques and machine learning models have become invaluable assets in this endeavor.

Tasks: Experimental Design

The Generative AI Paradox: "What It Can Create, It May Not Understand"

no code implementations · 31 Oct 2023 · Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi

Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon -- and can therefore exceed -- their ability to understand those same types of outputs.

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

no code implementations · 24 May 2023 · Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo

Many real-world applications (e.g., note-taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document.

Tasks: Question Answering, Question Generation, +2

Ensembles and Cocktails: Robust Finetuning for Natural Language Generation

no code implementations · 29 Sep 2021 · John Hewitt, Xiang Lisa Li, Sang Michael Xie, Benjamin Newman, Percy Liang

When finetuning a pretrained language model for natural language generation tasks, one is currently faced with a tradeoff.

Tasks: Language Modelling, Text Generation

Refining Targeted Syntactic Evaluation of Language Models

1 code implementation · NAACL 2021 · Benjamin Newman, Kai-Siang Ang, Julia Gong, John Hewitt

Targeted syntactic evaluation of subject-verb number agreement in English (TSE) evaluates language models' syntactic knowledge using hand-crafted minimal pairs of sentences that differ only in the main verb's conjugation.

Tasks: Sentence
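For illustration of the minimal-pair setup described above (a standard subject-verb agreement example, not necessarily drawn from the paper's data; the helper function below is hypothetical), two sentences differ only in the main verb's conjugation:

```python
# Hypothetical helper, not the paper's code: build a TSE-style minimal pair
# that differs only in the conjugation of the main verb.
def minimal_pair(prefix, sg_verb, pl_verb, suffix, subject_is_plural):
    """Return a (grammatical, ungrammatical) sentence pair."""
    good, bad = (pl_verb, sg_verb) if subject_is_plural else (sg_verb, pl_verb)
    return (f"{prefix} {good} {suffix}", f"{prefix} {bad} {suffix}")

# The head noun "keys" is plural, so "are" is grammatical and "is" is not.
good, bad = minimal_pair("The keys to the cabinet", "is", "are", "on the table.", True)
print(good)  # The keys to the cabinet are on the table.
```

An evaluation would then check that a language model scores `good` above `bad`; the model-scoring step is assumed and omitted here.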

Optimal Assistance for Object-Rearrangement Tasks in Augmented Reality

no code implementations · 14 Oct 2020 · Benjamin Newman, Kevin Carlberg, Ruta Desai

We introduce a novel framework for computing and displaying AR assistance that consists of (1) associating an optimal action sequence with the policy of an embodied agent and (2) presenting this sequence to the user as suggestions in the AR system's heads-up display.

Tasks: Object Rearrangement

The EOS Decision and Length Extrapolation

1 code implementation · EMNLP (BlackboxNLP) 2020 · Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning

Extrapolation to unseen sequence lengths is a challenge for neural generative models of language.

Communication-based Evaluation for Natural Language Generation

1 code implementation · SCiL 2020 · Benjamin Newman, Reuben Cohn-Gordon, Christopher Potts

Natural language generation (NLG) systems are commonly evaluated using n-gram overlap measures (e.g., BLEU, ROUGE).

Tasks: Text Generation
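As a rough sketch of what such n-gram overlap measures compute, here is clipped n-gram precision, the core ingredient of BLEU (illustrative only; real BLEU combines several n-gram orders, a brevity penalty, and corpus-level aggregation):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: the fraction of candidate n-grams that
    also appear in the reference, with counts clipped to reference counts."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

# All three candidate unigrams appear in the reference.
print(ngram_precision("the cat sat", "the cat sat on the mat"))  # 1.0
```

A known weakness of such measures, which motivates alternative evaluations like the communication-based one above, is that they reward surface overlap rather than communicative success.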
