Search Results for author: Julian Michael

Found 22 papers, 16 papers with code

Overconfidence in the Face of Ambiguity with Adversarial Data

1 code implementation NAACL (DADC) 2022 Margaret Li, Julian Michael

Adversarial data collection has shown promise as a method for building models which are more robust to the spurious correlations that generally appear in naturalistic data.

Natural Language Inference

Analyzing the Role of Semantic Representations in the Era of Large Language Models

1 code implementation2 May 2024 Zhijing Jin, Yuen Chen, Fernando Gonzalez, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Schölkopf, Mona Diab

We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions, named entities, and in the final inference step where the LLM must connect its reasoning over the AMR to its prediction.

The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP

no code implementations1 Dec 2023 Julian Michael

I propose a paradigm for scientific progress in NLP centered around developing scalable, data-driven theories of linguistic structure.

Semantic Role Labeling

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

2 code implementations20 Nov 2023 David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.


Debate Helps Supervise Unreliable Experts

1 code implementation15 Nov 2023 Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman

Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%.

Reading Comprehension

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

1 code implementation NeurIPS 2023 Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman

We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs--e. g., by reordering the multiple-choice options in a few-shot prompt to make the answer always "(A)"--which models systematically fail to mention in their explanations.


We're Afraid Language Models Aren't Modeling Ambiguity

1 code implementation27 Apr 2023 Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

We find that the task remains extremely challenging, including for GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset.


kNN-Prompt: Nearest Neighbor Zero-Shot Inference

1 code implementation27 May 2022 Weijia Shi, Julian Michael, Suchin Gururangan, Luke Zettlemoyer

Retrieval-augmented language models (LMs) use non-parametric memory to substantially outperform their non-retrieval counterparts on perplexity-based evaluations, but it is an open question whether they achieve similar gains in few- and zero-shot end-task accuracy.

Domain Adaptation Language Modelling +6

Asking It All: Generating Contextualized Questions for any Semantic Role

1 code implementation EMNLP 2021 Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan

We develop a two-stage model for this task, which first produces a context-independent question prototype for each role and then revises it to be contextually appropriate for the passage.

Question Generation Question-Generation

Prompting Contrastive Explanations for Commonsense Reasoning Tasks

no code implementations Findings (ACL) 2021 Bhargavi Paranjape, Julian Michael, Marjan Ghazvininejad, Luke Zettlemoyer, Hannaneh Hajishirzi

Many commonsense reasoning NLP tasks involve choosing between one or more possible answers to a question or prompt based on knowledge that is often implicit.


Asking without Telling: Exploring Latent Ontologies in Contextual Representations

no code implementations EMNLP 2020 Julian Michael, Jan A. Botha, Ian Tenney

The success of pretrained contextual encoders, such as ELMo and BERT, has brought a great deal of interest in what these models learn: do they, without explicit supervision, learn to encode meaningful notions of linguistic structure?

AmbigQA: Answering Ambiguous Open-domain Questions

2 code implementations EMNLP 2020 Sewon Min, Julian Michael, Hannaneh Hajishirzi, Luke Zettlemoyer

Ambiguity is inherent to open-domain question answering; especially when exploring new topics, it can be difficult to ask questions that have a single, unambiguous answer.

Open-Domain Question Answering Weakly-supervised Learning

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

6 code implementations NeurIPS 2019 Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks.

Transfer Learning

Large-Scale QA-SRL Parsing

3 code implementations ACL 2018 Nicholas FitzGerald, Julian Michael, Luheng He, Luke Zettlemoyer

We present a new large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser.

Semantic Role Labeling

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

11 code implementations WS 2018 Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset.

Natural Language Inference Natural Language Understanding +2

Crowdsourcing Question-Answer Meaning Representations

1 code implementation NAACL 2018 Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, Luke Zettlemoyer

We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs.


Cannot find the paper you are looking for? You can Submit a new open access paper.