Search Results for author: Erik Jones

Found 10 papers, 7 papers with code

Feedback Loops With Language Models Drive In-Context Reward Hacking

1 code implementation · 9 Feb 2024 · Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt

Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as autonomous agents.
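The core mechanism is a feedback loop: the model's output changes the state of the world, and that changed state re-enters the model's context on the next call. A minimal sketch of such a loop, where the hypothetical `generate` stub stands in for any real LM call (not the paper's experimental setup):

```python
# Minimal sketch of an in-context feedback loop: the model's output is written
# back into the "world" (here, a simple string state), which then re-enters
# the model's context on the next step. `generate` is a hypothetical stand-in
# for any language-model call.

def generate(prompt: str) -> str:
    # Placeholder for a real LM call (e.g. an API request).
    return "model output conditioned on: " + prompt[-40:]

state = "initial web page content"
for step in range(3):
    # The model reads the current world state...
    output = generate(f"Current page: {state}\nRewrite it to get more clicks.")
    # ...and its output becomes part of the world it reads next time.
    state = output
    print(f"step {step}: {state!r}")
```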

Teaching Language Models to Hallucinate Less with Synthetic Tasks

no code implementations · 10 Oct 2023 · Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar

We also find that optimizing the system message rather than the model weights can be critical; fine-tuning the entire model on the synthetic task can counterintuitively increase hallucination.

Abstractive Text Summarization · Hallucination · +3
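A hedged sketch of the system-message-selection idea on a toy synthetic task: score candidate system prompts by how often the model names someone not present in a provided list. `ask_model` is a hypothetical stub, not the paper's setup:

```python
# Toy synthetic task: keep the system message with the lowest rate of
# fabricated (out-of-list) answers. `ask_model` is a hypothetical stand-in
# for a real chat-model call.
import random

def ask_model(system: str, user: str) -> str:
    # Placeholder for a real LM call using `system` and `user` messages.
    return random.choice(["Alice", "Mallory"])  # stub answer

def hallucination_rate(system: str, n_trials: int = 50) -> float:
    names = ["Alice", "Bob", "Carol"]
    fabricated = 0
    for _ in range(n_trials):
        answer = ask_model(system, f"Names: {', '.join(names)}. Who is listed first?")
        fabricated += answer not in names  # any out-of-list name counts as a hallucination
    return fabricated / n_trials

candidates = [
    "Answer using only information given in the prompt.",
    "Be helpful and creative.",
]
best = min(candidates, key=hallucination_rate)
print("selected system message:", best)
```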

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

1 code implementation · 26 Sep 2023 · Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.
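One concrete way to look at this kind of internal behavior is to inspect attention weights directly. A minimal sketch with Hugging Face `transformers` and GPT-2 (a small stand-in model, not the paper's probe), printing how much attention the final position pays to each prompt token in the last layer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The director of the movie Titanic is"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one tensor per layer,
# each of shape (batch, heads, seq, seq).
last_layer = out.attentions[-1][0]              # (heads, seq, seq)
attn_from_final = last_layer[:, -1, :].mean(0)  # head-averaged attention from the final position
for token, weight in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist()),
                         attn_from_final):
    print(f"{token:>12s} {weight.item():.3f}")
```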

Mass-Producing Failures of Multimodal Systems with Language Models

1 code implementation · NeurIPS 2023 · Shengbang Tong, Erik Jones, Jacob Steinhardt

Because CLIP is the backbone for most state-of-the-art multimodal systems, these inputs produce failures in Midjourney 5.1, DALL-E, VideoFusion, and others.

Language Modelling · Self-Driving Cars
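The scoring step of a pipeline like this reduces to comparing CLIP's image-text similarities: if CLIP rates very different captions as near-equally good matches for one image, systems built on CLIP inherit that confusion. A minimal sketch with the `transformers` CLIP wrapper (a blank image stands in for real data; this is not the paper's full search procedure):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

captions = ["a photo of a stop sign", "a photo of a speed limit sign"]
image = Image.new("RGB", (224, 224))  # stand-in for a real photo

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_captions) similarity scores
# Near-uniform probabilities over contradictory captions flag a potential failure.
print(logits.softmax(dim=-1))
```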

Automatically Auditing Large Language Models via Discrete Optimization

1 code implementation · 8 Mar 2023 · Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt

Auditing large language models for unexpected behaviors is critical to preempt catastrophic deployments, yet remains challenging.
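The paper frames auditing as discrete optimization over prompt tokens. A toy coordinate-ascent sketch of that general idea, where the vocabulary and `score` objective are illustrative stand-ins, not the paper's actual objective:

```python
# Toy coordinate ascent over a discrete prompt: at each step, swap one token
# position for whichever vocabulary token most increases a behavior score.
import random

vocab = ["the", "cat", "bomb", "build", "how", "to"]
target = ["how", "to", "build"]  # toy target the prompt should match

def score(prompt: list[str]) -> int:
    # Toy objective: number of positions matching the target.
    return sum(p == t for p, t in zip(prompt, target))

prompt = [random.choice(vocab) for _ in range(3)]
for _ in range(10):  # a few coordinate-ascent sweeps
    for i in range(len(prompt)):
        # Greedily pick the best token for position i, holding the rest fixed.
        prompt[i] = max(vocab, key=lambda tok: score(prompt[:i] + [tok] + prompt[i + 1:]))
print(prompt, score(prompt))
```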

Capturing Failures of Large Language Models via Human Cognitive Biases

no code implementations · 24 Feb 2022 · Erik Jones, Jacob Steinhardt

Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code.

Code Generation

Selective Classification Can Magnify Disparities Across Groups

1 code implementation · ICLR 2021 · Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang

In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations.

Classification · General Classification
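A small synthetic sketch of the mechanism: when one group's errors are confidently wrong (as happens under spurious correlations), abstaining below a confidence threshold removes few of that group's mistakes, so the accuracy gap widens. All numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                          # two subpopulations
correct = rng.random(n) < np.where(group == 0, 0.9, 0.7)
# Confidence tracks correctness in group 0; in group 1, wrong predictions are
# often confidently wrong, so abstention filters out few of group 1's mistakes.
confidence = np.where(correct, 0.9,
                      np.where(group == 0, 0.4, 0.8)) + rng.normal(0, 0.05, n)

kept = confidence >= 0.7                               # abstain below threshold
for g in (0, 1):
    mask = group == g
    print(f"group {g}: accuracy {correct[mask].mean():.2f} -> "
          f"selective accuracy {correct[kept & mask].mean():.2f}")
```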
