Search Results for author: Erik Jones

Found 10 papers, 7 papers with code

Feedback Loops With Language Models Drive In-Context Reward Hacking

1 code implementation • 9 Feb 2024 • Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt

Language models influence the external world: they query APIs that read and write to web pages, generate content that shapes human behavior, and run system commands as autonomous agents.

Orca 2: Teaching Small Language Models How to Reason

no code implementations • 18 Nov 2023 • Arindam Mitra, Luciano del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadallah

Research on training small LMs has often relied on imitation learning to replicate the output of more capable models.

Ranked #1 on Crass AI on BIG-bench

Arithmetic Reasoning counterfactual +7

Teaching Language Models to Hallucinate Less with Synthetic Tasks

no code implementations • 10 Oct 2023 • Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar

We also find that optimizing the system message rather than the model weights can be critical; fine-tuning the entire model on the synthetic task can counterintuitively increase hallucination.

Abstractive Text Summarization Hallucination +3

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

1 code implementation • 26 Sep 2023 • Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.

Mass-Producing Failures of Multimodal Systems with Language Models

1 code implementation • NeurIPS 2023 • Shengbang Tong, Erik Jones, Jacob Steinhardt

Because CLIP is the backbone for most state-of-the-art multimodal systems, these inputs produce failures in Midjourney 5.1, DALL-E, VideoFusion, and others.

Language Modelling Self-Driving Cars

Automatically Auditing Large Language Models via Discrete Optimization

1 code implementation • 8 Mar 2023 • Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt

Auditing large language models for unexpected behaviors is critical to preempt catastrophic deployments, yet remains challenging.

Capturing Failures of Large Language Models via Human Cognitive Biases

no code implementations • 24 Feb 2022 • Erik Jones, Jacob Steinhardt

Large language models generate complex, open-ended outputs: instead of outputting a class label they write summaries, generate dialogue, or produce working code.

Code Generation

Selective Classification Can Magnify Disparities Across Groups

1 code implementation • ICLR 2021 • Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang

In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations.

Classification General Classification