1 code implementation • 20 Mar 2024 • Taeyoun Kim, Suhas Kotha, Aditi Raghunathan
The rise of "jailbreak" attacks on language models has led to a flurry of defenses aimed at preventing undesirable responses.
no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.
4 code implementations • 23 Feb 2024 • Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan
In this work, we address an architectural limitation of autoregressive models: token embeddings cannot contain information from tokens that appear later in the input.
1 code implementation • 18 Sep 2023 • Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan
We lack a systematic understanding of the effects of fine-tuning (via methods such as instruction-tuning or reinforcement learning from human feedback), particularly on tasks outside the narrow fine-tuning distribution.
4 code implementations • NeurIPS 2023 • Suhas Kotha, Christopher Brix, Zico Kolter, Krishnamurthy Dvijotham, Huan Zhang
Most work on the formal verification of neural networks has focused on bounding the set of outputs that correspond to a given set of inputs (for example, bounded perturbations of a nominal input).
no code implementations • 20 Jan 2022 • Suhas Kotha, Anirudh Koul, Siddha Ganju, Meher Kasam
To solve this problem, we establish CELESTIAL, a self-supervised learning pipeline for effectively leveraging sparsely labeled satellite imagery.