no code implementations • 17 Feb 2025 • Dylan Zhang, Justin Wang, Tianran Sun
Existing LMs struggle with proof-oriented programming due to data scarcity, which manifests in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages such as F*, and (2) the absence of large-scale, project-level proof-oriented implementations that can teach the model the intricate reasoning process involved in proof-oriented programming.
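For readers unfamiliar with the paradigm, here is a minimal sketch of what proof-oriented programming looks like. The paper targets F*; Lean 4 is used below purely as an illustration, since the defining feature is the same: implementations, specifications, and machine-checked proofs live together in one source file.

```lean
-- Minimal illustration of proof-oriented programming (shown in Lean 4;
-- the paper itself targets F*). The function, its specification, and a
-- machine-checked proof sit side by side in the same file.

def double (n : Nat) : Nat := n + n

-- Specification: `double n` equals `2 * n`, proved once, checked forever.
theorem double_eq (n : Nat) : double n = 2 * n := by
  unfold double
  omega

-- Preconditions can be carried in types: callers of `safeDiv` must
-- supply a proof that the divisor is nonzero.
def safeDiv (a b : Nat) (_h : b ≠ 0) : Nat := a / b
```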
1 code implementation • 11 Oct 2024 • Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, Eric Winsor, Jerome Wynne, Yarin Gal, Xander Davies
The robustness of LLMs to jailbreak attacks, where users design prompts to circumvent safety measures and misuse model capabilities, has been studied primarily for LLMs acting as simple chatbots.
no code implementations • 7 Oct 2024 • Dylan Zhang, Justin Wang, Francois Charton
In both cases, we demonstrate that 1) better performance can be achieved by increasing the diversity of an established dataset while keeping the data size constant, and 2) when scaling up the data, diversifying the semantics of instructions is more effective than simply increasing the quantity of similar data.
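A hypothetical sketch of the controlled comparison this implies: two training sets of identical size that differ only in how many distinct instructions they cover. The pool, field names, and counts below are illustrative assumptions, not the paper's actual setup.

```python
import random

# Toy stand-in for an instruction-tuning pool: 500 distinct
# instructions, each paired with 200 outputs (100,000 pairs total).
pool = [{"instruction": f"task-{i % 500}", "output": f"answer-{i}"}
        for i in range(100_000)]

def sample_fixed_size(pool, n_examples, n_instructions):
    """Draw n_examples pairs covering exactly n_instructions templates,
    so dataset size stays constant while diversity varies."""
    chosen = set(random.sample(sorted({p["instruction"] for p in pool}),
                               n_instructions))
    subset = [p for p in pool if p["instruction"] in chosen]
    return random.sample(subset, n_examples)

low_diversity  = sample_fixed_size(pool, 1_000, n_instructions=10)   # from a 2,000-pair subset
high_diversity = sample_fixed_size(pool, 1_000, n_instructions=400)  # from an 80,000-pair subset
```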
no code implementations • 20 Sep 2024 • Justin Wang, Haimin Hu, Duy Phuong Nguyen, Jaime Fernández Fisac
While robust optimal control theory provides a rigorous framework to compute robot control policies that are provably safe, it struggles to scale to high-dimensional problems, leading to increased use of deep learning for tractable synthesis of robot safety.
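One common pattern in this literature is a least-restrictive safety filter: a learned value function stands in for the robust-control safety computation that is intractable in high dimensions, and the task policy is overridden only when the learned certificate flags danger. The sketch below is a generic illustration under that assumption, not the paper's method; the value function, policies, and threshold are placeholders.

```python
import numpy as np

def V(state: np.ndarray) -> float:
    """Stand-in learned safety value; positive means 'judged safe'."""
    return 1.0 - float(np.linalg.norm(state))   # toy: safe near the origin

def safe_policy(state: np.ndarray) -> np.ndarray:
    """Stand-in fallback controller, e.g. steer back toward the origin."""
    return -0.5 * state

def safety_filter(state, task_action, threshold=0.0):
    """Apply the task action unless the learned certificate flags danger."""
    if V(state) < threshold:
        return safe_policy(state)   # override only when needed
    return task_action

state = np.array([0.3, 0.2])
action = safety_filter(state, task_action=np.array([1.0, 0.0]))
```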
2 code implementations • 1 Aug 2024 • Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
Rapid advances in the capabilities of large language models (LLMs) have raised widespread concerns regarding their potential for malicious use.
4 code implementations • 6 Jun 2024 • Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks
Existing techniques aimed at improving alignment, such as refusal training, are often bypassed.
no code implementations • 30 May 2024 • Dylan Zhang, Justin Wang, Francois Charton
Instruction tuning -- tuning large language models on instruction-output pairs -- is a promising technique for making models better adapted to the real world.
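Concretely, each training example pairs a natural-language instruction with a target output, and the supervised loss is typically computed on the output tokens only. The prompt template, the toy tokenizer, and the masking convention below are illustrative assumptions, not a specific library's API.

```python
# Hedged sketch of how an instruction-output pair becomes a training
# example. Using -100 as an "ignore this position" label is a common
# convention; everything else here is a toy stand-in.

def toy_tokenize(text: str) -> list[int]:
    """Toy word-level tokenizer standing in for a real one."""
    return [abs(hash(word)) % 50_000 for word in text.split()]

def build_example(instruction: str, output: str) -> dict:
    prompt_ids = toy_tokenize(f"Instruction: {instruction}\nResponse:")
    output_ids = toy_tokenize(output)
    return {
        "input_ids": prompt_ids + output_ids,
        # Loss is masked on the prompt; only output tokens are learned.
        "labels": [-100] * len(prompt_ids) + output_ids,
    }

ex = build_example("Translate to French: Good morning.", "Bonjour.")
```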
no code implementations • 16 Feb 2024 • Dylan Zhang, Justin Wang, Francois Charton
We investigate the trade-off between the number of instructions the model is trained on and the number of training samples provided for each instruction and observe that the diversity of the instruction set determines generalization.
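To make the trade-off concrete: under a fixed sample budget, raising the number of distinct instructions necessarily lowers the number of samples per instruction, and the finding is that generalization tracks the former. The budget and grid below are hypothetical numbers for illustration only.

```python
# Hypothetical grid for the trade-off studied here: a fixed budget of
# N training samples split between instruction count and repetitions.
N = 10_000
for n_instructions in (10, 100, 1_000, 10_000):
    samples_per_instruction = N // n_instructions
    print(f"{n_instructions:>6} instructions x "
          f"{samples_per_instruction:>5} samples each = {N} total")
```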
no code implementations • 24 Jun 2020 • Justin Wang, Edward Xu, Kangrui Xue, Lukasz Kidzinski
In this work, we build upon existing methods for occlusion-aware 3D pose detection in videos.