Search Results for author: James Chua

Found 5 papers, 3 papers with code

Tell me about yourself: LLMs are aware of their learned behaviors

1 code implementation • 19 Jan 2025 • Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans

Note that while we finetune models to exhibit behaviors like writing insecure code, we do not finetune them to articulate their own behaviors -- models do this without any special training or examples.

Inference-Time-Compute: More Faithful? A Research Note

no code implementations • 14 Jan 2025 • James Chua, Owain Evans

We refer to these models as Inference-Time-Compute (ITC) models.

Attribute • MMLU

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

no code implementations • 21 Jul 2024 • Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez

These results stand in stark contrast to existing evidence of universal and transferable text jailbreaks against language models and transferable adversarial attacks against image classifiers, suggesting that VLMs may be more robust to gradient-based transfer attacks.

Instruction Following • Language Modelling • +1
