no code implementations • 14 Dec 2023 • Tony T. Wang, Miles Wang, Kaivalya Hariharan, Nir Shavit
LLMs often face competing pressures (for example, helpfulness vs. harmlessness).
1 code implementation • 18 Nov 2022 • Stephen Casper, Kaivalya Hariharan, Dylan Hadfield-Menell
Some previous works have proposed human-interpretable adversarial attacks, including copy/paste attacks, in which one natural image pasted into another causes an unexpected misclassification.