Search Results for author: Arnuv Tandon

Found 2 papers, 0 papers with code

Deceptive Alignment Monitoring

no code implementations • 20 Jul 2023 • Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves.

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

no code implementations • 20 Jul 2023 • Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks.

Anomaly Detection
