1 code implementation • 3 Apr 2024 • Stephen Casper, Jieun Yun, Joonhyuk Baek, Yeseong Jung, Minhwan Kim, Kiwan Kwon, Saerom Park, Hayden Moore, David Shriver, Marissa Connor, Keltin Grimes, Angus Nicolson, Arush Tagade, Jessica Rumbelow, Hieu Minh Nguyen, Dylan Hadfield-Menell
Interpretability techniques are valuable for helping humans understand and oversee AI systems.
1 code implementation • NeurIPS 2020 • Sungyoon Lee, Jaewook Lee, Saerom Park
Our certifiable training algorithm provides a tight propagated outer bound by introducing the box constraint propagation (BCP), and it efficiently computes the worst logit over the outer bound.