1 code implementation • 5 Mar 2024 • Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Ruoyu Wang, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs.
5 code implementations • 2 Oct 2023 • Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.
Ranked #3 on Question Answering on TruthfulQA
1 code implementation • 10 Jun 2022 • Ann-Kathrin Dombrowski, Jan E. Gerken, Klaus-Robert Müller, Pan Kessel
Counterfactuals can explain classification decisions of neural networks in a human-interpretable way.
no code implementations • 7 Jan 2022 • Ann-Kathrin Dombrowski, Klaus-Robert Müller, Wolf Christian Müller
The application of machine learning (ML) techniques, especially neural networks, has seen tremendous success at processing images and language.
no code implementations • ICML Workshop INNF 2021 • Ann-Kathrin Dombrowski, Jan E. Gerken, Pan Kessel
Normalizing flows are diffeomorphisms which are parameterized by neural networks.
no code implementations • 18 Dec 2020 • Ann-Kathrin Dombrowski, Christopher J. Anders, Klaus-Robert Müller, Pan Kessel
Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks.
1 code implementation • ICML 2020 • Christopher J. Anders, Plamen Pasliev, Ann-Kathrin Dombrowski, Klaus-Robert Müller, Pan Kessel
Explanation methods promise to make black-box classifiers more transparent.
2 code implementations • NeurIPS 2019 • Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel
Explanation methods aim to make neural networks more trustworthy and interpretable.
no code implementations • 1 Aug 2017 • Michael Gadermayr, Ann-Kathrin Dombrowski, Barbara Mara Klinkhammer, Peter Boor, Dorit Merhof
Due to the increasing availability of whole slide scanners facilitating digitization of histopathological tissue, there is a strong demand for the development of computer-based image analysis systems.