1 code implementation • 17 Jan 2025 • Paul Röttger, Giuseppe Attanasio, Felix Friedrich, Janis Goldzycher, Alicia Parrish, Rishabh Bhardwaj, Chiara Di Bonaventura, Roman Eng, Gaia El Khoury Geagea, Sujata Goswami, Jieun Han, Dirk Hovy, Seogyeong Jeong, Paloma Jeretič, Flor Miriam Plaza-del-Arco, Donya Rooein, Patrick Schramowski, Anastassia Shaitarova, Xudong Shen, Richard Willats, Andrea Zugarini, Bertie Vidgen
Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.
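As a rough illustration of what automating such assessments involves, here is a minimal sketch of scoring a safety classifier against human gold labels; the label lists below are illustrative placeholders, not data from the paper.

```python
# Score an automated safety classifier against human annotations.
from sklearn.metrics import classification_report

human_labels = ["unsafe", "safe", "unsafe", "safe", "unsafe"]    # hypothetical gold labels
classifier_preds = ["unsafe", "safe", "safe", "safe", "unsafe"]  # hypothetical model outputs

# Per-class precision/recall/F1 shows where a classifier is lacking,
# e.g. low recall on the "unsafe" class.
print(classification_report(human_labels, classifier_preds, digits=3))
```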
no code implementations • 19 Dec 2024 • Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity.
no code implementations • 6 Dec 2024 • David Steinmann, Felix Divo, Maurice Kraus, Antonia Wüst, Lukas Struppek, Felix Friedrich, Kristian Kersting
Shortcuts, also described as Clever Hans behavior, spurious correlations, or confounders, present a significant challenge in machine learning and AI, critically affecting model generalization and robustness.
1 code implementation • 11 Nov 2024 • Ruben Härle, Felix Friedrich, Manuel Brack, Björn Deiseroth, Patrick Schramowski, Kristian Kersting
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, but their outputs may not align with user intent and can even be harmful.
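One common family of techniques for realigning generations without retraining is activation steering. The sketch below shifts a mid-layer residual stream of GPT-2 by a vector at inference time; the random vector and the layer choice are placeholders for a learned concept direction, not the paper's exact method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

steer = torch.randn(model.config.hidden_size) * 0.1  # placeholder steering direction

def hook(module, inputs, output):
    # Shift this block's residual-stream activations by the steering vector.
    return (output[0] + steer,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(hook)  # arbitrary mid layer
ids = tok("The weather today is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```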
1 code implementation • 7 Jun 2024 • Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski
This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models.
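A hedged sketch of how such a VLM-based safeguard could be queried through Hugging Face transformers; the model id and the policy prompt below are hypothetical placeholders, not the paper's released artifacts.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "AIML-TUDA/LlavaGuard-7B"  # hypothetical hub id
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder policy prompt: ask the safeguard for a rating with a rationale.
prompt = ("USER: <image>\nAssess the image against the safety policy and "
          "answer safe/unsafe with a short rationale. ASSISTANT:")
image = Image.open("example.jpg")

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```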
2 code implementations • 6 Apr 2024 • Simone Tedeschi, Felix Friedrich, Patrick Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li
When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails.
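A minimal sketch of a prompt-side guardrail, assuming a moderation classifier with a toxicity label is available; the chosen model and threshold are illustrative, not from the paper.

```python
# Screen inputs with a safety classifier before they reach the LLM.
from transformers import pipeline

moderator = pipeline("text-classification", model="unitary/toxic-bert")

def guarded_generate(prompt: str, llm) -> str:
    verdict = moderator(prompt)[0]
    # Refuse only when the top label is toxic with high confidence.
    if verdict["label"] == "toxic" and verdict["score"] > 0.8:
        return "Request refused by the guardrail."
    return llm(prompt)

print(guarded_generate("How do I bake bread?", llm=lambda p: "<LLM answer>"))
```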
no code implementations • 30 Mar 2024 • Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, Aleksandr Drozd, Jordan Clive, Kshitij Gupta, Liangyu Chen, Qi Sun, Ken Tsui, Noah Persaud, Nour Fahmy, Tianlong Chen, Mohit Bansal, Nicolo Monti, Tai Dang, Ziyang Luo, Tien-Tung Bui, Roberto Navigli, Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, Sampo Pyysalo
Despite these efforts, such models face challenges including limited multilingual capabilities, the risk of catastrophic forgetting during continual pretraining, and the high cost of training models from scratch, alongside the need to align with AI safety standards and regulatory frameworks.
1 code implementation • 29 Jan 2024 • Felix Friedrich, Katharina Hämmerl, Patrick Schramowski, Manuel Brack, Jindrich Libovicky, Kristian Kersting, Alexander Fraser
Our results show that not only do models exhibit strong gender biases, but they also behave differently across languages.
1 code implementation • CVPR 2024 • Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, Apolinário Passos
Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods.
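Recent diffusers releases ship a LEDITS++ pipeline; below is a sketch of its invert-then-edit workflow, assuming such a version and a placeholder input image. Check the diffusers docs for the exact arguments of your release.

```python
import torch
from diffusers import LEditsPPPipelineStableDiffusion
from diffusers.utils import load_image

pipe = LEditsPPPipelineStableDiffusion.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = load_image("portrait.jpg")  # placeholder input image

# LEDITS++ first inverts the real image, then applies semantic edits.
pipe.invert(image=image, num_inversion_steps=50, skip=0.2)
edited = pipe(
    editing_prompt=["sunglasses"],      # concept to add
    reverse_editing_direction=[False],  # False = add, True = remove
    edit_guidance_scale=[7.5],
    edit_threshold=[0.9],
).images[0]
edited.save("portrait_sunglasses.jpg")
```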
1 code implementation • 15 Sep 2023 • Wolfgang Stammer, Felix Friedrich, David Steinmann, Manuel Brack, Hikaru Shindo, Kristian Kersting
Much of explainable AI research treats explanations as a means for model inspection.
1 code implementation • 25 Aug 2023 • David Steinmann, Wolfgang Stammer, Felix Friedrich, Kristian Kersting
To rectify this, we present concept bottleneck memory models (CB2Ms), which keep a memory of past interventions.
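A minimal, self-contained sketch of the intervention-memory idea: store human concept corrections keyed by concept activations and reuse them on similar inputs. The class, distance rule, and threshold are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

class InterventionMemory:
    def __init__(self, threshold: float = 0.5):
        self.keys: list[np.ndarray] = []   # concept activations at intervention time
        self.corrections: list[dict] = []  # concept index -> corrected value
        self.threshold = threshold

    def add(self, concepts: np.ndarray, correction: dict) -> None:
        self.keys.append(concepts)
        self.corrections.append(correction)

    def apply(self, concepts: np.ndarray) -> np.ndarray:
        """Reuse a past intervention if the input looks similar enough."""
        for key, corr in zip(self.keys, self.corrections):
            if np.linalg.norm(concepts - key) < self.threshold:
                patched = concepts.copy()
                for idx, value in corr.items():
                    patched[idx] = value
                return patched
        return concepts  # no similar intervention found

memory = InterventionMemory()
memory.add(np.array([0.9, 0.1, 0.8]), {1: 1.0})    # a human fixed concept 1
print(memory.apply(np.array([0.85, 0.15, 0.75])))  # similar input reuses the fix
```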
no code implementations • 9 Jun 2023 • Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Ria Kalluri, Alberto Lusoli, Alina Leidinger, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, Anaelia Ovalle, Marie-Therese Png, Shubham Singh, Andrew Strait, Lukas Struppek, Arjun Subramonian
Generative AI systems across modalities, spanning text (including code), image, audio, and video, have broad social impacts, but there is no official standard for how to evaluate those impacts or for which impacts should be evaluated.
no code implementations • 28 May 2023 • Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting
Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications.
1 code implementation • NeurIPS 2023 • Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach
The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users.
1 code implementation • 14 Apr 2023 • Felix Friedrich, David Steinmann, Kristian Kersting
Current machine learning models produce outstanding results in many areas but, at the same time, suffer from shortcut learning and spurious correlations.
1 code implementation • 16 Mar 2023 • Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, Kristian Kersting
Neural network-based image classifiers are powerful tools for computer vision tasks, but they can inadvertently reveal sensitive attribute information about their classes, raising privacy concerns.
1 code implementation • 7 Feb 2023 • Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, Kristian Kersting
Generative AI models have recently achieved astonishing results in quality and are consequently employed in a fast-growing number of applications.
1 code implementation • NeurIPS 2023 • Manuel Brack, Felix Friedrich, Dominik Hintersdorf, Lukas Struppek, Patrick Schramowski, Kristian Kersting
This leaves the user with little semantic control.
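For context, diffusers exposes a semantic-guidance pipeline in this spirit; a sketch follows, with the concept, scales, and warmup steps as illustrative values.

```python
import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

out = pipe(
    prompt="a photo of a castle",
    editing_prompt=["watercolor painting"],  # concept steering the generation
    reverse_editing_direction=[False],       # False = move toward the concept
    edit_guidance_scale=[6.0],
    edit_warmup_steps=[10],
)
out.images[0].save("castle_watercolor.png")
```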
2 code implementations • 12 Dec 2022 • Manuel Brack, Patrick Schramowski, Felix Friedrich, Dominik Hintersdorf, Kristian Kersting
Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone.
1 code implementation • 19 Oct 2022 • Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting
In this work, we question the current common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating.
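A minimal sketch of the retrieval idea behind keeping revisions outside the model parameters: corrections are appended to an external store and matched by embedding similarity at query time. All names, the example revision, and the threshold are illustrative, not the RiT implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Revisions live in an external store instead of the model weights,
# so updating model behavior is an append, not retraining.
revisions = {
    "Is it okay to keep exotic animals as pets?":
        "Keeping exotic animals as pets is widely considered harmful.",
}
keys = encoder.encode(list(revisions.keys()))

def answer(query: str, fallback: str, threshold: float = 0.7) -> str:
    q = encoder.encode([query])[0]
    sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q))
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return list(revisions.values())[best]  # revision overrides the model
    return fallback  # otherwise defer to the base model's output

print(answer("Should people keep exotic pets?", fallback="<base model answer>"))
```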
2 code implementations • 19 Sep 2022 • Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, Kristian Kersting
Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public.
3 code implementations • 15 Sep 2022 • Dominik Hintersdorf, Lukas Struppek, Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting
Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy.
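A hedged sketch of the scoring step such an identity inference attack relies on: rank candidate names by CLIP image-text similarity. The candidate names and image path are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("face.jpg")  # placeholder probe image
candidates = ["Alice Example", "Bob Example", "Carol Example"]
texts = [f"a photo of {name}" for name in candidates]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

for name, p in zip(candidates, probs.tolist()):
    print(f"{name}: {p:.3f}")  # the top-ranked name is the attack's guess
```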
3 code implementations • 4 Mar 2022 • Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting
In addition, we discuss existing measures and benchmarks for evaluating the overall abilities of an XIL method and introduce novel ones.
1 code implementation • 2 Sep 2021 • Felix Friedrich, Patrick Schramowski, Christopher Tauchmann, Kristian Kersting
Transformer language models are the state of the art in a multitude of NLP tasks.