Search Results for author: Felix Friedrich

Found 24 papers, 19 papers with code

LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

no code implementations • 19 Dec 2024 • Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting

Building safe Large Language Models (LLMs) across multiple languages is essential in ensuring both safe access and linguistic diversity.

Diversity

Navigating Shortcuts, Spurious Correlations, and Confounders: From Origins via Detection to Mitigation

no code implementations • 6 Dec 2024 • David Steinmann, Felix Divo, Maurice Kraus, Antonia Wüst, Lukas Struppek, Felix Friedrich, Kristian Kersting

Shortcuts, also described as Clever Hans behavior, spurious correlations, or confounders, present a significant challenge in machine learning and AI, critically affecting model generalization and robustness.

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs

1 code implementation • 11 Nov 2024 • Ruben Härle, Felix Friedrich, Manuel Brack, Björn Deiseroth, Patrick Schramowski, Kristian Kersting

Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, but their output may not be aligned with the user or even produce harmful content.

Text Generation

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

1 code implementation • 7 Jun 2024 • Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski

This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models.

Learning by Self-Explaining

1 code implementation • 15 Sep 2023 • Wolfgang Stammer, Felix Friedrich, David Steinmann, Manuel Brack, Hikaru Shindo, Kristian Kersting

Much of explainable AI research treats explanations as a means for model inspection.

Image Classification

Learning to Intervene on Concept Bottlenecks

1 code implementation • 25 Aug 2023 • David Steinmann, Wolfgang Stammer, Felix Friedrich, Kristian Kersting

To rectify this, we present concept bottleneck memory models (CB2Ms), which keep a memory of past interventions.

Mitigating Inappropriateness in Image Generation: Can there be Value in Reflecting the World's Ugliness?

no code implementations • 28 May 2023 • Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting

Text-conditioned image generation models have recently achieved astonishing results in image quality and text alignment and are consequently employed in a fast-growing number of applications.

Image Generation

One Explanation Does Not Fit XIL

1 code implementation • 14 Apr 2023 • Felix Friedrich, David Steinmann, Kristian Kersting

Current machine learning models produce outstanding results in many areas but, at the same time, suffer from shortcut learning and spurious correlations.

Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations

1 code implementation • 16 Mar 2023 • Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, Kristian Kersting

Neural network-based image classifiers are powerful tools for computer vision tasks, but they inadvertently reveal sensitive attribute information about their classes, raising concerns about their privacy.

Attribute, Face Recognition (+2 more)

Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness

1 code implementation • 7 Feb 2023 • Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, Kristian Kersting

Generative AI models have recently achieved astonishing results in quality and are consequently employed in a fast-growing number of applications.

Fairness, Text-to-Image Generation

The Stable Artist: Steering Semantics in Diffusion Latent Space

2 code implementations • 12 Dec 2022 • Manuel Brack, Patrick Schramowski, Felix Friedrich, Dominik Hintersdorf, Kristian Kersting

Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone.

Image Generation

Revision Transformers: Instructing Language Models to Change their Values

1 code implementation • 19 Oct 2022 • Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting

In this work, we question the current common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating.

Information Retrieval, Retrieval (+1 more)

Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis

2 code implementations • 19 Sep 2022 • Lukas Struppek, Dominik Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, Kristian Kersting

Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public.

Image Generation

Does CLIP Know My Face?

3 code implementations • 15 Sep 2022 • Dominik Hintersdorf, Lukas Struppek, Manuel Brack, Felix Friedrich, Patrick Schramowski, Kristian Kersting

Our large-scale experiments on CLIP demonstrate that individuals used for training can be identified with very high accuracy.

Inference Attack

A Typology for Exploring the Mitigation of Shortcut Behavior

3 code implementations • 4 Mar 2022 • Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting

In addition, we discuss existing and introduce novel measures and benchmarks for evaluating the overall abilities of an XIL method.

BIG-bench Machine Learning
