Search Results for author: Zeynep Akata

Found 130 papers, 81 papers with code

Context-Aware Multimodal Pretraining

no code implementations 22 Nov 2024 Karsten Roth, Zeynep Akata, Dima Damen, Ivana Balažević, Olivier J. Hénaff

Large-scale multimodal representation learning successfully optimizes for zero-shot transfer at test time.

Contrastive Learning Representation Learning +1

Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences

no code implementations 27 Oct 2024 Shuchen Wu, Mirko Thalmann, Peter Dayan, Zeynep Akata, Eric Schulz

In contrast, large language models (LLMs) struggle to transfer abstract variables as effectively as humans.

Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

no code implementations 25 Oct 2024 Leander Girrbach, Yiran Huang, Stephan Alaniz, Trevor Darrell, Zeynep Akata

Pre-trained large language models (LLMs) have been reliably integrated with visual input for multimodal tasks.

Attribute Image to text

Scalable Ranked Preference Optimization for Text-to-Image Generation

no code implementations 23 Oct 2024 Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata, Sergey Tulyakov, Jian Ren, Anil Kag

In this work, we investigate a scalable approach for collecting large-scale and fully synthetic datasets for DPO training.

Text-to-Image Generation

A Practitioner's Guide to Continual Multimodal Pretraining

1 code implementation 26 Aug 2024 Karsten Roth, Vishaal Udandarao, Sebastian Dziadzio, Ameya Prabhu, Mehdi Cherti, Oriol Vinyals, Olivier Hénaff, Samuel Albanie, Matthias Bethge, Zeynep Akata

In this work, we complement current perspectives on continual pretraining through a research test bed as well as provide comprehensive guidance for effective continual model updates in such scenarios.

Continual Learning Continual Pretraining +1

Geometry Fidelity for Spherical Images

no code implementations 25 Jul 2024 Anders Christensen, Nooshin Mojab, Khushman Patel, Karan Ahuja, Zeynep Akata, Ole Winther, Mar Gonzalez-Franco, Andrea Colaco

Spherical or omni-directional images offer an immersive visual format appealing to a wide range of computer vision applications.

EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval

1 code implementation 23 Jul 2024 Thomas Hummel, Shyamgopal Karthik, Mariana-Iuliana Georgescu, Zeynep Akata

In this work, we introduce EgoCVR, a new evaluation benchmark for fine-grained Composed Video Retrieval using large-scale egocentric video datasets.

Re-Ranking Retrieval +2

DataDream: Few-shot Guided Dataset Generation

1 code implementation 15 Jul 2024 Jae Myung Kim, Jessica Bader, Stephan Alaniz, Cordelia Schmid, Zeynep Akata

While text-to-image diffusion models have been shown to achieve state-of-the-art results in image synthesis, they have yet to prove their effectiveness in downstream applications.

Classification Image Classification +1

Disentangled Representation Learning with the Gromov-Monge Gap

no code implementations 10 Jul 2024 Théo Uscidda, Luca Eyring, Karsten Roth, Fabian Theis, Zeynep Akata, Marco Cuturi

However, matching the prior while preserving geometric features is challenging, as a mapping that fully preserves these features while aligning the data distribution with the prior does not exist in general.

Decoder Disentanglement +1

SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research

no code implementations 3 Jul 2024 Meghal Dani, Muthu Jeyanthi Prakash, Zeynep Akata, Stefanie Liebe

In summary, our work provides the first extensive benchmark comparing current SOTA LLMs in the medical domain of epilepsy and highlights their ability to leverage unstructured texts from patients' medical history to aid diagnostic processes in health care.

Prompt Engineering Question Answering

Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models

no code implementations 13 Jun 2024 Lukas Thede, Karsten Roth, Olivier J. Hénaff, Matthias Bethge, Zeynep Akata

(2) Indeed, we show how most often, P-RFCL techniques can be matched by a simple and lightweight PEFT baseline.

Continual Learning

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

1 code implementation 6 Jun 2024 Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata

Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely-used open-source models such as SDXL and PixArt-$\alpha$, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time.

ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

1 code implementation 30 May 2024 Massimo Bini, Karsten Roth, Zeynep Akata, Anna Khoreva

Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability.

Image Generation

Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models

1 code implementation 2 May 2024 Nishad Singhi, Jae Myung Kim, Karsten Roth, Zeynep Akata

In this paper, we find that this is noticeably driven by an independent treatment of concepts during intervention, wherein a change of one concept does not influence the use of other ones in the model's final decision.

Image Classification

Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

no code implementations 21 Feb 2024 Adrian Höhl, Ivica Obadic, Miguel Ángel Fernández Torres, Hiba Najjar, Dario Oliveira, Zeynep Akata, Andreas Dengel, Xiao Xiang Zhu

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing.

Zero-shot audio captioning with audio-language model guidance and audio context keywords

1 code implementation 14 Nov 2023 Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata

In particular, our framework exploits a pre-trained large language model (LLM) for generating the text which is guided by a pre-trained audio-language model to produce captions that describe the audio content.

Descriptive Image Captioning +5

Vision-by-Language for Training-Free Compositional Image Retrieval

1 code implementation 13 Oct 2023 Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

Finally, we show that CIReVL makes CIR human-understandable by composing image and text in a modular fashion in the language domain, thereby making it intervenable, allowing to post-hoc re-align failure cases.

Image Retrieval Retrieval +1

Video-adverb retrieval with compositional adverb-action embeddings

1 code implementation 26 Sep 2023 Thomas Hummel, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

We propose a framework for video-to-adverb retrieval (and vice versa) that aligns video embeddings with their matching compositional adverb-action text embedding in a joint embedding space.

Triplet Video-Adverb Retrieval (Unseen Compositions)

Text-to-feature diffusion for audio-visual few-shot learning

1 code implementation 7 Sep 2023 Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process.

Classification Few-Shot Learning +1

Iterative Superquadric Recomposition of 3D Objects from Multiple Views

1 code implementation ICCV 2023 Stephan Alaniz, Massimiliano Mancini, Zeynep Akata

We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision.

Inductive Bias Object

DeViL: Decoding Vision features into Language

1 code implementation 4 Sep 2023 Meghal Dani, Isabel Rio-Torto, Stephan Alaniz, Zeynep Akata

We demonstrate that DeViL generates textual descriptions relevant to the image content on CC3M surpassing previous lightweight captioning models and attribution maps uncovering the learned concepts of the vision backbone.

Decision Making Language Modelling

Image-free Classifier Injection for Zero-Shot Classification

1 code implementation ICCV 2023 Anders Christensen, Massimiliano Mancini, A. Sophia Koepke, Ole Winther, Zeynep Akata

We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS) that injects classifiers for new, unseen classes into pre-trained classification models in a post-hoc fashion without relying on image data.

Classification Decoder +2

Addressing caveats of neural persistence with deep graph persistence

1 code implementation 20 Jul 2023 Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke

Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights.

Topological Data Analysis

ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

1 code implementation ICCV 2023 Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata

We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing.

Active Learning Model Selection +1

Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

1 code implementation ICCV 2023 Karsten Roth, Jae Myung Kim, A. Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata

The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3.

Classification Language Modelling +1

USIM-DAL: Uncertainty-aware Statistical Image Modeling-based Dense Active Learning for Super-resolution

no code implementations 27 May 2023 Vikrant Rangnekar, Uddeshya Upadhyay, Zeynep Akata, Biplab Banerjee

Dense regression is a widely used approach in computer vision for tasks such as image super-resolution, enhancement, depth estimation, etc.

Active Learning Depth Estimation +3

In-Context Impersonation Reveals Large Language Models' Strengths and Biases

1 code implementation NeurIPS 2023 Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, Zeynep Akata

These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.

If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection

1 code implementation 22 May 2023 Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

Despite their impressive capabilities, diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt, where generated images may not contain all the mentioned objects, attributes or relations.

Text-to-Image Generation

Inducing anxiety in large language models can induce bias

no code implementations 21 Apr 2023 Julian Coda-Forno, Kristin Witte, Akshay K. Jagadish, Marcel Binz, Zeynep Akata, Eric Schulz

Anxiety-induction not only influences LLMs' scores on an anxiety questionnaire but also influences their behavior in a previously-established benchmark measuring biases such as racism and ageism.

Decision Making Prompt Engineering

Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

no code implementations 6 Apr 2023 Jae Myung Kim, A. Sophia Koepke, Cordelia Schmid, Zeynep Akata

In this work, we introduce ODmAP@k, an object decorrelation metric that measures a model's robustness to spurious correlations in the training data.

Cross-Modal Retrieval Image-text Retrieval +2

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

2 code implementations CVPR 2023 Youngwook Kim, Jae Myung Kim, Jieun Jeong, Cordelia Schmid, Zeynep Akata, Jungwoo Lee

Based on these findings, we propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.

Classification Multi-Label Classification

Machine Psychology

no code implementations 24 Mar 2023 Thilo Hagendorff, Ishita Dasgupta, Marcel Binz, Stephanie C. Y. Chan, Andrew Lampinen, Jane X. Wang, Zeynep Akata, Eric Schulz

Large language models (LLMs) show increasingly advanced emergent capabilities and are being incorporated across various societal domains.

Information Retrieval Retrieval

Likelihood Annealing: Fast Calibrated Uncertainty for Regression

no code implementations 21 Feb 2023 Uddeshya Upadhyay, Jae Myung Kim, Cordelia Schmidt, Bernhard Schölkopf, Zeynep Akata

Recent advances in deep learning have shown that uncertainty estimation is becoming increasingly important in applications such as medical imaging, natural language processing, and autonomous systems.

Denoising Image Super-Resolution +2

Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

no code implementations 15 Dec 2022 Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele

For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets.

Data Augmentation Diversity +2

Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment

1 code implementation 23 Nov 2022 Yuchen Ma, Yanbei Chen, Zeynep Akata

In this work, we formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network by a novel approach named Embedding Graph Alignment.

Knowledge Distillation Representation Learning +1

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

no code implementations 6 Nov 2022 Zafir Stojanovski, Karsten Roth, Zeynep Akata

Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts.

Continual Learning

Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval

1 code implementation 19 Oct 2022 Abhra Chaudhuri, Massimiliano Mancini, Yanbei Chen, Zeynep Akata, Anjan Dutta

Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information.

Cross-Modal Retrieval Knowledge Distillation +3

Disentanglement of Correlated Factors via Hausdorff Factorized Support

1 code implementation 13 Oct 2022 Karsten Roth, Mark Ibrahim, Zeynep Akata, Pascal Vincent, Diane Bouchacourt

We show that the use of HFS consistently facilitates disentanglement and recovery of ground-truth factors across a variety of correlation settings and benchmarks, even under severe training correlations and correlation shifts, with in parts over $+60\%$ in relative improvement over existing disentanglement methods.

Disentanglement

Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

1 code implementation 5 Oct 2022 Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object.

Semantic Image Synthesis with Semantically Coupled VQ-Model

no code implementations 6 Sep 2022 Stephan Alaniz, Thomas Hummel, Zeynep Akata

Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated.

Image Generation Unconditional Image Generation

Semi-Supervised and Unsupervised Deep Visual Learning: A Survey

no code implementations 24 Aug 2022 Yanbei Chen, Massimiliano Mancini, Xiatian Zhu, Zeynep Akata

Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data.

Survey

Abstracting Sketches through Simple Primitives

1 code implementation 27 Jul 2022 Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, Zeynep Akata

Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget.

Retrieval Sketch-Based Image Retrieval +1

Temporal and cross-modal attention for audio-visual zero-shot learning

2 code implementations 20 Jul 2022 Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

We show that our proposed framework that ingests temporal features yields state-of-the-art performance on the UCF, VGGSound, and ActivityNet benchmarks for (generalised) zero-shot learning.

GZSL Video Classification Video Classification

BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks

1 code implementation 14 Jul 2022 Uddeshya Upadhyay, Shyamgopal Karthik, Yanbei Chen, Massimiliano Mancini, Zeynep Akata

Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature and do not provide uncertainty estimates.

Autonomous Driving Deblurring +3

A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning

1 code implementation 8 Jul 2022 Michael Kirchhof, Karsten Roth, Zeynep Akata, Enkelejda Kasneci

We model images as directional von Mises-Fisher (vMF) distributions on the hypersphere that can reflect image-intrinsic uncertainties.

Metric Learning

The Manifold Hypothesis for Gradient-Based Explanations

1 code implementation 15 Jun 2022 Sebastian Bordt, Uddeshya Upadhyay, Zeynep Akata, Ulrike Von Luxburg

We propose a criterion: the feature attributions need to be aligned with the tangent space of the data manifold.

Diabetic Retinopathy Detection

Compositional Mixture Representations for Vision and Text

no code implementations 13 Jun 2022 Stephan Alaniz, Marco Federici, Zeynep Akata

Learning a common representation space between vision and language allows deep networks to relate objects in the image to the corresponding semantic meaning.

object-detection Representation Learning +1

Large Loss Matters in Weakly Supervised Multi-Label Classification

1 code implementation CVPR 2022 Youngwook Kim, Jae Myung Kim, Zeynep Akata, Jungwoo Lee

In this work, we first regard unobserved labels as negative labels, casting the WSML task into noisy multi-label classification.

Classification Memorization +1

KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning

1 code implementation CVPR 2022 Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata

The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of state and objects in images, given only a subset of them during training and no prior on the unseen compositions.

Compositional Zero-Shot Learning Missing Labels

Attention Consistency on Visual Corruptions for Single-Source Domain Generalization

1 code implementation 27 Apr 2022 Ilke Cugu, Massimiliano Mancini, Yanbei Chen, Zeynep Akata

Generalizing visual recognition models trained on a single distribution to unseen input distributions (i.e. domains) requires making them robust to superfluous correlations in the training set.

Single-Source Domain Generalization

Probabilistic Compositional Embeddings for Multimodal Image Retrieval

1 code implementation 12 Apr 2022 Andrei Neculai, Yanbei Chen, Zeynep Akata

Without bells and whistles, we show that our probabilistic model formulation significantly outperforms existing related methods on multimodal image retrieval while generalizing well to query with different amounts of inputs given in arbitrary visual and (or) textual modalities.

Image Retrieval Retrieval

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

1 code implementation 5 Apr 2022 Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset.

Explanation Generation Question Answering +3

Attribute Prototype Network for Any-Shot Learning

no code implementations 4 Apr 2022 Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

While a visual-semantic embedding layer learns global features, local features are learned through an attribute prototype network that simultaneously regresses and decorrelates attributes from intermediate features.

Attribute Few-Shot Image Classification +2

VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

1 code implementation CVPR 2022 Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further imposes their class discrimination and semantic relatedness.

Transfer Learning Word Embeddings +1

Non-isotropy Regularization for Proxy-based Deep Metric Learning

1 code implementation CVPR 2022 Karsten Roth, Oriol Vinyals, Zeynep Akata

Deep Metric Learning (DML) aims to learn representation spaces on which semantic relations can simply be expressed through predefined distance metrics.

Ranked #11 on Metric Learning on CUB-200-2011 (using extra training data)

Metric Learning

Integrating Language Guidance into Vision-based Deep Metric Learning

1 code implementation CVPR 2022 Karsten Roth, Oriol Vinyals, Zeynep Akata

This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes, impacting the generalizability of the learned metric space.

Ranked #8 on Metric Learning on CARS196 (using extra training data)

Metric Learning

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

1 code implementation CVPR 2022 Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata

Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes.

GZSL Video Classification ZSL Video Classification

Audio Retrieval with Natural Language Queries: A Benchmark Study

1 code implementation 17 Dec 2021 A. Sophia Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie

Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

AudioCaps Audio captioning +4

Human Attention in Fine-grained Classification

1 code implementation 2 Nov 2021 Yao Rong, Wenjia Xu, Zeynep Akata, Enkelejda Kasneci

The way humans attend to, process and classify a given image has the potential to vastly benefit the performance of deep learning models.

Classification Decision Making +1

Robustness via Uncertainty-aware Cycle Consistency

1 code implementation NeurIPS 2021 Uddeshya Upadhyay, Yanbei Chen, Zeynep Akata

Unpaired image-to-image translation refers to learning inter-image-domain mapping without corresponding image pairs.

Autonomous Driving Image-to-Image Translation +1

Conditional De-Identification of 3D Magnetic Resonance Images

no code implementations 18 Oct 2021 Lennart Alexander Van der Goten, Tobias Hepp, Zeynep Akata, Kevin Smith

Solutions have been developed to de-identify diagnostic scans by obfuscating or removing parts of the face.

De-identification

Variational Perturbations for Visual Feature Attribution

no code implementations 29 Sep 2021 Jae Myung Kim, Eunji Kim, Sungroh Yoon, Jungwoo Lee, Cordelia Schmid, Zeynep Akata

Explaining a complex black-box system in a post-hoc manner is important to understand its predictions.

Fine-Grained Zero-Shot Learning with DNA as Side Information

1 code implementation NeurIPS 2021 Sarkhan Badirli, Zeynep Akata, George Mohler, Christine Picard, Murat Dundar

Fine-grained zero-shot learning task requires some form of side-information to transfer discriminative information from seen to unseen classes.

Zero-Shot Learning

Concurrent Discrimination and Alignment for Self-Supervised Feature Learning

no code implementations 19 Aug 2021 Anjan Dutta, Massimiliano Mancini, Zeynep Akata

Existing self-supervised learning methods learn representation by means of pretext tasks which are either (1) discriminating that explicitly specify which features should be separated or (2) aligning that precisely indicate which features should be closed together, but ignore the fact how to jointly and principally define which features to be repelled and which ones to be attracted.

Self-Supervised Learning Semantic Segmentation +1

Uncertainty-Guided Progressive GANs for Medical Image Translation

1 code implementation 29 Jun 2021 Uddeshya Upadhyay, Yanbei Chen, Tobias Hepp, Sergios Gatidis, Zeynep Akata

However, the state-of-the-art GAN-based frameworks do not estimate the uncertainty in the predictions made by the network that is essential for making informed medical decisions and subsequent revision by medical experts and has recently been shown to improve the performance and interpretability of the model.

Denoising Image-to-Image Translation +2

Keep CALM and Improve Visual Feature Attribution

1 code implementation ICCV 2021 Jae Myung Kim, Junsuk Choe, Zeynep Akata, Seong Joon Oh

The class activation mapping, or CAM, has been the cornerstone of feature attribution methods for multiple vision tasks.

Weakly-Supervised Object Localization

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

2 code implementations ICCV 2021 Maxime Kayser, Oana-Maria Camburu, Leonard Salewski, Cornelius Emde, Virginie Do, Zeynep Akata, Thomas Lukasiewicz

e-ViL is a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks.

Language Modelling Text Generation

Learning Graph Embeddings for Open World Compositional Zero-Shot Learning

2 code implementations 3 May 2021 Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata

In this work, we overcome this assumption operating on the open world setting, where no limit is imposed on the compositional space at test time, and the search space contains a large number of unseen compositions.

Compositional Zero-Shot Learning

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

1 code implementation CVPR 2021 Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata

Having access to multi-modal cues (e.g. vision and audio) empowers some cognitive tasks to be done faster compared to learning from a single modality.

Audio Tagging audio-visual learning +5

A Closer Look at Self-training for Zero-Label Semantic Segmentation

1 code implementation 21 Apr 2021 Giuseppe Pastore, Fabio Cermelli, Yongqin Xian, Massimiliano Mancini, Zeynep Akata, Barbara Caputo

Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation.

Segmentation Semantic Segmentation +1

Uncertainty-aware Generalized Adaptive CycleGAN

1 code implementation 23 Feb 2021 Uddeshya Upadhyay, Yanbei Chen, Zeynep Akata

Unpaired image-to-image translation refers to learning inter-image-domain mapping in an unsupervised manner.

Image Denoising Image-to-Image Translation +1

Learning Graph Embeddings for Compositional Zero-shot Learning

1 code implementation CVPR 2021 Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata

In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g. old dog) of observed visual primitives states (e.g. old, cute) and objects (e.g. car, dog) in the training set.

Compositional Zero-Shot Learning Graph Embedding +1

Open World Compositional Zero-Shot Learning

2 code implementations CVPR 2021 Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata

After estimating the feasibility score of each composition, we use these scores to either directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training.

Compositional Zero-Shot Learning

Adversarial Privacy Preservation in MRI Scans of the Brain

no code implementations 1 Jan 2021 Lennart Alexander Van der Goten, Tobias Hepp, Zeynep Akata, Kevin Smith

De-identification of magnetic resonance imagery (MRI) is intrinsically difficult since, even with all metadata removed, a person's face can easily be rendered and matched against a database.

De-identification

Prototype-based Incremental Few-Shot Semantic Segmentation

1 code implementation 30 Nov 2020 Fabio Cermelli, Massimiliano Mancini, Yongqin Xian, Zeynep Akata, Barbara Caputo

Semantic segmentation models have two fundamental weaknesses: i) they require large training sets with costly pixel-level annotations, and ii) they have a static output space, constrained to the classes of the training set.

Few-Shot Semantic Segmentation Incremental Learning +3

Attribute Prototype Network for Zero-Shot Learning

no code implementations NeurIPS 2020 Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

As an additional benefit, our model points to the visual evidence of the attributes in an image, e.g. for the CUB dataset, confirming the improved attribute localization ability of our image representation.

Attribute Representation Learning +1

Towards Recognizing Unseen Categories in Unseen Domains

1 code implementation ECCV 2020 Massimiliano Mancini, Zeynep Akata, Elisa Ricci, Barbara Caputo

The key idea of CuMix is to simulate the test-time domain and semantic shift using images and features from unseen domains and categories generated by mixing up the multiple source domains and categories available during training.

Domain Generalization Zero-Shot Learning +1

Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets

2 code implementations 8 Jul 2020 Junsuk Choe, Seong Joon Oh, Sanghyuk Chun, Seungho Lee, Zeynep Akata, Hyunjung Shim

In this paper, we argue that WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.

Few-Shot Learning Model Selection +1

Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval

no code implementations 20 Jun 2020 Anjan Dutta, Zeynep Akata

Low-shot sketch-based image retrieval is an emerging task in computer vision, allowing to retrieve natural images relevant to hand-drawn sketch queries that are rarely seen during the training phase.

Generative Adversarial Network Retrieval +1

e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations

3 code implementations 7 Apr 2020 Virginie Do, Oana-Maria Camburu, Zeynep Akata, Thomas Lukasiewicz

The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning.

Multimodal Reasoning Natural Language Inference

Evaluating Weakly Supervised Object Localization Methods Right

2 code implementations CVPR 2020 Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim

In this paper, we argue that WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.

Few-Shot Learning Model Selection +2

Understanding Misclassifications by Attributes

1 code implementation 15 Oct 2019 Sadaf Gulshad, Zeynep Akata, Jan Hendrik Metzen, Arnold Smeulders

We study the changes in attributes for clean as well as adversarial images in both standard and adversarially robust networks.

Modeling Conceptual Understanding in Image Reference Games

1 code implementation NeurIPS 2019 Rodolfo Corona, Stephan Alaniz, Zeynep Akata

An agent who interacts with a wide population of other agents needs to be aware that there may be variations in their understanding of the world.

Attribute

Relational Generalized Few-Shot Learning

no code implementations 22 Jul 2019 Xiahan Shi, Leonard Salewski, Martin Schiegg, Zeynep Akata, Max Welling

Instead, we consider the extended setup of generalized few-shot learning (GFSL), where the model is required to perform classification on the joint label space consisting of both previously seen and novel classes.

Few-Shot Learning Generalized Few-Shot Learning

Bayesian Zero-Shot Learning

1 code implementation 22 Jul 2019 Sarkhan Badirli, Zeynep Akata, Murat Dundar

Object classes that surround us have a natural tendency to emerge at varying levels of abstraction.

Zero-Shot Learning

Interpreting Adversarial Examples with Attributes

1 code implementation 17 Apr 2019 Sadaf Gulshad, Jan Hendrik Metzen, Arnold Smeulders, Zeynep Akata

Deep computer vision systems being vulnerable to imperceptible and carefully crafted noise have raised questions regarding the robustness of their decisions.

Attribute General Classification

f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning

no code implementations CVPR 2019 Yongqin Xian, Saurabh Sharma, Bernt Schiele, Zeynep Akata

When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes.

Data Augmentation Few-Shot Learning +2

Cross-Linked Variational Autoencoders for Generalized Zero-Shot Learning

no code implementations ICLR Workshop LLD 2019 Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

While following the same direction, we also take artificial feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by aligned variational autoencoders, for the purpose of generating latent features to train a softmax classifier.

Few-Shot Learning Generalized Zero-Shot Learning

Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval

1 code implementation CVPR 2019 Anjan Dutta, Zeynep Akata

Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space.

feature selection Retrieval +1

Learning Decision Trees Recurrently Through Communication

no code implementations CVPR 2021 Stephan Alaniz, Diego Marcos, Bernt Schiele, Zeynep Akata

Integrated interpretability without sacrificing the prediction accuracy of decision making algorithms has the potential of greatly improving their value to the user.

Decision Making Image Classification

Visual Rationalizations in Deep Reinforcement Learning for Atari Games

no code implementations 1 Feb 2019 Laurens Weitkamp, Elise van der Pol, Zeynep Akata

Due to the capability of deep learning to perform well in high dimensional problems, deep reinforcement learning agents perform well in challenging tasks such as Atari 2600 games.

Atari Games Decision Making +4

Manipulating Attributes of Natural Scenes via Hallucination

no code implementations 22 Aug 2018 Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem

In this study, we explore building a two-stage framework for enabling users to directly manipulate high-level attributes of a natural scene.

Hallucination Style Transfer +1

Textual Explanations for Self-Driving Vehicles

2 code implementations ECCV 2018 Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata

Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments.

Grounding Visual Explanations

no code implementations ECCV 2018 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.

AI Agent General Classification +1

Generating Counterfactual Explanations with Natural Language

no code implementations 26 Jun 2018 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

We call such textual explanations counterfactual explanations, and propose an intuitive method to generate counterfactual explanations by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image.

AI Agent Classification +3

Primal-Dual Wasserstein GAN

no code implementations 24 May 2018 Mevlana Gemici, Zeynep Akata, Max Welling

We introduce Primal-Dual Wasserstein GAN, a new learning algorithm for building latent variable models of the data distribution based on the primal and the dual formulations of the optimal transport (OT) problem.

Decoder

Feature Generating Networks for Zero-Shot Learning

4 code implementations CVPR 2018 Yongqin Xian, Tobias Lorenz, Bernt Schiele, Zeynep Akata

Suffering from the extreme training data imbalance between seen and unseen classes, most existing state-of-the-art approaches fail to achieve satisfactory results for the challenging generalized zero-shot learning task.

Generalized Zero-Shot Learning Generative Adversarial Network

Grounding Visual Explanations (Extended Abstract)

no code implementations 17 Nov 2017 Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image.

Attribute

Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly

10 code implementations 3 Jul 2017 Yongqin Xian, Christoph H. Lampert, Bernt Schiele, Zeynep Akata

Due to the importance of zero-shot learning, i.e., classifying images where there is a lack of labeled training data, the number of proposed approaches has recently increased steadily.

Zero-Shot Learning

Zero-Shot Learning -- The Good, the Bad and the Ugly

1 code implementation CVPR 2017 Yongqin Xian, Bernt Schiele, Zeynep Akata

Due to the importance of zero-shot learning, the number of proposed approaches has increased steadily recently.

Zero-Shot Learning

Exploiting saliency for object segmentation from image level labels

no code implementations CVPR 2017 Seong Joon Oh, Rodrigo Benenson, Anna Khoreva, Zeynep Akata, Mario Fritz, Bernt Schiele

We show how to combine both information sources in order to recover 80% of the fully supervised performance - which is the new state of the art in weakly supervised training for pixel-wise semantic labelling.

Object Semantic Segmentation

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

no code implementations 14 Dec 2016 Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions.

Decision Making Question Answering +2

Gaze Embeddings for Zero-Shot Image Classification

no code implementations CVPR 2017 Nour Karessli, Zeynep Akata, Bernt Schiele, Andreas Bulling

Zero-shot image classification using auxiliary information, such as attributes describing discriminative object properties, requires time-consuming annotation by domain experts.

Classification Fine-Grained Image Classification +2

Learning What and Where to Draw

no code implementations NeurIPS 2016 Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee

Generative Adversarial Networks (GANs) have recently demonstrated the capability to synthesize compelling real-world images, such as room interiors, album covers, manga, faces, birds, and flowers.

Ranked #14 on Text-to-Image Generation on CUB (using extra training data)

Text-to-Image Generation

Learning Deep Representations of Fine-grained Visual Descriptions

9 code implementations CVPR 2016 Scott Reed, Zeynep Akata, Bernt Schiele, Honglak Lee

State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information.

Attribute Image Retrieval +2

Generative Adversarial Text to Image Synthesis

39 code implementations 17 May 2016 Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee

Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal.

Adversarial Text Text-to-Image Generation

Multi-Cue Zero-Shot Learning with Strong Supervision

no code implementations CVPR 2016 Zeynep Akata, Mateusz Malinowski, Mario Fritz, Bernt Schiele

A promising research direction is zero-shot learning, which does not require any training data to recognize new classes, but rather relies on some form of auxiliary information describing the new classes.

Attribute Retrieval +1

Latent Embeddings for Zero-shot Classification

no code implementations CVPR 2016 Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh Nguyen, Matthias Hein, Bernt Schiele

We train the model with a ranking based objective function which penalizes incorrect rankings of the true class for a given image.

Classification General Classification +1

Generating Visual Explanations

no code implementations 28 Mar 2016 Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself.

General Classification Reinforcement Learning +2

Label-Embedding for Image Classification

2 code implementations 30 Mar 2015 Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce.

Attribute Classification +4

Label-Embedding for Attribute-Based Classification

no code implementations CVPR 2013 Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid

The label embedding framework offers other advantages such as the ability to leverage alternative sources of information in addition to attributes (e.g. class hierarchies) or to transition smoothly from zero-shot learning to learning with large quantities of data.

Attribute Classification +3
