Search Results for author: Zeynep Akata

Found 115 papers, 74 papers with code

Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

2 code implementations • 9 Apr 2024 • David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

However, existing benchmarks predate the popularization of large multi-modal models, such as CLIP and CLAP.

Audio Classification Generalized Zero-Shot Learning

Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

no code implementations • 21 Feb 2024 • Adrian Höhl, Ivica Obadic, Miguel Ángel Fernández Torres, Hiba Najjar, Dario Oliveira, Zeynep Akata, Andreas Dengel, Xiao Xiang Zhu

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing.

How should the advent of large language models affect the practice of science?

no code implementations • 5 Dec 2023 • Marcel Binz, Stephan Alaniz, Adina Roskies, Balazs Aczel, Carl T. Bergstrom, Colin Allen, Daniel Schad, Dirk Wulff, Jevin D. West, Qiong Zhang, Richard M. Shiffrin, Samuel J. Gershman, Ven Popov, Emily M. Bender, Marco Marelli, Matthew M. Botvinick, Zeynep Akata, Eric Schulz

For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate.

Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation

1 code implementation • 25 Nov 2023 • Luca Eyring, Dominik Klein, Théo Uscidda, Giovanni Palla, Niki Kilbertus, Zeynep Akata, Fabian Theis

We hence establish UOT-FM as a principled method for unpaired image translation.

Translation

Zero-shot audio captioning with audio-language model guidance and audio context keywords

1 code implementation • 14 Nov 2023 • Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata

In particular, our framework exploits a pre-trained large language model (LLM) for generating the text which is guided by a pre-trained audio-language model to produce captions that describe the audio content.

Ranked #1 on Zero-shot Audio Captioning on Clotho

Descriptive Image Captioning +5

Zero-shot Translation of Attention Patterns in VQA Models to Natural Language

1 code implementation • 8 Nov 2023 • Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

Converting a model's internals to text can yield human-understandable insights about the model.

Image Captioning Language Modelling +3

Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model

1 code implementation • 26 Oct 2023 • Karsten Roth, Lukas Thede, Almut Sophia Koepke, Oriol Vinyals, Olivier Hénaff, Zeynep Akata

Training deep networks requires various design decisions regarding for instance their architecture, data augmentation, or optimization.

Data Augmentation General Knowledge +2

Vision-by-Language for Training-Free Compositional Image Retrieval

1 code implementation • 13 Oct 2023 • Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

Finally, we show that CIReVL makes CIR human-understandable by composing image and text in a modular fashion in the language domain, thereby making it intervenable, allowing to post-hoc re-align failure cases.

Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRCO

Image Retrieval Retrieval +1

Video-adverb retrieval with compositional adverb-action embeddings

1 code implementation • 26 Sep 2023 • Thomas Hummel, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

We propose a framework for video-to-adverb retrieval (and vice versa) that aligns video embeddings with their matching compositional adverb-action text embedding in a joint embedding space.

Ranked #1 on Video-Adverb Retrieval (Unseen Compositions) on MSR-VTT Adverbs

Video-Adverb Retrieval (Unseen Compositions)

Text-to-feature diffusion for audio-visual few-shot learning

1 code implementation • 7 Sep 2023 • Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process.

Classification Few-Shot Learning +1

PDiscoNet: Semantically consistent part discovery for fine-grained recognition

1 code implementation • ICCV 2023 • Robert van der Klis, Stephan Alaniz, Massimiliano Mancini, Cassio F. Dantas, Dino Ienco, Zeynep Akata, Diego Marcos

Fine-grained classification often requires recognizing specific object parts, such as beak shape and wing patterns for birds.

Classification

Iterative Superquadric Recomposition of 3D Objects from Multiple Views

1 code implementation • ICCV 2023 • Stephan Alaniz, Massimiliano Mancini, Zeynep Akata

We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision.

Inductive Bias Object

DeViL: Decoding Vision features into Language

1 code implementation • 4 Sep 2023 • Meghal Dani, Isabel Rio-Torto, Stephan Alaniz, Zeynep Akata

We demonstrate that DeViL generates textual descriptions relevant to the image content on CC3M surpassing previous lightweight captioning models and attribution maps uncovering the learned concepts of the vision backbone.

Decision Making Language Modelling

Image-free Classifier Injection for Zero-Shot Classification

1 code implementation • ICCV 2023 • Anders Christensen, Massimiliano Mancini, A. Sophia Koepke, Ole Winther, Zeynep Akata

We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS) that injects classifiers for new, unseen classes into pre-trained classification models in a post-hoc fashion without relying on image data.

Classification Image Classification +1

Addressing caveats of neural persistence with deep graph persistence

1 code implementation • 20 Jul 2023 • Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke

Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights.

Topological Data Analysis

ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

1 code implementation • ICCV 2023 • Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata

We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing.

Active Learning Model Selection +1

Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

1 code implementation • ICCV 2023 • Karsten Roth, Jae Myung Kim, A. Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata

The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3.

Classification Language Modelling +1

USIM-DAL: Uncertainty-aware Statistical Image Modeling-based Dense Active Learning for Super-resolution

no code implementations • 27 May 2023 • Vikrant Rangnekar, Uddeshya Upadhyay, Zeynep Akata, Biplab Banerjee

Dense regression is a widely used approach in computer vision for tasks such as image super-resolution, enhancement, depth estimation, etc.

Active Learning Depth Estimation +3

In-Context Impersonation Reveals Large Language Models' Strengths and Biases

1 code implementation • NeurIPS 2023 • Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, Zeynep Akata

These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.

If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection

1 code implementation • 22 May 2023 • Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini, Zeynep Akata

Despite their impressive capabilities, diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt, where generated images may not contain all the mentioned objects, attributes or relations.

Text-to-Image Generation

Inducing anxiety in large language models increases exploration and bias

no code implementations • 21 Apr 2023 • Julian Coda-Forno, Kristin Witte, Akshay K. Jagadish, Marcel Binz, Zeynep Akata, Eric Schulz

We focus on the Generative Pre-Trained Transformer 3.5 and subject it to tasks commonly studied in psychiatry.

Decision Making Prompt Engineering

Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

no code implementations • 6 Apr 2023 • Jae Myung Kim, A. Sophia Koepke, Cordelia Schmid, Zeynep Akata

In this work, we introduce ODmAP@k, an object decorrelation metric that measures a model's robustness to spurious correlations in the training data.

Cross-Modal Retrieval Object +2

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

2 code implementations • CVPR 2023 • Youngwook Kim, Jae Myung Kim, Jieun Jeong, Cordelia Schmid, Zeynep Akata, Jungwoo Lee

Based on these findings, we propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.

Classification Multi-Label Classification

Likelihood Annealing: Fast Calibrated Uncertainty for Regression

no code implementations • 21 Feb 2023 • Uddeshya Upadhyay, Jae Myung Kim, Cordelia Schmid, Bernhard Schölkopf, Zeynep Akata

Recent advances in deep learning have shown that uncertainty estimation is becoming increasingly important in applications such as medical imaging, natural language processing, and autonomous systems.

Denoising Image Super-Resolution +2

Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation

no code implementations • 15 Dec 2022 • Anurag Das, Yongqin Xian, Yang He, Zeynep Akata, Bernt Schiele

For best performance, today's semantic segmentation methods use large and carefully labeled datasets, requiring expensive annotation budgets.

Data Augmentation Scene Segmentation +1

Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment

1 code implementation • 23 Nov 2022 • Yuchen Ma, Yanbei Chen, Zeynep Akata

In this work, we formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network by a novel approach named Embedding Graph Alignment.

Knowledge Distillation Representation Learning +1

Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning

no code implementations • 6 Nov 2022 • Zafir Stojanovski, Karsten Roth, Zeynep Akata

Large pre-trained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts.

Continual Learning

PlanT: Explainable Planning Transformers via Object-Level Representations

1 code implementation • 25 Oct 2022 • Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, Andreas Geiger

Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene.

Ranked #6 on CARLA longest6 on CARLA

CARLA longest6 Imitation Learning +1

Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval

1 code implementation • 19 Oct 2022 • Abhra Chaudhuri, Massimiliano Mancini, Yanbei Chen, Zeynep Akata, Anjan Dutta

Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information.

Cross-Modal Retrieval Knowledge Distillation +3

Disentanglement of Correlated Factors via Hausdorff Factorized Support

1 code implementation • 13 Oct 2022 • Karsten Roth, Mark Ibrahim, Zeynep Akata, Pascal Vincent, Diane Bouchacourt

We show that the use of HFS consistently facilitates disentanglement and recovery of ground-truth factors across a variety of correlation settings and benchmarks, even under severe training correlations and correlation shifts, with in parts over $+60\%$ in relative improvement over existing disentanglement methods.

Disentanglement

Relational Proxies: Emergent Relationships as Fine-Grained Discriminators

1 code implementation • 5 Oct 2022 • Abhra Chaudhuri, Massimiliano Mancini, Zeynep Akata, Anjan Dutta

Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object.

Semantic Image Synthesis with Semantically Coupled VQ-Model

no code implementations • 6 Sep 2022 • Stephan Alaniz, Thomas Hummel, Zeynep Akata

Semantic image synthesis enables control over unconditional image generation by allowing guidance on what is being generated.

Image Generation Unconditional Image Generation

Semi-Supervised and Unsupervised Deep Visual Learning: A Survey

no code implementations • 24 Aug 2022 • Yanbei Chen, Massimiliano Mancini, Xiatian Zhu, Zeynep Akata

Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data.

Abstracting Sketches through Simple Primitives

1 code implementation • 27 Jul 2022 • Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, Zeynep Akata

Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget.

Retrieval Sketch-Based Image Retrieval +1

Temporal and cross-modal attention for audio-visual zero-shot learning

2 code implementations • 20 Jul 2022 • Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

We show that our proposed framework that ingests temporal features yields state-of-the-art performance on the UCF, VGGSound, and ActivityNet benchmarks for (generalised) zero-shot learning.

Ranked #2 on GZSL Video Classification on UCF-GZSL(main)

GZSL Video Classification Video Classification

BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks

1 code implementation • 14 Jul 2022 • Uddeshya Upadhyay, Shyamgopal Karthik, Yanbei Chen, Massimiliano Mancini, Zeynep Akata

Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature and do not provide uncertainty estimates.

Autonomous Driving Deblurring +2

A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning

1 code implementation • 8 Jul 2022 • Michael Kirchhof, Karsten Roth, Zeynep Akata, Enkelejda Kasneci

We model images as directional von Mises-Fisher (vMF) distributions on the hypersphere that can reflect image-intrinsic uncertainties.

Metric Learning

The Manifold Hypothesis for Gradient-Based Explanations

no code implementations • 15 Jun 2022 • Sebastian Bordt, Uddeshya Upadhyay, Zeynep Akata, Ulrike Von Luxburg

We propose a necessary criterion: their feature attributions need to be aligned with the tangent space of the data manifold.

Diabetic Retinopathy Detection

Compositional Mixture Representations for Vision and Text

no code implementations • 13 Jun 2022 • Stephan Alaniz, Marco Federici, Zeynep Akata

Learning a common representation space between vision and language allows deep networks to relate objects in the image to the corresponding semantic meaning.

object-detection Representation Learning +1

Large Loss Matters in Weakly Supervised Multi-Label Classification

1 code implementation • CVPR 2022 • Youngwook Kim, Jae Myung Kim, Zeynep Akata, Jungwoo Lee

In this work, we first regard unobserved labels as negative labels, casting the WSML task into noisy multi-label classification.

Classification Memorization +1

KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning

1 code implementation • CVPR 2022 • Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata

The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of state and objects in images, given only a subset of them during training and no prior on the unseen compositions.

Compositional Zero-Shot Learning Missing Labels

Attention Consistency on Visual Corruptions for Single-Source Domain Generalization

1 code implementation • 27 Apr 2022 • Ilke Cugu, Massimiliano Mancini, Yanbei Chen, Zeynep Akata

Generalizing visual recognition models trained on a single distribution to unseen input distributions (i.e., domains) requires making them robust to superfluous correlations in the training set.

Domain Generalization

Probabilistic Compositional Embeddings for Multimodal Image Retrieval

1 code implementation • 12 Apr 2022 • Andrei Neculai, Yanbei Chen, Zeynep Akata

Without bells and whistles, we show that our probabilistic model formulation significantly outperforms existing related methods on multimodal image retrieval while generalizing well to query with different amounts of inputs given in arbitrary visual and (or) textual modalities.

Image Retrieval Retrieval

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

1 code implementation • 5 Apr 2022 • Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset.

Ranked #1 on Explanation Generation on CLEVR-X

Explanation Generation Question Answering +3

Attribute Prototype Network for Any-Shot Learning

no code implementations • 4 Apr 2022 • Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

While a visual-semantic embedding layer learns global features, local features are learned through an attribute prototype network that simultaneously regresses and decorrelates attributes from intermediate features.

Ranked #5 on GZSL Video Classification on ActivityNet-GZSL(main)

Attribute Few-Shot Image Classification +2

VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning

1 code implementation • CVPR 2022 • Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity, and further imposes their class discrimination and semantic relatedness.

Transfer Learning Word Embeddings +1

Integrating Language Guidance into Vision-based Deep Metric Learning

1 code implementation • CVPR 2022 • Karsten Roth, Oriol Vinyals, Zeynep Akata

This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes, impacting the generalizability of the learned metric space.

Ranked #8 on Metric Learning on CARS196 (using extra training data)

Metric Learning

Non-isotropy Regularization for Proxy-based Deep Metric Learning

1 code implementation • CVPR 2022 • Karsten Roth, Oriol Vinyals, Zeynep Akata

Deep Metric Learning (DML) aims to learn representation spaces on which semantic relations can simply be expressed through predefined distance metrics.

Ranked #11 on Metric Learning on CUB-200-2011 (using extra training data)

Metric Learning

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

1 code implementation • CVPR 2022 • Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata

Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes.

Ranked #1 on ZSL Video Classification on UCF-GZSL (cls)

GZSL Video Classification ZSL Video Classification

BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR

no code implementations • 17 Jan 2022 • Ushasi Chaudhuri, Ruchika Chavan, Biplab Banerjee, Anjan Dutta, Zeynep Akata

The efficacy of zero-shot sketch-based image retrieval (ZS-SBIR) models is governed by two challenges.

Domain Adaptation Retrieval +1

Audio Retrieval with Natural Language Queries: A Benchmark Study

1 code implementation • 17 Dec 2021 • A. Sophia Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie

Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

Ranked #1 on Audio to Text Retrieval on SoundDescs

AudioCaps Audio captioning +5

Human Attention in Fine-grained Classification

1 code implementation • 2 Nov 2021 • Yao Rong, Wenjia Xu, Zeynep Akata, Enkelejda Kasneci

The way humans attend to, process and classify a given image has the potential to vastly benefit the performance of deep learning models.

Ranked #41 on Fine-Grained Image Classification on CUB-200-2011

Classification Decision Making +1

Robustness via Uncertainty-aware Cycle Consistency

1 code implementation • NeurIPS 2021 • Uddeshya Upadhyay, Yanbei Chen, Zeynep Akata

Unpaired image-to-image translation refers to learning inter-image-domain mapping without corresponding image pairs.

Autonomous Driving Image-to-Image Translation +1

Conditional De-Identification of 3D Magnetic Resonance Images

no code implementations • 18 Oct 2021 • Lennart Alexander Van der Goten, Tobias Hepp, Zeynep Akata, Kevin Smith

Solutions have been developed to de-identify diagnostic scans by obfuscating or removing parts of the face.

De-identification

Variational Perturbations for Visual Feature Attribution

no code implementations • 29 Sep 2021 • Jae Myung Kim, Eunji Kim, Sungroh Yoon, Jungwoo Lee, Cordelia Schmid, Zeynep Akata

Explaining a complex black-box system in a post-hoc manner is important to understand its predictions.

Fine-Grained Zero-Shot Learning with DNA as Side Information

1 code implementation • NeurIPS 2021 • Sarkhan Badirli, Zeynep Akata, George Mohler, Christine Picard, Murat Dundar

Fine-grained zero-shot learning task requires some form of side-information to transfer discriminative information from seen to unseen classes.

Zero-Shot Learning

Concurrent Discrimination and Alignment for Self-Supervised Feature Learning

no code implementations • 19 Aug 2021 • Anjan Dutta, Massimiliano Mancini, Zeynep Akata

Existing self-supervised learning methods learn representations by means of pretext tasks that are either (1) discriminating, which explicitly specify which features should be separated, or (2) aligning, which precisely indicate which features should be brought close together, but they ignore how to jointly and principally define which features should be repelled and which should be attracted.

Self-Supervised Learning Semantic Segmentation +1

Uncertainty-Guided Progressive GANs for Medical Image Translation

1 code implementation • 29 Jun 2021 • Uddeshya Upadhyay, Yanbei Chen, Tobias Hepp, Sergios Gatidis, Zeynep Akata

However, the state-of-the-art GAN-based frameworks do not estimate the uncertainty in the predictions made by the network that is essential for making informed medical decisions and subsequent revision by medical experts and has recently been shown to improve the performance and interpretability of the model.

Denoising Image-to-Image Translation +2

Keep CALM and Improve Visual Feature Attribution

1 code implementation • ICCV 2021 • Jae Myung Kim, Junsuk Choe, Zeynep Akata, Seong Joon Oh

The class activation mapping, or CAM, has been the cornerstone of feature attribution methods for multiple vision tasks.

Weakly-Supervised Object Localization

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

2 code implementations • ICCV 2021 • Maxime Kayser, Oana-Maria Camburu, Leonard Salewski, Cornelius Emde, Virginie Do, Zeynep Akata, Thomas Lukasiewicz

e-ViL is a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks.

Language Modelling Text Generation

Audio Retrieval with Natural Language Queries

1 code implementation • 5 May 2021 • Andreea-Maria Oncescu, A. Sophia Koepke, João F. Henriques, Zeynep Akata, Samuel Albanie

We consider the task of retrieving audio using free-form natural language queries.

Ranked #1 on Audio/Video to Text Retrieval on AudioCaps

AudioCaps Audio to Text Retrieval +5

Where and When: Space-Time Attention for Audio-Visual Explanations

no code implementations • 4 May 2021 • Yanbei Chen, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Recent advances in XAI provide explanations for models trained on still images.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI) +1

Learning Graph Embeddings for Open World Compositional Zero-Shot Learning

2 code implementations • 3 May 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata

In this work, we overcome this assumption operating on the open world setting, where no limit is imposed on the compositional space at test time, and the search space contains a large number of unseen compositions.

Compositional Zero-Shot Learning

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

1 code implementation • CVPR 2021 • Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata

Having access to multi-modal cues (e.g. vision and audio) empowers some cognitive tasks to be done faster compared to learning from a single modality.

Audio Tagging audio-visual learning +5

A Closer Look at Self-training for Zero-Label Semantic Segmentation

1 code implementation • 21 Apr 2021 • Giuseppe Pastore, Fabio Cermelli, Yongqin Xian, Massimiliano Mancini, Zeynep Akata, Barbara Caputo

Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation.

Segmentation Semantic Segmentation

Uncertainty-aware Generalized Adaptive CycleGAN

1 code implementation • 23 Feb 2021 • Uddeshya Upadhyay, Yanbei Chen, Zeynep Akata

Unpaired image-to-image translation refers to learning inter-image-domain mapping in an unsupervised manner.

Image Denoising Image-to-Image Translation +1

Learning Graph Embeddings for Compositional Zero-shot Learning

1 code implementation • CVPR 2021 • Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata

In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g. old dog) of observed visual primitives states (e.g. old, cute) and objects (e.g. car, dog) in the training set.

Compositional Zero-Shot Learning Graph Embedding +1

Open World Compositional Zero-Shot Learning

2 code implementations • CVPR 2021 • Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, Zeynep Akata

After estimating the feasibility score of each composition, we use these scores to either directly mask the output space or as a margin for the cosine similarity between visual features and compositional embeddings during training.

Compositional Zero-Shot Learning

Adversarial Privacy Preservation in MRI Scans of the Brain

no code implementations • 1 Jan 2021 • Lennart Alexander Van der Goten, Tobias Hepp, Zeynep Akata, Kevin Smith

De-identification of magnetic resonance imagery (MRI) is intrinsically difficult since, even with all metadata removed, a person's face can easily be rendered and matched against a database.

De-identification

Prototype-based Incremental Few-Shot Semantic Segmentation

1 code implementation • 30 Nov 2020 • Fabio Cermelli, Massimiliano Mancini, Yongqin Xian, Zeynep Akata, Barbara Caputo

Semantic segmentation models have two fundamental weaknesses: i) they require large training sets with costly pixel-level annotations, and ii) they have a static output space, constrained to the classes of the training set.

Few-Shot Semantic Segmentation Incremental Learning +3

Attribute Prototype Network for Zero-Shot Learning

no code implementations • NeurIPS 2020 • Wenjia Xu, Yongqin Xian, Jiuniu Wang, Bernt Schiele, Zeynep Akata

As an additional benefit, our model points to the visual evidence of the attributes in an image, e.g. for the CUB dataset, confirming the improved attribute localization ability of our image representation.

Attribute Representation Learning +1

Towards Recognizing Unseen Categories in Unseen Domains

1 code implementation • ECCV 2020 • Massimiliano Mancini, Zeynep Akata, Elisa Ricci, Barbara Caputo

The key idea of CuMix is to simulate the test-time domain and semantic shift using images and features from unseen domains and categories generated by mixing up the multiple source domains and categories available during training.

Ranked #3 on Zero-Shot Learning + Domain Generalization on DomainNet

Domain Generalization Zero-Shot Learning +1

Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation

1 code implementation • 9 Jul 2020 • Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata

Few-shot learning aims to recognize novel classes from a few examples.

Few-Shot Image Classification Few-Shot Learning +7

Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets

2 code implementations • 8 Jul 2020 • Junsuk Choe, Seong Joon Oh, Sanghyuk Chun, Seungho Lee, Zeynep Akata, Hyunjung Shim

In this paper, we argue that WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.

Few-Shot Learning Model Selection +1

Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval

no code implementations • 20 Jun 2020 • Anjan Dutta, Zeynep Akata

Low-shot sketch-based image retrieval is an emerging task in computer vision, allowing to retrieve natural images relevant to hand-drawn sketch queries that are rarely seen during the training phase.

Generative Adversarial Network Retrieval +1

Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring

1 code implementation • 20 Jun 2020 • Yao Rong, Zeynep Akata, Enkelejda Kasneci

Numerous car accidents are caused by improper driving maneuvers.

e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations

3 code implementations • 7 Apr 2020 • Virginie Do, Oana-Maria Camburu, Zeynep Akata, Thomas Lukasiewicz

The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning.

Multimodal Reasoning Natural Language Inference

153

Paper
Code

Learning Robust Representations via Multi-View Information Bottleneck

3 code implementations • ICLR 2020 • Marco Federici, Anjan Dutta, Patrick Forré, Nate Kushman, Zeynep Akata

This enables us to identify superfluous information as that not shared by both views.

Data Augmentation Representation Learning

117

Paper
Code

Evaluating Weakly Supervised Object Localization Methods Right

2 code implementations • CVPR 2020 • Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim

Few-Shot Learning Model Selection +2

328

Paper
Code

Understanding Misclassifications by Attributes

1 code implementation • 15 Oct 2019 • Sadaf Gulshad, Zeynep Akata, Jan Hendrik Metzen, Arnold Smeulders

We study the changes in attributes for clean as well as adversarial images in both standard and adversarially robust networks.

Paper
Code

Modeling Conceptual Understanding in Image Reference Games

1 code implementation • NeurIPS 2019 • Rodolfo Corona, Stephan Alaniz, Zeynep Akata

An agent who interacts with a wide population of other agents needs to be aware that there may be variations in their understanding of the world.

Attribute

Paper
Code

Relational Generalized Few-Shot Learning

no code implementations • 22 Jul 2019 • Xiahan Shi, Leonard Salewski, Martin Schiegg, Zeynep Akata, Max Welling

Instead, we consider the extended setup of generalized few-shot learning (GFSL), where the model is required to perform classification on the joint label space consisting of both previously seen and novel classes.

Few-Shot Learning Generalized Few-Shot Learning

Paper
Add Code

Bayesian Zero-Shot Learning

1 code implementation • 22 Jul 2019 • Sarkhan Badirli, Zeynep Akata, Murat Dundar

Object classes that surround us have a natural tendency to emerge at varying levels of abstraction.

Zero-Shot Learning

Paper
Code

Combining Generative and Discriminative Models for Hybrid Inference

1 code implementation • NeurIPS 2019 • Victor Garcia Satorras, Zeynep Akata, Max Welling

A graphical model is a structured representation of the data generating process.

Paper
Code

Interpreting Adversarial Examples with Attributes

1 code implementation • 17 Apr 2019 • Sadaf Gulshad, Jan Hendrik Metzen, Arnold Smeulders, Zeynep Akata

Deep computer vision systems being vulnerable to imperceptible and carefully crafted noise have raised questions regarding the robustness of their decisions.

Attribute General Classification

Paper
Code

f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning

no code implementations • CVPR 2019 • Yongqin Xian, Saurabh Sharma, Bernt Schiele, Zeynep Akata

When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes.

Ranked #3 on Generalized Zero-Shot Learning on SUN Attribute

Data Augmentation Few-Shot Learning +2

Paper
Add Code

Cross-Linked Variational Autoencoders for Generalized Zero-Shot Learning

no code implementations • ICLR Workshop LLD 2019 • Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

While following the same direction, we also take artificial feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by aligned variational autoencoders, for the purpose of generating latent features to train a softmax classifier.

Few-Shot Learning Generalized Zero-Shot Learning

Paper
Add Code

Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval

1 code implementation • CVPR 2019 • Anjan Dutta, Zeynep Akata

Existing works require either aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space.

feature selection Retrieval +1

111

Paper
Code

Learning Decision Trees Recurrently Through Communication

no code implementations • CVPR 2021 • Stephan Alaniz, Diego Marcos, Bernt Schiele, Zeynep Akata

Integrated interpretability without sacrificing the prediction accuracy of decision making algorithms has the potential of greatly improving their value to the user.

Decision Making Image Classification

Paper
Add Code

Visual Rationalizations in Deep Reinforcement Learning for Atari Games

no code implementations • 1 Feb 2019 • Laurens Weitkamp, Elise van der Pol, Zeynep Akata

Due to the capability of deep learning to perform well in high dimensional problems, deep reinforcement learning agents perform well in challenging tasks such as Atari 2600 games.

Atari Games Decision Making +2

Paper
Add Code

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

2 code implementations • 5 Dec 2018 • Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata

Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space.

Ranked #2 on Generalized Few-Shot Learning on AwA2

Few-Shot Learning Generalized Few-Shot Learning +1

281

Paper
Code

Manipulating Attributes of Natural Scenes via Hallucination

no code implementations • 22 Aug 2018 • Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem

In this study, we explore building a two-stage framework for enabling users to directly manipulate high-level attributes of a natural scene.

Hallucination Style Transfer +1

Paper
Add Code

Textual Explanations for Self-Driving Vehicles

2 code implementations • ECCV 2018 • Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata

Finally, we explore a version of our model that generates rationalizations, and compare with introspective explanations on the same video segments.

Paper
Code

Grounding Visual Explanations

no code implementations • ECCV 2018 • Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Our model improves the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.

General Classification Sentence

Paper
Add Code

Generating Counterfactual Explanations with Natural Language

no code implementations • 26 Jun 2018 • Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

We call such textual explanations counterfactual explanations, and propose an intuitive method to generate counterfactual explanations by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image.

Classification counterfactual +2

Paper
Add Code

Primal-Dual Wasserstein GAN

no code implementations • 24 May 2018 • Mevlana Gemici, Zeynep Akata, Max Welling

We introduce Primal-Dual Wasserstein GAN, a new learning algorithm for building latent variable models of the data distribution based on the primal and the dual formulations of the optimal transport (OT) problem.

Paper
Add Code

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

1 code implementation • CVPR 2018 • Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths.

Activity Recognition Explainable Models +2

Paper
Code

Feature Generating Networks for Zero-Shot Learning

4 code implementations • CVPR 2018 • Yongqin Xian, Tobias Lorenz, Bernt Schiele, Zeynep Akata

Suffering from the extreme training data imbalance between seen and unseen classes, most of existing state-of-the-art approaches fail to achieve satisfactory results for the challenging generalized zero-shot learning task.

Ranked #5 on Generalized Zero-Shot Learning on SUN Attribute

Generalized Zero-Shot Learning Generative Adversarial Network

Paper
Code

Grounding Visual Explanations (Extended Abstract)

no code implementations • 17 Nov 2017 • Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata

Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image.

Attribute

Paper
Add Code

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

no code implementations • 17 Nov 2017 • Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

We also introduce a multimodal methodology for generating visual and textual explanations simultaneously.

Question Answering Visual Question Answering (VQA)

Paper
Add Code

Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly

9 code implementations • 3 Jul 2017 • Yongqin Xian, Christoph H. Lampert, Bernt Schiele, Zeynep Akata

Due to the importance of zero-shot learning, i.e. classifying images where there is a lack of labeled training data, the number of proposed approaches has recently increased steadily.

Zero-Shot Learning

Paper
Code

Zero-Shot Learning -- The Good, the Bad and the Ugly

1 code implementation • CVPR 2017 • Yongqin Xian, Bernt Schiele, Zeynep Akata

Due to the importance of zero-shot learning, the number of proposed approaches has increased steadily recently.

Zero-Shot Learning

Paper
Code

Exploiting saliency for object segmentation from image level labels

no code implementations • CVPR 2017 • Seong Joon Oh, Rodrigo Benenson, Anna Khoreva, Zeynep Akata, Mario Fritz, Bernt Schiele

We show how to combine both information sources in order to recover 80% of the fully supervised performance - which is the new state of the art in weakly supervised training for pixel-wise semantic labelling.

Ranked #26 on Semantic Segmentation on PASCAL VOC 2012 val

Object Semantic Segmentation

Paper
Add Code

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

no code implementations • 14 Dec 2016 • Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions.

Decision Making Question Answering +2

Paper
Add Code

Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts

no code implementations • 1 Dec 2016 • Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem

Automatic image synthesis research has been rapidly growing with deep networks getting more and more expressive.

Generative Adversarial Network Image Generation +1

Paper
Add Code

Gaze Embeddings for Zero-Shot Image Classification

no code implementations • CVPR 2017 • Nour Karessli, Zeynep Akata, Bernt Schiele, Andreas Bulling

Zero-shot image classification using auxiliary information, such as attributes describing discriminative object properties, requires time-consuming annotation by domain experts.

Classification Fine-Grained Image Classification +2

Paper
Add Code

Learning What and Where to Draw

no code implementations • NeurIPS 2016 • Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee

Generative Adversarial Networks (GANs) have recently demonstrated the capability to synthesize compelling real-world images, such as room interiors, album covers, manga, faces, birds, and flowers.

Ranked #13 on Text-to-Image Generation on CUB (using extra training data)

Text-to-Image Generation

Paper
Add Code

Learning Deep Representations of Fine-grained Visual Descriptions

9 code implementations • CVPR 2016 • Scott Reed, Zeynep Akata, Bernt Schiele, Honglak Lee

State-of-the-art methods for zero-shot visual recognition formulate learning as a joint embedding problem of images and side information.

Ranked #1 on Few-Shot Image Classification on CUB-200-2011 - 0-Shot

Attribute Image Retrieval +2

835

Paper
Code

Generative Adversarial Text to Image Synthesis

40 code implementations • 17 May 2016 • Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee

Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal.

Adversarial Text Text-to-Image Generation

1,850

Paper
Code

Latent Embeddings for Zero-shot Classification

no code implementations • CVPR 2016 • Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh Nguyen, Matthias Hein, Bernt Schiele

We train the model with a ranking based objective function which penalizes incorrect rankings of the true class for a given image.

Classification General Classification +1

Paper
Add Code

Multi-Cue Zero-Shot Learning with Strong Supervision

no code implementations • CVPR 2016 • Zeynep Akata, Mateusz Malinowski, Mario Fritz, Bernt Schiele

A promising research direction is zero-shot learning, which does not require any training data to recognize new classes, but rather relies on some form of auxiliary information describing the new classes.

Attribute Retrieval +1

Paper
Add Code

Generating Visual Explanations

no code implementations • 28 Mar 2016 • Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell

Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself.

General Classification Sentence +1

Paper
Add Code

Label-Embedding for Image Classification

2 code implementations • 30 Mar 2015 • Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce.

Ranked #7 on Multi-label zero-shot learning on Open Images V4

Attribute Classification +4

Paper
Code

Evaluation of Output Embeddings for Fine-Grained Image Classification

2 code implementations • CVPR 2015 • Zeynep Akata, Scott Reed, Daniel Walter, Honglak Lee, Bernt Schiele

Image classification has advanced significantly in recent years with the availability of large-scale image sets.

Ranked #2 on Few-Shot Image Classification on CUB-200 - 0-Shot Learning

Classification Few-Shot Image Classification +4

Paper
Code

Label-Embedding for Attribute-Based Classification

no code implementations • CVPR 2013 • Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid

The label embedding framework offers other advantages such as the ability to leverage alternative sources of information in addition to attributes (e.g. class hierarchies) or to transition smoothly from zero-shot learning to learning with large quantities of data.

Ranked #5 on Few-Shot Image Classification on CUB-200-2011 - 0-Shot

Attribute Classification +3

Paper
Add Code
