1 code implementation • 25 Oct 2022 • Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, Andreas Geiger
Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene.
1 code implementation • 20 Jul 2022 • Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata
We show that our proposed framework, which ingests temporal features, yields state-of-the-art performance on the UCF-GZSL, VGGSound-GZSL, and ActivityNet-GZSL benchmarks for (generalised) zero-shot learning.
1 code implementation • 5 Apr 2022 • Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata
We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset.
Ranked #1 on Explanation Generation on CLEVR-X
1 code implementation • CVPR 2022 • Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata
Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention, and to exploit textual label embeddings for transferring knowledge from seen to unseen classes (see the sketch after this entry).
Ranked #1 on GZSL Video Classification on VGGSound-GZSL (cls)
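The cross-modal attention and label-embedding transfer described above can be illustrated compactly. Below is a minimal sketch, assuming precomputed audio and visual feature sequences of a shared dimension; the module names, dimensions, and the use of PyTorch's multi-head attention are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    """Each modality attends to the other, yielding a fused audio-visual
    embedding (an illustrative stand-in for the paper's model)."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.audio_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.video_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, audio, video):
        # audio, video: (batch, seq_len, dim) precomputed feature sequences
        a, _ = self.audio_from_video(audio, video, video)  # audio queries video
        v, _ = self.video_from_audio(video, audio, audio)  # video queries audio
        return (a.mean(dim=1) + v.mean(dim=1)) / 2         # pooled joint embedding

def zero_shot_classify(av_embedding, class_text_embeddings):
    """Nearest textual label embedding wins; because class names live in
    the same space, classes unseen during training can be predicted."""
    sims = F.cosine_similarity(av_embedding.unsqueeze(1),
                               class_text_embeddings.unsqueeze(0), dim=-1)
    return sims.argmax(dim=-1)  # (batch,) predicted class indices
```

At test time, `class_text_embeddings` can simply include unseen class names, which is what enables the (generalised) zero-shot prediction.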
1 code implementation • 17 Dec 2021 • A. Sophia Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie
Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.
Ranked #1 on Text to Audio Retrieval on AudioCaps
1 code implementation • 5 May 2021 • Andreea-Maria Oncescu, A. Sophia Koepke, João F. Henriques, Zeynep Akata, Samuel Albanie
We consider the task of retrieving audio using free-form natural language queries (a retrieval sketch follows this entry).
Ranked #1 on Audio to Text Retrieval on Clotho
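To make the retrieval setting concrete, here is a minimal sketch assuming a dual-encoder design in which text and audio are embedded into a shared space and ranked by cosine similarity; the function names, the symmetric InfoNCE-style loss, and the temperature value are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def retrieve(text_embedding, audio_embeddings, top_k=5):
    """Rank audio clips for one free-form text query by cosine similarity."""
    text_embedding = F.normalize(text_embedding, dim=-1)      # (dim,)
    audio_embeddings = F.normalize(audio_embeddings, dim=-1)  # (num_clips, dim)
    scores = audio_embeddings @ text_embedding                # (num_clips,)
    return scores.topk(min(top_k, len(scores))).indices

def contrastive_loss(text_emb, audio_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over matched text-audio pairs in a batch."""
    text_emb = F.normalize(text_emb, dim=-1)    # (batch, dim)
    audio_emb = F.normalize(audio_emb, dim=-1)  # (batch, dim)
    logits = text_emb @ audio_emb.t() / temperature
    targets = torch.arange(logits.size(0))      # i-th text matches i-th audio
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```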
no code implementations • 4 May 2021 • Yanbei Chen, Thomas Hummel, A. Sophia Koepke, Zeynep Akata
Recent advances in XAI provide explanations for models trained on still images.
Explainable Artificial Intelligence
Multimodal Deep Learning
1 code implementation • CVPR 2021 • Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata
Having access to multi-modal cues (e.g. vision and audio) empowers some cognitive tasks to be done faster compared to learning from a single modality.
no code implementations • 28 Oct 2019 • Olivia Wiles, A. Sophia Koepke, Andrew Zisserman
This work explores how to use self-supervised learning on videos to learn a class-specific image embedding that encodes pose and shape information.
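To make the setup concrete, below is a hypothetical sketch of this kind of self-supervised objective: two frames are sampled from the same video, a low-dimensional code of the target frame is extracted, and a generator must combine that code with a source frame to reconstruct the target, so the code is pushed to carry pose and shape. All architectures and names here are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseEmbedder(nn.Module):
    """Maps a frame to a low-dimensional code; the bottleneck forces the
    code to summarise pose/shape rather than full appearance."""

    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, frame):
        return self.net(frame)

class Generator(nn.Module):
    """Reconstructs the target frame from source appearance + pose code."""

    def __init__(self, dim=128):
        super().__init__()
        self.to_map = nn.Linear(dim, 16 * 16)
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, source, pose_code):
        b, _, h, w = source.shape
        pose_map = self.to_map(pose_code).view(b, 1, 16, 16)
        pose_map = F.interpolate(pose_map, size=(h, w), mode="bilinear",
                                 align_corners=False)
        return self.net(torch.cat([source, pose_map], dim=1))

def training_step(embedder, generator, source_frame, target_frame):
    # Appearance can be copied from source_frame, so the only path to a
    # good reconstruction is pose/shape information in the code.
    code = embedder(target_frame)
    recon = generator(source_frame, code)
    return F.l1_loss(recon, target_frame)
```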
no code implementations • ECCV 2018 • Olivia Wiles, A. Sophia Koepke, Andrew Zisserman
The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio).
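A minimal sketch of one way such control can work, assuming the driving signal is another face image that is mapped to a dense warping field applied to the source face; the warping-based mechanism, names, and shapes here are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DrivingNetwork(nn.Module):
    """Predicts per-pixel sampling offsets from the driving input and
    warps the source face into the driven pose/expression."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # 2 channels: (x, y) offsets
        )

    def forward(self, source_face, driving_input):
        offsets = self.encoder(driving_input)
        offsets = F.interpolate(offsets, size=source_face.shape[-2:])
        b, _, h, w = source_face.shape
        # Offsets are added to the identity grid, then used to resample
        # the source face (bilinear warping).
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        identity = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
        grid = identity + offsets.permute(0, 2, 3, 1)
        return F.grid_sample(source_face, grid, align_corners=False)
```

In this family of methods, training can be self-supervised: source and driving frames are drawn from the same face track, with a reconstruction loss against the driving frame.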
2 code implementations • 21 Aug 2018 • Olivia Wiles, A. Sophia Koepke, Andrew Zisserman
We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time (see the sketch after this entry).
Ranked #2 on Unsupervised Facial Landmark Detection on 300W
Self-Supervised Learning
Unsupervised Facial Landmark Detection
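Landmark results like the one above are typically obtained without fine-tuning: the self-supervised embedding is kept frozen and a small regressor is trained on top of it. Below is a hypothetical sketch of that evaluation protocol; the embedder interface, dimensions, and training loop are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit_landmark_regressor(frozen_embedder, frames, landmarks,
                           dim=256, num_landmarks=68, epochs=100):
    """Train a linear probe from frozen features to landmark coordinates."""
    regressor = nn.Linear(dim, num_landmarks * 2)
    opt = torch.optim.Adam(regressor.parameters(), lr=1e-3)
    with torch.no_grad():
        feats = frozen_embedder(frames)        # (N, dim); embedding not updated
    targets = landmarks.view(len(frames), -1)  # (N, num_landmarks * 2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.mse_loss(regressor(feats), targets)
        loss.backward()
        opt.step()
    return regressor
```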