Search Results for author: A. Sophia Koepke

Found 12 papers, 8 papers with code

Temporal and cross-modal attention for audio-visual zero-shot learning

1 code implementation • 20 Jul 2022 • Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

We show that our proposed framework, which ingests temporal features, yields state-of-the-art performance on the UCF-GZSL, VGGSound-GZSL, and ActivityNet-GZSL benchmarks for (generalised) zero-shot learning.

Video Classification Zero-Shot Learning
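
To make the idea above concrete, here is a minimal PyTorch sketch of cross-modal attention between temporal audio and visual features; the module names, dimensions, and pooling choices are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): temporal visual tokens attend to
# temporal audio tokens and vice versa, then pool over time.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # One attention block per direction: video->audio and audio->video.
        self.v_from_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.a_from_v = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video: torch.Tensor, audio: torch.Tensor):
        # video: (batch, T_v, dim) per-frame visual features
        # audio: (batch, T_a, dim) per-segment audio features
        v_attended, _ = self.v_from_a(query=video, key=audio, value=audio)
        a_attended, _ = self.a_from_v(query=audio, key=video, value=video)
        # Residual connection, then temporal average pooling per modality.
        return (video + v_attended).mean(dim=1), (audio + a_attended).mean(dim=1)

x_v = torch.randn(2, 16, 512)  # 16 video frames
x_a = torch.randn(2, 10, 512)  # 10 audio segments
v_emb, a_emb = CrossModalAttention()(x_v, x_a)
print(v_emb.shape, a_emb.shape)  # torch.Size([2, 512]) twice
```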

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

1 code implementation • 5 Apr 2022 • Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset.

Explanation Generation Question Answering +4

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

1 code implementation • CVPR 2022 • Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata

Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes.

GZSL Video Classification Zero-Shot Learning +1
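
A hedged sketch of how textual label embeddings enable transfer to unseen classes: score a fused audio-visual embedding against the class-name embeddings and pick the nearest. The 300-d word vectors and random tensors below are assumed placeholders for real encoder outputs.

```python
# Sketch (assumed setup, not the paper's code): zero-shot classification by
# comparing a fused audio-visual embedding to textual label embeddings.
import torch
import torch.nn.functional as F

def zero_shot_classify(av_embedding, label_embeddings, class_names):
    """av_embedding: (dim,) fused audio-visual feature projected into the
    text embedding space; label_embeddings: (num_classes, dim) embeddings
    of the class names (e.g. from word vectors)."""
    av = F.normalize(av_embedding, dim=0)
    labels = F.normalize(label_embeddings, dim=1)
    scores = labels @ av                      # cosine similarity per class
    return class_names[scores.argmax().item()]

# Classes never observed during training can still be scored, because only
# their label embeddings are needed at test time.
classes = ["playing violin", "dog barking", "skateboarding"]
pred = zero_shot_classify(torch.randn(300), torch.randn(3, 300), classes)
print(pred)
```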

Audio Retrieval with Natural Language Queries: A Benchmark Study

1 code implementation • 17 Dec 2021 • A. Sophia Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie

Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

Audio captioning Audio to Text Retrieval +4
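
The retrieval setup such a benchmark evaluates can be sketched as ranking audio clips by similarity to the text query in a shared embedding space; the embeddings below are random stand-ins for real encoder outputs.

```python
# Sketch of the text-to-audio retrieval protocol (encoders are hypothetical
# stand-ins, not the paper's models).
import torch
import torch.nn.functional as F

def rank_audio(query_emb: torch.Tensor, audio_embs: torch.Tensor):
    """query_emb: (dim,) text-query embedding; audio_embs: (N, dim)
    embeddings of the N audio clips in the retrieval pool."""
    q = F.normalize(query_emb, dim=0)
    a = F.normalize(audio_embs, dim=1)
    sims = a @ q                                  # cosine similarity per clip
    return torch.argsort(sims, descending=True)  # best match first

# Recall@k: did the ground-truth clip appear in the top k results?
ranking = rank_audio(torch.randn(256), torch.randn(1000, 256))
recall_at_10 = int(0 in ranking[:10])  # clip 0 assumed to be the target
```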

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

1 code implementation • CVPR 2021 • Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata

Having access to multi-modal cues (e.g. vision and audio) empowers some cognitive tasks to be done faster compared to learning from a single modality.

Audio Tagging audio-visual learning +5
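
As a rough illustration of contrastive distillation (not the paper's exact compositional loss), an InfoNCE-style objective can pull each student embedding towards its matching audio-visual teacher embedding within a batch:

```python
# Hedged sketch: align a single-modality student with a multi-modal teacher
# by treating matching batch entries as positives and all others as negatives.
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student, teacher, temperature=0.07):
    """student, teacher: (batch, dim) embeddings of the same samples,
    e.g. an image-only student and a fused audio-visual teacher."""
    s = F.normalize(student, dim=1)
    t = F.normalize(teacher, dim=1)
    logits = s @ t.T / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(s.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_distill_loss(torch.randn(32, 128), torch.randn(32, 128))
```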

Self-supervised learning of class embeddings from video

no code implementations • 28 Oct 2019 • Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

This work explores how to use self-supervised learning on videos to learn a class-specific image embedding that encodes pose and shape information.

Self-Supervised Learning

X2Face: A network for controlling face generation using images, audio, and pose codes

no code implementations • ECCV 2018 • Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio).

Talking Head Generation
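
One way to picture this face-control idea is a driving-conditioned warp of the source face. The toy module below uses a single convolution to predict a sampling-grid offset and is purely an illustrative assumption, not the published X2Face architecture.

```python
# Loose sketch of the warping idea: a driving frame predicts a sampling grid
# that resamples the source face (placeholder network, not the real model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DrivingWarp(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny stand-in: predict a 2-channel (x, y) offset map from the
        # driving frame.
        self.to_flow = nn.Conv2d(3, 2, kernel_size=3, padding=1)

    def forward(self, source: torch.Tensor, driving: torch.Tensor):
        b, _, h, w = source.shape
        # Identity sampling grid in [-1, 1], shape (b, h, w, 2).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        offset = self.to_flow(driving).permute(0, 2, 3, 1)  # (b, h, w, 2)
        # Warp the source face with the driving-conditioned grid.
        return F.grid_sample(source, grid + 0.1 * offset, align_corners=True)

out = DrivingWarp()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```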

Self-supervised learning of a facial attribute embedding from video

2 code implementations • 21 Aug 2018 • Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time.

Self-Supervised Learning Unsupervised Facial Landmark Detection
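
The self-supervised objective can be sketched as follows: embed a source and a target frame from the same face video and reconstruct the target from the pair of embeddings, so no labels are needed. All modules below are placeholder assumptions, not the paper's networks.

```python
# Rough sketch: supervision comes for free because both frames are drawn
# from the same face video.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
decoder = nn.Sequential(nn.Linear(2 * 256, 3 * 64 * 64))

def reconstruction_loss(source, target):
    # Embed both frames, then predict the target frame from the pair.
    z_src, z_tgt = encoder(source), encoder(target)
    pred = decoder(torch.cat([z_src, z_tgt], dim=1))
    return nn.functional.l1_loss(pred, target.flatten(1))

loss = reconstruction_loss(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))
```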

X2Face: A network for controlling face generation by using images, audio, and pose codes

no code implementations • 27 Jul 2018 • Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio).

Face Generation
