Search Results for author: A. Sophia Koepke

Found 22 papers, 16 papers with code

Self-supervised learning of a facial attribute embedding from video

2 code implementations · 21 Aug 2018 · Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time.

Attribute · Self-Supervised Learning +1

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

1 code implementation · CVPR 2021 · Yanbei Chen, Yongqin Xian, A. Sophia Koepke, Ying Shan, Zeynep Akata

Having access to multi-modal cues (e.g. vision and audio) empowers some cognitive tasks to be done faster compared to learning from a single modality.

Audio Tagging · audio-visual learning +5

Waffling around for Performance: Visual Classification with Random Words and Broad Concepts

1 code implementation · ICCV 2023 · Karsten Roth, Jae Myung Kim, A. Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata

The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3.

Classification · Language Modelling +1
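The "waffling" idea named in the title, augmenting class prompts with random character sequences and averaging scores over them, can be sketched as follows; the prompt template, parameter names, and the `waffle_prompts` helper are illustrative assumptions, not the paper's exact recipe:

```python
import random
import string

def waffle_prompts(classname, num_prompts=4, num_words=2, word_len=5, seed=0):
    """Build CLIP-style prompts in which semantic descriptors are replaced
    by random character sequences; classification scores would later be
    averaged over these prompts per class."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(num_prompts):
        # Random "waffle" words stand in for LLM-generated descriptors.
        noise = " ".join(
            "".join(rng.choice(string.ascii_lowercase) for _ in range(word_len))
            for _ in range(num_words)
        )
        prompts.append(f"A photo of a {classname}, which has {noise}.")
    return prompts
```

In a full pipeline, each prompt would be encoded with a CLIP-style text encoder and the image-text similarities averaged per class before taking the argmax.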

Audio Retrieval with Natural Language Queries: A Benchmark Study

1 code implementation · 17 Dec 2021 · A. Sophia Koepke, Andreea-Maria Oncescu, João F. Henriques, Zeynep Akata, Samuel Albanie

Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

AudioCaps · Audio captioning +5

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

1 code implementation · CVPR 2022 · Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata

Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes.

GZSL Video Classification · ZSL Video Classification
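A toy numpy sketch of the general idea described above: cross-modal attention fuses audio and visual features into a joint embedding, which is then matched against textual label embeddings to transfer knowledge to unseen classes. Dimensions, the additive fusion, and all function names are assumptions for illustration, not the paper's exact architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(audio, video):
    """audio: (Ta, d), video: (Tv, d). Each modality attends to the other,
    and the attended features are pooled into a single joint embedding."""
    d = audio.shape[1]
    attn_av = softmax(audio @ video.T / np.sqrt(d))  # (Ta, Tv)
    attn_va = softmax(video @ audio.T / np.sqrt(d))  # (Tv, Ta)
    audio_ctx = attn_av @ video   # audio enriched with visual context
    video_ctx = attn_va @ audio   # video enriched with audio context
    return audio_ctx.mean(axis=0) + video_ctx.mean(axis=0)  # (d,)

def zero_shot_classify(joint_emb, label_embs):
    """Predict the class whose text-label embedding is most similar
    (cosine similarity), enabling prediction for unseen classes."""
    sims = label_embs @ joint_emb / (
        np.linalg.norm(label_embs, axis=1) * np.linalg.norm(joint_emb) + 1e-8)
    return int(np.argmax(sims))
```

Because classification reduces to nearest-label search in the shared space, any class with a text embedding can be predicted, including classes never seen during training.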

Temporal and cross-modal attention for audio-visual zero-shot learning

2 code implementations · 20 Jul 2022 · Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

We show that our proposed framework that ingests temporal features yields state-of-the-art performance on the UCF, VGGSound, and ActivityNet benchmarks for (generalised) zero-shot learning.

GZSL Video Classification · Video Classification

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

1 code implementation · 5 Apr 2022 · Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata

We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset.

Explanation Generation · Question Answering +3

Image-free Classifier Injection for Zero-Shot Classification

1 code implementation · ICCV 2023 · Anders Christensen, Massimiliano Mancini, A. Sophia Koepke, Ole Winther, Zeynep Akata

We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS) that injects classifiers for new, unseen classes into pre-trained classification models in a post-hoc fashion without relying on image data.

Classification · Image Classification +1

Zero-shot audio captioning with audio-language model guidance and audio context keywords

1 code implementation · 14 Nov 2023 · Leonard Salewski, Stefan Fauth, A. Sophia Koepke, Zeynep Akata

In particular, our framework exploits a pre-trained large language model (LLM) for generating the text which is guided by a pre-trained audio-language model to produce captions that describe the audio content.

Descriptive · Image Captioning +5
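The decoding loop described above can be sketched generically: a language model proposes continuations, and an audio-text similarity score steers which one is kept. Both `candidates_fn` and `audio_text_score` are stand-ins for the pre-trained LLM and audio-language model, and the greedy loop is a simplification of the paper's decoding procedure:

```python
def guided_caption(candidates_fn, audio_text_score, max_len=20):
    """Greedy guided decoding: at each step the language model proposes
    next-word candidates, and the audio-text model re-scores the partial
    captions, steering generation towards the audio content."""
    caption = []
    for _ in range(max_len):
        candidates = candidates_fn(caption)  # LLM proposals given the prefix
        if not candidates:
            break
        # Pick the candidate whose extended caption best matches the audio.
        best = max(candidates, key=lambda w: audio_text_score(caption + [w]))
        caption.append(best)
    return " ".join(caption)
```

With real models, `candidates_fn` would return the LLM's top-k next tokens and `audio_text_score` would embed the partial caption and the audio clip and return their similarity.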

Text-to-feature diffusion for audio-visual few-shot learning

1 code implementation · 7 Sep 2023 · Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata

Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process.

Classification · Few-Shot Learning +1

Video-adverb retrieval with compositional adverb-action embeddings

1 code implementation · 26 Sep 2023 · Thomas Hummel, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

We propose a framework for video-to-adverb retrieval (and vice versa) that aligns video embeddings with their matching compositional adverb-action text embedding in a joint embedding space.

Video-Adverb Retrieval (Unseen Compositions)
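A minimal numpy sketch of retrieval in such a joint space: a video embedding is matched against composed adverb-action text embeddings. The additive composition here stands in for the learned composition in the paper, and all names and dimensions are illustrative:

```python
import numpy as np

def compose(adverb_emb, action_emb):
    """Compositional text embedding: simple normalised addition stands in
    for the learned adverb-action composition."""
    v = adverb_emb + action_emb
    return v / (np.linalg.norm(v) + 1e-8)

def video_to_adverb(video_emb, adverb_embs, action_emb):
    """Retrieve the adverb whose composed (adverb, action) embedding is
    closest to the video embedding in the joint space."""
    video_emb = video_emb / (np.linalg.norm(video_emb) + 1e-8)
    comps = np.stack([compose(a, action_emb) for a in adverb_embs])
    return int(np.argmax(comps @ video_emb))
```

The reverse direction (adverb-to-video retrieval) works symmetrically: rank video embeddings by similarity to one composed text embedding.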

Addressing caveats of neural persistence with deep graph persistence

1 code implementation · 20 Jul 2023 · Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke

Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights.

Topological Data Analysis
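The finding above suggests a much cheaper statistic for later layers. A sketch of that weight-variance proxy (the exact normalisation used in the paper's analysis may differ):

```python
import numpy as np

def weight_variance_complexity(weight_matrices):
    """Average per-layer variance of weights -- the simple quantity that,
    per the paper's analysis, neural persistence roughly reduces to in
    later layers of deep networks."""
    return float(np.mean([np.var(w) for w in weight_matrices]))
```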

X2Face: A network for controlling face generation by using images, audio, and pose codes

no code implementations · 27 Jul 2018 · Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio).

Face Generation

X2Face: A network for controlling face generation using images, audio, and pose codes

no code implementations · ECCV 2018 · Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio).

Talking Head Generation

Self-supervised learning of class embeddings from video

no code implementations · 28 Oct 2019 · Olivia Wiles, A. Sophia Koepke, Andrew Zisserman

This work explores how to use self-supervised learning on videos to learn a class-specific image embedding that encodes pose and shape information.

Self-Supervised Learning

Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval

no code implementations · 6 Apr 2023 · Jae Myung Kim, A. Sophia Koepke, Cordelia Schmid, Zeynep Akata

In this work, we introduce ODmAP@k, an object decorrelation metric that measures a model's robustness to spurious correlations in the training data.

Cross-Modal Retrieval · Object +2

A SOUND APPROACH: Using Large Language Models to generate audio descriptions for egocentric text-audio retrieval

no code implementations · 29 Feb 2024 · Andreea-Maria Oncescu, João F. Henriques, Andrew Zisserman, Samuel Albanie, A. Sophia Koepke

Furthermore, we show that using the same prompts, we can successfully employ LLMs to improve the retrieval on EpicSounds, compared to using the original audio class labels of the dataset.

Retrieval
