Search Results for author: Hilde Kuehne

Found 50 papers, 32 papers with code

Preserving Modality Structure Improves Multi-Modal Learning

no code implementations ICCV 2023 Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations.

Retrieval Self-Supervised Learning

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

no code implementations 21 May 2023 Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.

ISAAC Newton: Input-based Approximate Curvature for Newton's Method

1 code implementation 1 May 2023 Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen

We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons.

Second-order methods

Learning Situation Hyper-Graphs for Video Question Answering

1 code implementation CVPR 2023 Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah

The proposed method is trained in an end-to-end manner and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction.

Question Answering Video Question Answering +1
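The Hungarian matching loss mentioned in this abstract requires a minimum-cost one-to-one assignment between predicted and ground-truth situation-graph elements. A minimal sketch of that matching step, using brute force over permutations as a stand-in for the actual Hungarian algorithm (the function name and cost matrix are illustrative, not from the paper):

```python
import itertools

def min_matching_cost(cost):
    """Minimum-cost one-to-one matching between n predictions and n targets,
    given an n x n cost matrix, by brute force over all permutations.
    (The Hungarian algorithm computes the same optimum in polynomial time.)"""
    n = len(cost)
    return min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in itertools.permutations(range(n))
    )
```

In practice one would use a polynomial-time solver (e.g. `scipy.optimize.linear_sum_assignment`) rather than enumerating permutations.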

WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition

1 code implementation 11 Apr 2023 Marius Bock, Hilde Kuehne, Kristof Van Laerhoven, Michael Moeller

Though research has shown the complementarity of camera- and inertial-based data, datasets which offer both modalities remain scarce.

Egocentric Activity Recognition Human Activity Recognition +2

Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data

1 code implementation 23 Mar 2023 Anna Kukleva, Moritz Böhle, Bernt Schiele, Hilde Kuehne, Christian Rupprecht

Such a schedule results in constant 'task switching' between an emphasis on instance discrimination and an emphasis on group-wise discrimination, thereby ensuring that the model learns both group-wise features and instance-specific details.

Self-Supervised Learning
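The schedule described in this abstract alternates between a low contrastive temperature (emphasizing instance discrimination) and a high one (emphasizing group-wise discrimination). A minimal illustration assuming a cosine oscillation; the period and temperature bounds below are hypothetical defaults, not the paper's values:

```python
import math

def temperature_schedule(step, period=1000, tau_min=0.1, tau_max=1.0):
    """Cosine temperature schedule oscillating between tau_min and tau_max.

    Low temperature sharpens the similarity distribution (instance
    discrimination); high temperature flattens it (group-wise discrimination).
    """
    cos = math.cos(2.0 * math.pi * step / period)
    return tau_min + 0.5 * (tau_max - tau_min) * (1.0 + cos)
```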

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

1 code implementation ICCV 2023 Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne

Contrastive learning has become an important tool in learning representations from unlabeled data, mainly relying on the idea of minimizing the distance between positive data pairs, e.g., views of the same image, and maximizing the distance between negative data pairs, e.g., views of different images.

Contrastive Learning Self-Supervised Learning
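The contrastive objective described above is commonly instantiated as an InfoNCE-style loss. A minimal, dependency-free sketch for a single anchor, assuming plain list vectors and dot-product similarity (a simplification of cosine similarity over normalized embeddings):

```python
import math

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE loss for one anchor: pull the positive pair close,
    push negative pairs away. Returns -log softmax of the positive logit."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Positive logit first, then one logit per negative, all scaled by tau.
    logits = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the anchor is closest to its positive and grows when a negative is closer.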

Video Test-Time Adaptation for Action Recognition

1 code implementation CVPR 2023 Wei Lin, Muhammad Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger, Hilde Kuehne, Horst Bischof

Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches in both evaluations of a single distribution shift and the challenging case of random distribution shifts.

Action Recognition Temporal Action Localization

Deep Differentiable Logic Gate Networks

1 code implementation 15 Oct 2022 Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

Recently, research has increasingly focused on developing efficient neural network architectures.

Efficient Neural Network

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

1 code implementation 7 Oct 2022 Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English.

Knowledge Distillation Retrieval +2

Contrastive Audio-Visual Masked Autoencoder

1 code implementation 2 Oct 2022 Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

Ranked #1 on Audio Tagging on AudioSet (using extra training data)

Audio Classification Audio Tagging +4

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

1 code implementation 12 Sep 2022 Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne

We follow up with the analysis of the attribute-based zero-shot learning capabilities of these models, evaluating how well this classical zero-shot notion emerges from large-scale webly supervision.

Retrieval Text Retrieval +1

Augmentation Learning for Semi-Supervised Classification

no code implementations 3 Aug 2022 Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne

While accuracy on ImageNet and similar datasets has increased over time, performance on tasks beyond the classification of natural images has yet to be explored.

Classification Data Augmentation +1

Weakly Supervised Grounding for VQA in Vision-Language Transformers

1 code implementation 5 Jul 2022 Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah

Transformers for vision-language representation learning have attracted considerable interest and have shown strong performance on visual question answering (VQA) and grounding.

Question Answering Representation Learning +1

Differentiable Top-k Classification Learning

1 code implementation 15 Jun 2022 Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen

In this work, we relax this assumption and optimize the model for multiple k simultaneously instead of using a single k. Leveraging recent advances in differentiable sorting and ranking, we propose a differentiable top-k cross-entropy classification loss.

General Classification Image Classification
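The idea of optimizing for several k simultaneously can be illustrated with a smooth surrogate for "the true class ranks in the top k". The sigmoid-of-margin relaxation below is a simplified stand-in for the paper's differentiable-sorting-based loss, not the actual method:

```python
import math

def soft_topk_prob(scores, true_idx, k, temp=0.1):
    """Smooth approximation of P(true class in top-k): a sigmoid of the margin
    between the true score and the k-th largest competing score."""
    others = sorted((s for i, s in enumerate(scores) if i != true_idx), reverse=True)
    kth = others[k - 1]  # score the true class must beat to rank in the top-k
    return 1.0 / (1.0 + math.exp(-(scores[true_idx] - kth) / temp))

def multi_k_loss(scores, true_idx, ks=(1, 5)):
    """Cross-entropy averaged over several k, optimizing all of them at once."""
    return -sum(math.log(soft_topk_prob(scores, true_idx, k)) for k in ks) / len(ks)
```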

CycDA: Unsupervised Cycle Domain Adaptation from Image to Video

1 code implementation 30 Mar 2022 Wei Lin, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne, Horst Bischof

To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation that, on the one hand, leverages the joint spatial information in images and videos and, on the other hand, trains an independent spatio-temporal model to bridge the modality gap.

Action Recognition Domain Adaptation +1

Monotonic Differentiable Sorting Networks

1 code implementation ICLR 2022 Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

We introduce a family of sigmoid functions and prove that they produce differentiable sorting networks that are monotonic.
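The core building block of a differentiable sorting network is a soft compare-and-swap that blends the minimum and maximum of two values via a sigmoid. A minimal sketch using the plain logistic function for illustration only; the paper's contribution is precisely characterizing which sigmoid families make the resulting networks monotonic:

```python
import math

def soft_cswap(a, b, steepness=10.0):
    """Differentiable compare-and-swap: blends min and max of (a, b)
    via a sigmoid of their difference."""
    alpha = 1.0 / (1.0 + math.exp(-steepness * (b - a)))  # ~1 if already ordered
    lo = alpha * a + (1.0 - alpha) * b  # soft minimum
    hi = alpha * b + (1.0 - alpha) * a  # soft maximum
    return lo, hi

def soft_sort3(x, steepness=10.0):
    """Three-element odd-even sorting network built from soft swaps."""
    a, b, c = x
    a, b = soft_cswap(a, b, steepness)
    b, c = soft_cswap(b, c, steepness)
    a, b = soft_cswap(a, b, steepness)
    return [a, b, c]
```

As the steepness grows, the network converges to a hard sorting network while remaining differentiable for any finite steepness.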

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

1 code implementation CVPR 2022 Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David Harwath, James Glass, Hilde Kuehne

In this work, we present a multi-modal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrates them into a fused representation in a joint multi-modal embedding space.

Action Localization Retrieval +2

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

1 code implementation 8 Dec 2021 Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.

Action Localization Retrieval +2

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation CVPR 2022 Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization Self-Supervised Learning

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations 1 Dec 2021 Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

Style Agnostic 3D Reconstruction via Adversarial Style Transfer

no code implementations 20 Oct 2021 Felix Petersen, Bastian Goldluecke, Oliver Deussen, Hilde Kuehne

Recently introduced differentiable renderers can be leveraged to learn the 3D geometry of objects from 2D images, but those approaches require additional supervision to enable the renderer to produce an output that can be compared to the input image.

3D Object Reconstruction 3D Reconstruction +2

Learning with Algorithmic Supervision via Continuous Relaxations

1 code implementation NeurIPS 2021 Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

The integration of algorithmic components into neural architectures has gained increased attention recently, as it allows training neural networks with new forms of supervision such as ordering constraints or silhouettes instead of using ground truth labels.

A Sampling-Free Approximation of Gaussian Variational Auto-Encoders

no code implementations 29 Sep 2021 Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

We propose a sampling-free approximate formulation of Gaussian variational auto-encoders.

Propagating Distributions through Neural Networks

no code implementations 29 Sep 2021 Felix Petersen, Christian Borgelt, Mikhail Yurochkin, Hilde Kuehne, Oliver Deussen

We propose a new approach to propagating probability distributions through neural networks.


Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting

1 code implementation ICCV 2021 Anna Kukleva, Hilde Kuehne, Bernt Schiele

Both generalized and incremental few-shot learning have to deal with three major challenges: learning novel classes from only few samples per class, preventing catastrophic forgetting of base classes, and classifier calibration across novel and base classes.

Classifier calibration Few-Shot Learning

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

1 code implementation CVPR 2021 Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah

In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone.

Question Answering Visual Question Answering

Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision

1 code implementation 9 May 2021 Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

Sorting and ranking supervision is a method for training neural networks end-to-end based on ordering constraints.

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

1 code implementation ICCV 2021 Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Multimodal self-supervised learning is receiving more and more attention, as it allows not only training large networks without human supervision but also searching and retrieving data across various modalities.

Clustering Contrastive Learning +5

Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data

1 code implementation 3 Jun 2019 Hilde Kuehne, Ahsan Iqbal, Alexander Richard, Juergen Gall

Action recognition has so far mainly focused on the problem of classifying hand-selected, pre-clipped actions, reaching impressive results in this field.

Action Recognition General Classification +1

Unsupervised learning of action classes with continuous temporal embedding

2 code implementations CVPR 2019 Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall

The task of temporally detecting and segmenting actions in untrimmed videos has seen increased attention recently.

Recurrent Residual Learning for Action Recognition

no code implementations 27 Jun 2017 Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall

In this work, we propose a novel recurrent ConvNet architecture called recurrent residual networks to address the task of action recognition.

Action Recognition Image Classification +1

Weakly supervised learning of actions from transcripts

no code implementations 7 Oct 2016 Hilde Kuehne, Alexander Richard, Juergen Gall

Our system is based on the idea that, given a sequence of input data and a transcript, i.e., a list of the actions in the order they occur in the video, it is possible to infer the actions within the video stream and thus learn the related action models without the need for any frame-based annotation.

Weakly-supervised Learning

Cooking in the kitchen: Recognizing and Segmenting Human Activities in Videos

no code implementations 25 Aug 2015 Hilde Kuehne, Juergen Gall, Thomas Serre

Through extensive system evaluations, we demonstrate that combining compact video representations based on Fisher Vectors with HMM-based modeling yields very significant gains in accuracy, and that, when properly trained with sufficient training samples, structured temporal models outperform unstructured bag-of-words models by a large margin on the tested performance metric.

Action Recognition Temporal Action Localization
