1 code implementation • 28 Mar 2024 • Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma
Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.
1 code implementation • 27 Mar 2024 • Noor Ahmed, Anna Kukleva, Bernt Schiele
To address these challenges, we propose the OrCo framework, built on two core principles: orthogonality of features in the representation space and contrastive learning.
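The orthogonality principle can be illustrated with a toy penalty that discourages class prototypes from pointing in similar directions. This is a minimal numpy sketch under assumed definitions, not the actual OrCo objective; `orthogonality_loss` and its formulation are hypothetical.

```python
import numpy as np

def orthogonality_loss(prototypes):
    """Toy penalty on pairwise similarity of L2-normalized class prototypes.

    prototypes: (num_classes, dim) array. The loss is the squared Frobenius
    norm of the off-diagonal of the Gram matrix, so it is zero exactly when
    all prototypes are mutually orthogonal.
    """
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    gram = P @ P.T                      # cosine similarities between prototypes
    off_diag = gram - np.eye(len(P))    # ignore each prototype's self-similarity
    return float(np.sum(off_diag ** 2))

# Mutually orthogonal prototypes incur zero penalty.
print(orthogonality_loss(np.eye(3)))  # → 0.0
```

A set of identical prototypes, by contrast, yields a strictly positive penalty, which is the direction such a regularizer would push representations apart.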
1 code implementation • ICCV 2023 • Yue Fan, Anna Kukleva, Dengxin Dai, Bernt Schiele
In experiments, SSB greatly improves both inlier classification and outlier detection performance, outperforming existing methods by a large margin.
1 code implementation • 7 Oct 2023 • Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne
Specifically, we prompt an LLM to create plausible video descriptions based on ASR narrations of the video for a large-scale instructional video dataset.
1 code implementation • ICCV 2023 • Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne
Large-scale noisy web image-text datasets have proven effective for learning robust vision-language models.
1 code implementation • 23 Mar 2023 • Anna Kukleva, Moritz Böhle, Bernt Schiele, Hilde Kuehne, Christian Rupprecht
Such a schedule results in constant 'task switching' between an emphasis on instance discrimination and group-wise discrimination, thereby ensuring that the model learns both group-wise features and instance-specific details.
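The alternation described above can be pictured as a simple epoch-indexed scheduler. This is a hypothetical sketch only; the function name and the `switch_every` period are assumptions for illustration, not details from the paper.

```python
def objective_for_epoch(epoch, switch_every=10):
    """Alternate between an instance-discrimination phase and a
    group-discrimination phase every `switch_every` epochs (toy scheduler)."""
    phase = (epoch // switch_every) % 2
    return "instance" if phase == 0 else "group"

# First 10 epochs emphasize instance discrimination, the next 10 group-wise
# discrimination, and so on.
print([objective_for_epoch(e) for e in (0, 9, 10, 19, 20)])
# → ['instance', 'instance', 'group', 'group', 'instance']
```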
no code implementations • 9 Mar 2023 • Wei Lin, Anna Kukleva, Horst Possegger, Hilde Kuehne, Horst Bischof
Temporal action segmentation in untrimmed videos has gained increased attention recently.
1 code implementation • ICCV 2023 • Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne
Contrastive learning has become an important tool for learning representations from unlabeled data. It relies mainly on minimizing the distance between positive data pairs (e.g., views of the same image) and maximizing the distance between negative data pairs (e.g., views of different images).
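The positive/negative-pair idea is commonly instantiated as an InfoNCE-style loss. Below is a minimal numpy sketch for a single anchor, assuming cosine similarity and a fixed temperature; it illustrates the standard formulation, not this paper's specific objective.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor (toy numpy sketch).

    anchor, positive: (dim,) embeddings of two views of the same image.
    negatives: (n, dim) embeddings of views of different images.
    Lower loss means the anchor is closer to its positive than to negatives.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(anchor), normalize(positive), normalize(negatives)
    # Cosine similarities, positive first, scaled by temperature.
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits -= logits.max()  # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))  # cross-entropy with positive at index 0
```

Pulling the positive toward the anchor drives the loss toward zero, while an anchor that is closer to a negative than to its positive is penalized heavily.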
no code implementations • 23 Sep 2022 • Enea Duka, Anna Kukleva, Bernt Schiele
To enhance representations via self-supervised training for the task of unintentional action recognition, we propose temporal transformations, called Temporal Transformations of Inherent Biases of Unintentional Actions (T2IBUA).
1 code implementation • 30 Mar 2022 • Wei Lin, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne, Horst Bischof
To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation that, on the one hand, leverages the joint spatial information in images and videos and, on the other, trains an independent spatio-temporal model to bridge the modality gap.
no code implementations • 10 Dec 2021 • Yue Fan, Anna Kukleva, Bernt Schiele
Generally, the aim is to train a model that is invariant to various data augmentations.
1 code implementation • CVPR 2022 • Yue Fan, Dengxin Dai, Anna Kukleva, Bernt Schiele
In this paper, we propose a novel co-learning framework (CoSSL) with decoupled representation learning and classifier learning for imbalanced SSL.
1 code implementation • ICCV 2021 • Anna Kukleva, Hilde Kuehne, Bernt Schiele
Both generalized and incremental few-shot learning have to deal with three major challenges: learning novel classes from only a few samples per class, preventing catastrophic forgetting of base classes, and classifier calibration across novel and base classes.
1 code implementation • CVPR 2020 • Anna Kukleva, Makarand Tapaswi, Ivan Laptev
Localizing the pair of interacting characters in video is a time-consuming process; instead, we train our model to learn from clip-level weak labels.
no code implementations • 29 Jan 2020 • Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, Hilde Kuehne
Understanding the structure of complex activities in untrimmed videos is a challenging task in the area of action recognition.
1 code implementation • 5 Sep 2019 • Anna Kukleva, Mohammad Asif Khan, Hafez Farazi, Sven Behnke
We first solve the detection task for a single image using a fully convolutional encoder-decoder architecture; later, we use its output as input to our temporal models and jointly learn the detection task on sequences of images.
2 code implementations • CVPR 2019 • Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall
The task of temporally detecting and segmenting actions in untrimmed videos has seen increased attention recently.