Search Results for author: Nina Shvetsova

Found 13 papers, 10 papers with code

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

1 code implementation • 7 Oct 2023 • Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

Specifically, we prompt an LLM to create plausible video descriptions based on ASR narrations of the video for a large-scale instructional video dataset.

Automatic Speech Recognition • Sentence • +3
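
To make the idea in the snippet concrete, here is a minimal sketch of prompting an LLM to rewrite ASR narrations as video descriptions. The `llm` callable and the prompt wording are hypothetical placeholders, not the paper's actual prompt or model.

def asr_to_captions(asr_segments, llm):
    # `llm` is a hypothetical stand-in for any text-completion callable;
    # the paper's exact prompt and model are not reproduced here.
    prompt = (
        "Below are spoken narrations transcribed (ASR) from an instructional "
        "video. Write short, plausible descriptions of what happens in the "
        "video, one per narration:\n"
        + "\n".join(f"- {seg}" for seg in asr_segments)
    )
    return llm(prompt)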

Preserving Modality Structure Improves Multi-Modal Learning

1 code implementation • ICCV 2023 • Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations.

Retrieval • Self-Supervised Learning

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

1 code implementation • ICCV 2023 • Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne

Contrastive learning has become an important tool for learning representations from unlabeled data, relying mainly on the idea of minimizing the distance between positive data pairs, e.g., views of the same image, and maximizing the distance between negative data pairs, e.g., views of different images.

Contrastive Learning • Self-Supervised Learning
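
The snippet describes the standard contrastive objective the paper builds on. A minimal InfoNCE sketch, assuming PyTorch, looks like the following; the paper's group ordering constraints themselves are not shown.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # z1[i] and z2[i] are embeddings of two views of the same image (positives);
    # all other pairs in the batch act as negatives.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # [batch, batch] similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)      # pull diagonal, push the rest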

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

1 code implementation • 7 Oct 2022 • Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

Inspired by the fact that English text-video retrieval outperforms retrieval in other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English.

Knowledge Distillation • Retrieval • +2
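
The cross-lingual distillation described in the snippet can be illustrated with a generic knowledge-distillation step. This is a hedged sketch assuming PyTorch; the function name and temperature handling are illustrative, not the paper's exact objective.

import torch.nn.functional as F

def c2kd_step(student_sim, teacher_sim, T=2.0):
    # student_sim: text-video similarity logits for non-English input text
    # teacher_sim: logits from the teacher for the corresponding English text
    # Both are shaped [batch, num_videos]; the student learns to match the
    # teacher's cross-modal prediction distribution.
    p_teacher = F.softmax(teacher_sim / T, dim=1)
    log_p_student = F.log_softmax(student_sim / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)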

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

1 code implementation • 12 Sep 2022 • Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne

We follow up with an analysis of the attribute-based zero-shot learning capabilities of these models, evaluating how well this classical notion of zero-shot learning emerges from large-scale webly supervised training.

Attribute • Retrieval • +2
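
Attribute-based zero-shot evaluation of a vision-language model can be sketched as scoring attribute prompts against an image embedding. This is a simplified illustration assuming a CLIP-style encoder pair; `encode_text` and the prompt template are placeholders, not the paper's protocol.

import torch
import torch.nn.functional as F

def attribute_zero_shot(image_emb, class_attributes, encode_text):
    # class_attributes: list of attribute lists, e.g. [["red", "round"], ...]
    # encode_text: placeholder for the model's text encoder (returns a [dim] tensor)
    image_emb = F.normalize(image_emb, dim=-1)
    scores = []
    for attrs in class_attributes:
        prompt = "a photo of a " + ", ".join(attrs) + " object"
        text_emb = F.normalize(encode_text(prompt), dim=-1)
        scores.append((image_emb * text_emb).sum(-1))  # cosine similarity
    return torch.stack(scores).argmax(dim=0)           # best-matching class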

Augmentation Learning for Semi-Supervised Classification

no code implementations • 3 Aug 2022 • Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne

While accuracy on ImageNet and similar datasets has increased over time, performance on tasks beyond the classification of natural images has yet to be explored.

Classification • Data Augmentation • +1

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

1 code implementation • CVPR 2022 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David Harwath, James Glass, Hilde Kuehne

In this work, we present a multi-modal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and to integrate them into a fused representation in a joint multi-modal embedding space.

Action Localization • Retrieval • +2
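
The fusion idea in the snippet can be sketched as a shared transformer encoder over concatenated modality tokens. This minimal PyTorch version is an assumption for illustration; it omits input projections, masking, and the paper's fusion over pairwise modality combinations.

import torch
import torch.nn as nn

class FusionTransformer(nn.Module):
    # Modality-agnostic: any subset of token sequences (video, audio, text)
    # can be concatenated and fused by shared self-attention.
    def __init__(self, dim=512, heads=8, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, *modality_tokens):        # each: [batch, seq_m, dim]
        tokens = torch.cat(modality_tokens, dim=1)
        fused = self.encoder(tokens)            # attention exchanges information
        return fused.mean(dim=1)                # fused embedding [batch, dim]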

Everything at Once - Multi-modal Fusion Transformer for Video Retrieval

1 code implementation • 8 Dec 2021 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.

Action Localization • Retrieval • +2

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.
