Search Results for author: Simon Jenni

Found 25 papers, 3 papers with code

Learning Video Representations by Transforming Time

no code implementations ECCV 2020 Simon Jenni, Givi Meishvili, Paolo Favaro

Our representations can be learned from data without human annotation and provide a substantial boost to the training of neural networks on small labeled data sets for tasks such as action recognition, which require accurately distinguishing the motion of objects.

Action Recognition Self-Supervised Learning

Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets

no code implementations 2 Sep 2024 Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni

Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos.

Video Alignment Video Editing +1

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

no code implementations 23 Apr 2024 Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.

Hallucination In-Context Learning +2

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

no code implementations CVPR 2024 Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging.

Text-to-Image Generation

Building Vision-Language Models on Solid Foundations with Masked Distillation

no code implementations CVPR 2024 Sepehr Sameni, Kushal Kafle, Hao Tan, Simon Jenni

Recent advancements in Vision-Language Models (VLMs) have marked a significant leap in bridging the gap between computer vision and natural language processing.

Contrastive Learning Knowledge Distillation +4

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

no code implementations 20 Dec 2023 Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah

To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.

Action Classification Attribute +7

DECORAIT -- DECentralized Opt-in/out Registry for AI Training

no code implementations 25 Sep 2023 Kar Balan, Alex Black, Simon Jenni, Andrew Gilbert, Andy Parsons, John Collomosse

We report a prototype of DECORAIT, which explores hierarchical clustering and a combination of on/off-chain storage to create a scalable decentralized registry to trace the provenance of GenAI training data in order to determine training consent and reward creatives who contribute that data.

EKILA: Synthetic Media Provenance and Attribution for Generative Art

no code implementations 10 Apr 2023 Kar Balan, Shruti Agarwal, Simon Jenni, Andy Parsons, Andrew Gilbert, John Collomosse

We present EKILA, a decentralized framework that enables creatives to receive recognition and reward for their contributions to generative AI (GenAI).

VADER: Video Alignment Differencing and Retrieval

no code implementations ICCV 2023 Alexander Black, Simon Jenni, Tu Bui, Md. Mehrab Tanjim, Stefano Petrangeli, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse

We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos.

Misinformation Retrieval +2

Audio-Visual Contrastive Learning with Temporal Self-Supervision

no code implementations 15 Feb 2023 Simon Jenni, Alexander Black, John Collomosse

We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision.

Action Recognition Audio Classification +3

Spatio-Temporal Crop Aggregation for Video Representation Learning

no code implementations ICCV 2023 Sepehr Sameni, Simon Jenni, Paolo Favaro

We propose Spatio-temporal Crop Aggregation for video representation LEarning (SCALE), a novel method that enjoys high scalability at both training and inference time.

Action Classification Dimensionality Reduction +3

SImProv: Scalable Image Provenance Framework for Robust Content Attribution

no code implementations 28 Jun 2022 Alexander Black, Tu Bui, Simon Jenni, Zhifei Zhang, Viswanathan Swaminathan, John Collomosse

We present SImProv - a scalable image provenance framework to match a query image back to a trusted database of originals and identify possible manipulations on the query.

Re-Ranking Retrieval

Video-ReTime: Learning Temporally Varying Speediness for Time Remapping

no code implementations 11 May 2022 Simon Jenni, Markus Woodson, Fabian Caba Heilbron

Furthermore, we propose an optimization for video re-timing that enables precise control over the target duration and performs more robustly on longer videos than prior methods.

Action Recognition

Representation Learning by Detecting Incorrect Location Embeddings

1 code implementation 10 Apr 2022 Sepehr Sameni, Simon Jenni, Paolo Favaro

We represent object parts with image tokens and train a ViT to detect which token has been combined with an incorrect positional embedding.

Ranked #91 on Image Classification on ObjectNet (using extra training data)

Image Classification Object +2

Learning to Deblur and Rotate Motion-Blurred Faces

no code implementations 14 Dec 2021 Givi Meishvili, Attila Szabó, Simon Jenni, Paolo Favaro

Our method handles the complexity of face blur by implicitly learning the geometry and motion of faces through the joint training on three large datasets: FFHQ and 300VW, which are publicly available, and a new Bern Multi-View Face Dataset (BMFD) that we built.

Decoder

Time-Equivariant Contrastive Video Representation Learning

no code implementations ICCV 2021 Simon Jenni, Hailin Jin

We introduce a novel self-supervised contrastive learning method to learn representations from unlabelled videos.

Action Recognition Contrastive Learning +3

VPN: Video Provenance Network for Robust Content Attribution

no code implementations 21 Sep 2021 Alexander Black, Tu Bui, Simon Jenni, Vishy Swaminathan, John Collomosse

We present VPN - a content attribution method for recovering provenance information from videos shared online.

Contrastive Learning

Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation

no code implementations 13 Oct 2020 Simon Jenni, Paolo Favaro

Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses.

3D Pose Estimation Monocular 3D Human Pose Estimation +1

Video Representation Learning by Recognizing Temporal Transformations

no code implementations 21 Jul 2020 Simon Jenni, Givi Meishvili, Paolo Favaro

Our representations can be learned from data without human annotation and provide a substantial boost to the training of neural networks on small labeled data sets for tasks such as action recognition, which require accurately distinguishing the motion of objects.

Action Recognition Representation Learning +1

Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics

no code implementations CVPR 2020 Simon Jenni, Hailin Jin, Paolo Favaro

Based on this criterion, we introduce a novel image transformation that we call limited context inpainting (LCI).

Learning to Have an Ear for Face Super-Resolution

no code implementations CVPR 2020 Givi Meishvili, Simon Jenni, Paolo Favaro

To combine the aural and visual modalities, we propose a method to first build the latent representations of a face from the lone audio track and then from the lone low-resolution image.

Audio Super-Resolution Face Reconstruction +2

On Stabilizing Generative Adversarial Training with Noise

no code implementations CVPR 2019 Simon Jenni, Paolo Favaro

We notice that the distributions of real and generated data should match even when they undergo the same filtering.

Deep Bilevel Learning

1 code implementation ECCV 2018 Simon Jenni, Paolo Favaro

Our approach is based on the principles of cross-validation, where a validation set is used to limit the model overfitting.

Bilevel Optimization

Self-Supervised Feature Learning by Learning to Spot Artifacts

no code implementations CVPR 2018 Simon Jenni, Paolo Favaro

To generate images with artifacts, we pre-train a high-capacity autoencoder and then use a damage-and-repair strategy: first, we freeze the autoencoder and damage the output of the encoder by randomly dropping its entries.

Decoder Self-Supervised Learning
