no code implementations • ECCV 2020 • Simon Jenni, Givi Meishvili, Paolo Favaro
Our representations can be learned from data without human annotation and provide a substantial boost to the training of neural networks on small labeled data sets for tasks such as action recognition, which require accurately distinguishing the motion of objects.
no code implementations • 2 Sep 2024 • Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni
Temporal video alignment aims to synchronize key events, such as object interactions or action phase transitions, in two videos.
no code implementations • 23 Apr 2024 • Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.
no code implementations • CVPR 2024 • Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging.
no code implementations • CVPR 2024 • Sepehr Sameni, Kushal Kafle, Hao Tan, Simon Jenni
Recent advancements in Vision-Language Models (VLMs) have marked a significant leap in bridging the gap between computer vision and natural language processing.
no code implementations • 20 Dec 2023 • Ishan Rajendrakumar Dave, Simon Jenni, Mubarak Shah
To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.
no code implementations • 25 Sep 2023 • Kar Balan, Alex Black, Simon Jenni, Andrew Gilbert, Andy Parsons, John Collomosse
We report a prototype of DECORAIT, which explores hierarchical clustering and a combination of on/off-chain storage to create a scalable decentralized registry to trace the provenance of GenAI training data in order to determine training consent and reward creatives who contribute that data.
1 code implementation • CVPR 2023 • Chun-Hsiao Yeh, Bryan Russell, Josef Sivic, Fabian Caba Heilbron, Simon Jenni
Large-scale vision-language models (VLMs) have shown impressive results for language-guided search applications.
no code implementations • 10 Apr 2023 • Kar Balan, Shruti Agarwal, Simon Jenni, Andy Parsons, Andrew Gilbert, John Collomosse
We present EKILA, a decentralized framework that enables creatives to receive recognition and reward for their contributions to generative AI (GenAI).
no code implementations • ICCV 2023 • Alexander Black, Simon Jenni, Tu Bui, Md. Mehrab Tanjim, Stefano Petrangeli, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse
We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos.
no code implementations • 15 Feb 2023 • Simon Jenni, Alexander Black, John Collomosse
We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision.
no code implementations • ICCV 2023 • Sepehr Sameni, Simon Jenni, Paolo Favaro
We propose Spatio-temporal Crop Aggregation for video representation LEarning (SCALE), a novel method that enjoys high scalability at both training and inference time.
no code implementations • 28 Jun 2022 • Alexander Black, Tu Bui, Simon Jenni, Zhifei Zhang, Viswanathan Swaminathan, John Collomosse
We present SImProv, a scalable image provenance framework to match a query image back to a trusted database of originals and identify possible manipulations on the query.
no code implementations • 11 May 2022 • Simon Jenni, Markus Woodson, Fabian Caba Heilbron
Furthermore, we propose an optimization for video re-timing that enables precise control over the target duration and performs more robustly on longer videos than prior methods.
1 code implementation • 10 Apr 2022 • Sepehr Sameni, Simon Jenni, Paolo Favaro
We represent object parts with image tokens and train a ViT to detect which token has been combined with an incorrect positional embedding.
Ranked #91 on Image Classification on ObjectNet (using extra training data)
no code implementations • 14 Dec 2021 • Givi Meishvili, Attila Szabó, Simon Jenni, Paolo Favaro
Our method handles the complexity of face blur by implicitly learning the geometry and motion of faces through the joint training on three large datasets: FFHQ and 300VW, which are publicly available, and a new Bern Multi-View Face Dataset (BMFD) that we built.
no code implementations • ICCV 2021 • Simon Jenni, Hailin Jin
We introduce a novel self-supervised contrastive learning method to learn representations from unlabelled videos.
no code implementations • 21 Sep 2021 • Alexander Black, Tu Bui, Simon Jenni, Vishy Swaminathan, John Collomosse
We present VPN, a content attribution method for recovering provenance information from videos shared online.
no code implementations • 13 Oct 2020 • Simon Jenni, Paolo Favaro
Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses.
no code implementations • 21 Jul 2020 • Simon Jenni, Givi Meishvili, Paolo Favaro
Our representations can be learned from data without human annotation and provide a substantial boost to the training of neural networks on small labeled data sets for tasks such as action recognition, which require accurately distinguishing the motion of objects.
no code implementations • CVPR 2020 • Simon Jenni, Hailin Jin, Paolo Favaro
Based on this criterion, we introduce a novel image transformation that we call limited context inpainting (LCI).
no code implementations • CVPR 2020 • Givi Meishvili, Simon Jenni, Paolo Favaro
To combine the aural and visual modalities, we propose a method to first build the latent representations of a face from the lone audio track and then from the lone low-resolution image.
no code implementations • CVPR 2019 • Simon Jenni, Paolo Favaro
We notice that the distributions of real and generated data should match even when they undergo the same filtering.
1 code implementation • ECCV 2018 • Simon Jenni, Paolo Favaro
Our approach is based on the principles of cross-validation, where a validation set is used to limit model overfitting.
no code implementations • CVPR 2018 • Simon Jenni, Paolo Favaro
To generate images with artifacts, we pre-train a high-capacity autoencoder and then we use a damage and repair strategy: First, we freeze the autoencoder and damage the output of the encoder by randomly dropping its entries.
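The "damage" step in the last entry above can be illustrated with a minimal sketch. It assumes that "dropping" an entry means setting it to zero and that each entry is dropped independently with a fixed probability; neither detail is stated in the abstract. The function name `damage_features`, the `drop_prob` parameter, and the example values are hypothetical, and the repair step is not shown because the snippet does not describe it.

```python
import random


def damage_features(features, drop_prob, rng):
    """Return a copy of `features` with entries randomly zeroed.

    Assumption: each entry is dropped (set to 0.0) independently
    with probability `drop_prob`. This is an illustration, not the
    authors' implementation.
    """
    return [0.0 if rng.random() < drop_prob else value for value in features]


# Made-up stand-in for the output of a frozen encoder.
encoded = [0.5, -1.2, 3.0, 0.7, 2.1, -0.4]

rng = random.Random(0)  # seeded so the sketch is repeatable
damaged = damage_features(encoded, drop_prob=0.5, rng=rng)
print(damaged)
```

Every entry of `damaged` is either the original value or zero, so the damaged code keeps the encoder's layout while losing part of its information.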