1 code implementation • CVPR 2025 • Nicolas Dufour, David Picard, Vicky Kalogeiton, Loic Landrieu
Global visual geolocation predicts where an image was captured on Earth.
Ranked #1 on Photo geolocation estimation on OpenStreetView-5M
1 code implementation • 19 Nov 2024 • David Picard, Nicolas Dufour
To alleviate this problem, we propose a drop-in replacement for multi-head attention (MHA) called the Polynomial Mixer (PoM), which has the benefit of encoding the entire sequence into an explicit state.
1 code implementation • 1 Jul 2024 • Robin Courant, Nicolas Dufour, Xi Wang, Marc Christie, Vicky Kalogeiton
…dataset, we propose a diffusion-based approach, named DIRECTOR, which generates complex camera trajectories from textual captions that describe the relation and synchronisation between the camera and characters.
Ranked #1 on 3D Generation on E.T. the Exceptional Trajectories
no code implementations • CVPR 2024 • Nicolas Dufour, Victor Besnier, Vicky Kalogeiton, David Picard
We then condition the diffusion model on both the conditional information and the coherence score.
1 code implementation • CVPR 2024 • Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis, Constantin Aronssohn, Nacim Bouia, Stephanie Fu, Romain Loiseau, Van Nguyen Nguyen, Charles Raude, Elliot Vincent, Lintao XU, HongYu Zhou, Loic Landrieu
Determining the location of an image anywhere on Earth is a complex visual task, which makes it particularly relevant for evaluating computer vision algorithms.
Ranked #2 on Photo geolocation estimation on OpenStreetView-5M
no code implementations • 19 Apr 2024 • Xi Wang, Nicolas Dufour, Nefeli Andreou, Marie-Paule Cani, Victoria Fernandez Abrevaya, David Picard, Vicky Kalogeiton
Classifier-Free Guidance (CFG) enhances the quality and condition adherence of text-to-image diffusion models.
no code implementations • 21 Mar 2023 • Robin Courant, Maika Edberg, Nicolas Dufour, Vicky Kalogeiton
For image classification, the most common Transformer architecture uses only the Transformer encoder to transform the various input tokens.
1 code implementation • 10 Oct 2022 • Nicolas Dufour, David Picard, Vicky Kalogeiton
In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details.
Ranked #1 on Pose Transfer on CelebAMask-HQ