Search Results for author: Juan-Manuel Perez-Rua

Found 19 papers, 8 papers with code

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

no code implementations24 Dec 2023 Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang

Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability.

Image to 3D Neural Rendering
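The title's "HyperNetworks" refers to a network that emits the weights of another network, which avoids per-scene optimization. A minimal numpy sketch of that idea — all shapes, names, and the fixed random hypernetwork weights here are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernetwork(z, d_in=4, d_out=2):
    """Map a conditioning vector z to the weights of a small target net.

    A fixed linear map produces a flat parameter vector, reshaped into
    (W, b) for the target network. In the paper the conditioning would
    come from image features; here z is an arbitrary placeholder.
    """
    n_params = d_in * d_out + d_out
    H = rng.standard_normal((n_params, z.size)) * 0.1  # hypernet weights (illustrative)
    theta = H @ z
    W = theta[: d_in * d_out].reshape(d_out, d_in)
    b = theta[d_in * d_out:]
    return W, b

def target_net(x, W, b):
    # Target network whose weights were generated, not learned directly.
    return np.tanh(W @ x + b)

z = rng.standard_normal(8)            # e.g. an image embedding
W, b = hypernetwork(z)
y = target_net(rng.standard_normal(4), W, b)   # shape (2,)
```

Because the target weights are produced in a single forward pass from the conditioning input, a new scene needs no iterative optimization — which is the generalization argument the abstract makes.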

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

no code implementations9 Oct 2023 Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He

In this paper, for the first time, we introduce optical flow into the attention module in the diffusion model's U-Net to address the inconsistency issue for text-to-video editing.

Optical Flow Estimation Text-to-Video Editing +1
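The core idea — restricting attention so a patch only mixes with patches on its own optical-flow trajectory — can be sketched in a few lines. This is a simplified single-patch, integer-flow illustration under my own assumptions, not the paper's U-Net implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def flow_guided_attention(feats, flow, y, x):
    """Attend along the optical-flow trajectory of one patch.

    feats: (T, H, W, C) per-frame features; flow: (T-1, H, W, 2) integer
    displacements. We trace the patch at (y, x) through time, gather the
    features on that trajectory, and run self-attention over them only,
    so temporal mixing follows the motion path rather than all patches.
    """
    T, H, W, C = feats.shape
    traj = [feats[0, y, x]]
    for t in range(T - 1):
        dy, dx = flow[t, y, x]
        y = int(np.clip(y + dy, 0, H - 1))   # follow the flow to frame t+1
        x = int(np.clip(x + dx, 0, W - 1))
        traj.append(feats[t + 1, y, x])
    Q = K = V = np.stack(traj)               # (T, C) features on the path
    A = softmax(Q @ K.T / np.sqrt(C))        # attention over the trajectory
    return A @ V                             # (T, C)
```

Keeping attention on the trajectory ties the same physical point together across frames, which is how flow guidance promotes temporal consistency in the edited video.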

Multi-Modal Few-Shot Temporal Action Detection

1 code implementation27 Nov 2022 Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang

In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD by leveraging few-shot support videos and new class names jointly.

Action Detection Few-Shot Object Detection +3

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization

1 code implementation CVPR 2023 Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Perez-Rua

Our experiments show the proposed adaptations improve egocentric query detection, leading to a better visual query localization system in both 2D and 3D configurations.


Negative Frames Matter in Egocentric Visual Query 2D Localization

1 code implementation3 Aug 2022 Mengmeng Xu, Cheng-Yang Fu, Yanghao Li, Bernard Ghanem, Juan-Manuel Perez-Rua, Tao Xiang

(1) The repeated gradient computation of the same object leads to inefficient training; (2) the false positive rate is high on background frames.


Space-time Mixing Attention for Video Transformer

1 code implementation NeurIPS 2021 Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos

In this work, we propose a Video Transformer model whose complexity scales linearly with the number of frames in the video sequence, and which hence induces no overhead compared to an image-based Transformer model.

Action Classification Action Recognition In Videos +1
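One common way to get attention cost that is linear in the number of frames is to mix temporal information through the channel dimension (borrowing a slice of channels from adjacent frames) and then run attention spatially within each frame, instead of full space-time attention. The sketch below illustrates that general pattern under my own assumptions; the shift fraction and single-head setup are illustrative, not the paper's exact design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def space_time_mixing_attention(x, shift=0.25):
    """x: (T, N, C) — T frames, N spatial tokens, C channels.

    A fraction of channels is borrowed from the previous/next frame
    (temporal mixing), then self-attention runs within each frame only.
    Cost is T * O(N^2), i.e. linear in T, versus O((T*N)^2) for full
    space-time attention.
    """
    T, N, C = x.shape
    k = int(C * shift)
    mixed = x.copy()
    mixed[1:, :, :k] = x[:-1, :, :k]           # channels from frame t-1
    mixed[:-1, :, k:2 * k] = x[1:, :, k:2 * k]  # channels from frame t+1
    out = np.empty_like(mixed)
    for t in range(T):                          # spatial-only attention per frame
        Q = K = V = mixed[t]
        A = softmax(Q @ K.T / np.sqrt(C))
        out[t] = A @ V
    return out
```

Each frame's attention matrix is N x N regardless of clip length, which is why adding frames adds only linear cost.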

Few-shot Action Recognition with Prototype-centered Attentive Learning

1 code implementation20 Jan 2021 Xiatian Zhu, Antoine Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang

Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.

Contrastive Learning Few-Shot action recognition +3

Egocentric Action Recognition by Video Attention and Temporal Context

no code implementations3 Jul 2020 Juan-Manuel Perez-Rua, Antoine Toisoul, Brais Martinez, Victor Escorcia, Li Zhang, Xiatian Zhu, Tao Xiang

In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip.

Action Recognition

Incremental Few-Shot Object Detection

no code implementations CVPR 2020 Juan-Manuel Perez-Rua, Xiatian Zhu, Timothy Hospedales, Tao Xiang

To this end we propose OpeN-ended Centre nEt (ONCE), a detector designed for incrementally learning to detect novel class objects with few examples.

Few-Shot Learning Few-Shot Object Detection +3

Efficient Progressive Neural Architecture Search

no code implementations1 Aug 2018 Juan-Manuel Perez-Rua, Moez Baccouche, Stephane Pateux

We demonstrate with experiments on the CIFAR-10 dataset that our method, Efficient Progressive Neural Architecture Search (EPNAS), increases search efficiency while keeping the found architectures competitive.

General Classification Image Classification +1

Learning how to be robust: Deep polynomial regression

no code implementations17 Apr 2018 Juan-Manuel Perez-Rua, Tomas Crivelli, Patrick Bouthemy, Patrick Perez

We bypass the need for a tailored loss function on the regression parameters by attaching to our model a differentiable hard-wired decoder corresponding to the polynomial operation at hand.

Decoder regression +1
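The "hard-wired decoder" is a parameter-free, differentiable polynomial evaluation attached after the network's coefficient predictions, so the loss can be measured in data space rather than on the coefficients themselves. A minimal numpy sketch of that decoder — the "predicted" coefficients below are a fixed guess standing in for a network's output:

```python
import numpy as np

def decode(coeffs, x):
    """Hard-wired polynomial decoder: evaluate sum_k coeffs[k] * x**k.

    This operation has no learnable parameters and is differentiable,
    so a network predicting `coeffs` can be trained with an ordinary
    data-space loss instead of a tailored loss on the coefficients.
    """
    V = np.vander(x, N=len(coeffs), increasing=True)  # (n, deg+1) basis
    return V @ coeffs

x = np.linspace(-1.0, 1.0, 50)
target = decode(np.array([1.0, -2.0, 0.5]), x)   # ground-truth curve
pred = decode(np.array([0.9, -1.8, 0.6]), x)     # stand-in for network output
loss = np.mean((pred - target) ** 2)             # loss in data space
```

Comparing curves instead of raw coefficients sidesteps the question of how to weight errors in low- vs high-order coefficients, which is the motivation the abstract gives.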

Determining Occlusions From Space and Time Image Reconstructions

no code implementations CVPR 2016 Juan-Manuel Perez-Rua, Tomas Crivelli, Patrick Bouthemy, Patrick Perez

With this in mind, we propose a novel approach to occlusion detection where visibility or not of a point in next frame is formulated in terms of visual reconstruction.
