1 code implementation • 15 Apr 2025 • Yeongmin Kim, Sotiris Anagnostidis, Yuming Du, Edgar Schönfeld, Jonas Kohler, Markos Georgopoulos, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu
ARD offers two key benefits: 1) it mitigates exposure bias by utilizing a predicted historical trajectory that is less susceptible to accumulated errors, and 2) it leverages the previous history of the ODE trajectory as a more effective source of coarse-grained information.
no code implementations • 27 Feb 2025 • Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali Thabet, Edgar Schönfeld
Despite their remarkable performance, modern Diffusion Transformers are hindered by substantial resource requirements during inference, stemming from the fixed and large amount of compute needed for each denoising step.
no code implementations • 31 Jan 2025 • Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali Thabet, Jonas Kohler
The performance of large language models (LLMs) is closely linked to their underlying size, leading to ever-growing networks and hence slower inference.
2 code implementations • 17 Oct 2024 • Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, Matthew Yu, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Siddharth Bhattacharya, Simran Motwani, Tao Xu, Tianhe Li, Tingbo Hou, Wei-Ning Hsu, Xi Yin, Xiaoliang Dai, Yaniv Taigman, Yaqiao Luo, Yen-Cheng Liu, Yi-Chiao Wu, Yue Zhao, Yuval Kirstain, Zecheng He, Zijian He, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, Arun Mallya, Baishan Guo, Boris Araya, Breena Kerr, Carleigh Wood, Ce Liu, Cen Peng, Dimitry Vengertsev, Edgar Schonfeld, Elliot Blanchard, Felix Juefei-Xu, Fraylie Nord, Jeff Liang, John Hoffman, Jonas Kohler, Kaolin Fire, Karthik Sivakumar, Lawrence Chen, Licheng Yu, Luya Gao, Markos Georgopoulos, Rashel Moritz, Sara K. Sampson, Shikai Li, Simone Parmeggiani, Steve Fine, Tara Fowler, Vladan Petrovic, Yuming Du
Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation.
no code implementations • 8 May 2024 • Jonas Kohler, Albert Pumarola, Edgar Schönfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet
Diffusion models are a powerful generative framework, but come with expensive inference.
no code implementations • 2 Mar 2024 • Neta Shaul, Uriel Singer, Ricky T. Q. Chen, Matthew Le, Ali Thabet, Albert Pumarola, Yaron Lipman
This paper introduces Bespoke Non-Stationary (BNS) Solvers, a solver distillation approach to improve sample efficiency of Diffusion and Flow models.
no code implementations • 8 Feb 2024 • David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu
Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.
no code implementations • 26 Dec 2023 • Jonas Kohler, Nicolas Griffiths Sanchez, Luca Cavalli, Catherine Herold, Albert Pumarola, Alberto Garcia Garcia, Ali Thabet
In this study, we propose two novel input processing paradigms for novel view synthesis (NVS) methods based on layered scene representations that significantly improve their runtime without compromising quality.
1 code implementation • 19 Dec 2023 • Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet
Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.
no code implementations • 29 Oct 2023 • Neta Shaul, Juan Perez, Ricky T. Q. Chen, Ali Thabet, Albert Pumarola, Yaron Lipman
For example, a Bespoke solver for a CIFAR10 model produces samples with Fr\'echet Inception Distance (FID) of 2. 73 with 10 NFE, and gets to 1% of the Ground Truth (GT) FID (2. 59) for this model with only 20 NFE.
1 code implementation • 21 Apr 2023 • Angela Castillo, Maria Escobar, Guillaume Jeanneret, Albert Pumarola, Pablo Arbeláez, Ali Thabet, Artsiom Sanakoyeu
To the best of our knowledge, this is the first approach that uses the reverse diffusion process to model full-body tracking as a conditional sequence generation task.
1 code implementation • CVPR 2023 • Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
A particular challenge is that only a sparse tracking signal is available from standalone HMDs (Head Mounted Devices), often limited to tracking the user's head and wrists.
no code implementations • 25 Mar 2023 • Albert Pumarola, Artsiom Sanakoyeu, Lior Yariv, Ali Thabet, Yaron Lipman
Surface reconstruction has been seeing a lot of progress lately by utilizing Implicit Neural Representations (INRs).
1 code implementation • ICCV 2023 • Sara Rojas, Jesus Zarzar, Juan Camilo Perez, Artsiom Sanakoyeu, Ali Thabet, Albert Pumarola, Bernard Ghanem
Re-ReND is designed to achieve real-time performance by converting the NeRF into a representation that can be efficiently processed by standard graphics pipelines.
no code implementations • 10 Feb 2022 • Juan C. Pérez, Motasem Alfarra, Ali Thabet, Pablo Arbeláez, Bernard Ghanem
We propose a methodology for assessing and characterizing the robustness of FRMs against semantic perturbations to their input.
no code implementations • 5 Dec 2021 • Masheal Alghamdi, Qiang Fu, Ali Thabet, Wolfgang Heidrich
This paper study the reconstruction of High Dynamic Range (HDR) video from snapshot-coded LDR video.
1 code implementation • NeurIPS 2021 • Guocheng Qian, Hasan Abed Al Kader Hammoud, Guohao Li, Ali Thabet, Bernard Ghanem
We then introduce a new Anisotropic Reduction function into our Separable SA module and propose an Anisotropic Separable SA (ASSA) module that substantially increases the network's accuracy.
Ranked #36 on
3D Part Segmentation
on ShapeNet-Part
1 code implementation • 12 Sep 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
Advances in automatic Cut-type recognition can unleash new experiences in the video editing industry, such as movie analysis for education, video re-editing, virtual cinematography, machine-assisted trailer generation, machine-assisted video editing, among others.
1 code implementation • ICCV 2021 • Alejandro Pardo, Fabian Caba Heilbron, Juan León Alcázar, Ali Thabet, Bernard Ghanem
Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise.
1 code implementation • 29 Jul 2021 • Juan C. Pérez, Motasem Alfarra, Guillaume Jeanneret, Laura Rueda, Ali Thabet, Bernard Ghanem, Pablo Arbeláez
Deep learning models are prone to being fooled by imperceptible perturbations known as adversarial attacks.
1 code implementation • ICML Workshop AML 2021 • Motasem Alfarra, Juan C. Pérez, Ali Thabet, Adel Bibi, Philip H. S. Torr, Bernard Ghanem
Deep neural networks are vulnerable to small input perturbations known as adversarial attacks.
1 code implementation • ICCV 2021 • Juan León-Alcázar, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem
Active speaker detection requires a solid integration of multi-modal cues.
Active Speaker Detection
Audio-Visual Active Speaker Detection
no code implementations • 1 Jan 2021 • Guohao Li, Chenxin Xiong, Ali Thabet, Bernard Ghanem
We add our generalized aggregation into a deep GCN framework and show it achieves state-of-the-art results in six benchmarks from OGB.
no code implementations • 29 Dec 2020 • Hani Itani, Silvio Giancola, Ali Thabet, Bernard Ghanem
Since it is learnable, this mapping is allowed to be different per layer instead of being applied uniformly throughout the depth of the network.
1 code implementation • ICCV 2021 • Chen Zhao, Ali Thabet, Bernard Ghanem
We have two key components in VSGN: video self-stitching (VSS) and cross-scale graph pyramid network (xGPN).
Ranked #17 on
Temporal Action Localization
on ActivityNet-1.3
no code implementations • 24 Aug 2020 • Guohao Li, Mengmeng Xu, Silvio Giancola, Ali Thabet, Bernard Ghanem
In this paper, we introduce a new NAS framework, dubbed LC-NAS, where we search for point cloud architectures that are constrained to a target latency.
3 code implementations • 13 Jun 2020 • Guohao Li, Chenxin Xiong, Ali Thabet, Bernard Ghanem
Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs.
Ranked #1 on
Node Property Prediction
on ogbn-proteins
1 code implementation • 13 Jun 2020 • Motasem Alfarra, Juan C. Pérez, Adel Bibi, Ali Thabet, Pablo Arbeláez, Bernard Ghanem
This paper studies how encouraging semantically-aligned features during deep neural network training can increase network robustness.
1 code implementation • ECCV 2020 • Juan C. Pérez, Motasem Alfarra, Guillaume Jeanneret, Adel Bibi, Ali Thabet, Bernard Ghanem, Pablo Arbeláez
We revisit the benefits of merging classical vision concepts with deep learning models.
1 code implementation • ECCV 2020 • Abdullah Hamdi, Sara Rojas, Ali Thabet, Bernard Ghanem
Our proposed attack increases the attack success rate by up to 40% for those transferred to unseen networks (transferability), while maintaining a high success rate on the attacked network.
1 code implementation • CVPR 2020 • Guohao Li, Guocheng Qian, Itzel C. Delgadillo, Matthias Müller, Ali Thabet, Bernard Ghanem
Architecture design has become a crucial component of successful deep learning.
Ranked #4 on
Node Classification
on PPI
1 code implementation • CVPR 2021 • Guocheng Qian, Abdulellah Abualshour, Guohao Li, Ali Thabet, Bernard Ghanem
We combine Inception DenseGCN with NodeShuffle into a new point upsampling pipeline called PU-GCN.
7 code implementations • CVPR 2020 • Mengmeng Xu, Chen Zhao, David S. Rojas, Ali Thabet, Bernard Ghanem
In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.
Ranked #5 on
Temporal Action Localization
on EPIC-KITCHENS-100
4 code implementations • 15 Oct 2019 • Guohao Li, Matthias Müller, Guocheng Qian, Itzel C. Delgadillo, Abdulellah Abualshour, Ali Thabet, Bernard Ghanem
This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs.
Ranked #5 on
3D Semantic Segmentation
on PartNet
no code implementations • 10 Apr 2019 • Alejandro Pardo, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, Bernard Ghanem
We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation.
1 code implementation • ICCV 2019 • Guohao Li, Matthias Müller, Ali Thabet, Bernard Ghanem
Finally, we use these new concepts to build a very deep 56-layer GCN, and show how it significantly boosts performance (+3. 7% mIoU over state-of-the-art) in the task of point cloud semantic segmentation.
1 code implementation • 30 Mar 2019 • Ali Thabet, Humam Alwassel, Bernard Ghanem
In fact, we show how Morton features can be used to significantly improve performance (+3% for 2 popular semantic segmentation algorithms) in the task of semantic segmentation of point clouds on the challenging and large-scale S3DIS dataset.
1 code implementation • 30 Mar 2019 • Alejandro Pardo, Humam Alwassel, Fabian Caba Heilbron, Ali Thabet, Bernard Ghanem
RefineLoc shows competitive results with the state-of-the-art in weakly-supervised temporal localization.
Temporal Localization
Weakly Supervised Action Localization
+2
no code implementations • CVPR 2015 • Bernard Ghanem, Ali Thabet, Juan Carlos Niebles, Fabian Caba Heilbron
This paper proposes a new framework for estimating the Manhattan Frame (MF) of an indoor scene from a single RGB-D image.