1 code implementation • ECCV 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz
Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.
no code implementations • 4 Feb 2025 • Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Chris Choy
We tackle open-vocabulary 3D scene understanding by introducing a novel data generation pipeline and training framework.
no code implementations • 21 Jan 2025 • Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu
We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures.
1 code implementation • 20 Jan 2025 • Zhiqi Li, Guo Chen, Shilong Liu, Shihao Wang, Vibashan VS, Yishen Ji, Shiyi Lan, Hao Zhang, Yilin Zhao, Subhashree Radhakrishnan, Nadine Chang, Karan Sapra, Amala Sanjay Deshmukh, Tuomas Rintamaki, Matthieu Le, Ilia Karmanov, Lukas Voegtle, Philipp Fischer, De-An Huang, Timo Roman, Tong Lu, Jose M. Alvarez, Bryan Catanzaro, Jan Kautz, Andrew Tao, Guilin Liu, Zhiding Yu
Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models.
1 code implementation • 17 Jan 2025 • Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield
However, achieving strong zero-shot generalization - a hallmark of foundation models in other computer vision tasks - remains challenging for stereo matching.
no code implementations • 12 Dec 2024 • Xueting Li, Ye Yuan, Shalini De Mello, Gilles Daviet, Jonathan Leaf, Miles Macklin, Jan Kautz, Umar Iqbal
Specifically, we first employ three text-conditioned 3D generative models to generate garment mesh, body shape and hair strands from the given text prompt.
no code implementations • 11 Dec 2024 • Jihao Liu, Zhiding Yu, Shiyi Lan, Shihao Wang, Rongyao Fang, Jan Kautz, Hongsheng Li, Jose M. Alvarez
This paper presents StreamChat, a novel approach that enhances the interaction capabilities of Large Multimodal Models (LMMs) with streaming video content.
1 code implementation • 10 Dec 2024 • Greg Heinrich, Mike Ranzinger, Hongxu Yin, Yao Lu, Jan Kautz, Andrew Tao, Bryan Catanzaro, Pavlo Molchanov
Agglomerative models have recently emerged as a powerful approach to training vision foundation models, leveraging multi-teacher distillation from existing models such as CLIP, DINO, and SAM.
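As a loose illustration of the multi-teacher distillation recipe mentioned above (not the paper's actual training code), a student backbone can be fitted jointly to several frozen teachers through per-teacher projection heads; in the minimal PyTorch sketch below the teacher models, feature dimensions, and the cosine matching loss are all placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTeacherDistiller(nn.Module):
    # Minimal sketch: distill one student into the feature spaces of several frozen teachers.
    def __init__(self, student, teachers, student_dim, teacher_dims):
        super().__init__()
        self.student = student
        self.teachers = nn.ModuleList(teachers)
        for t in self.teachers:
            t.requires_grad_(False)  # teachers stay frozen
        # one projection head per teacher, mapping student features into that teacher's space
        self.heads = nn.ModuleList(nn.Linear(student_dim, d) for d in teacher_dims)

    def forward(self, images):
        s_feat = self.student(images)                      # (B, student_dim), pooled student features
        loss = 0.0
        for teacher, head in zip(self.teachers, self.heads):
            with torch.no_grad():
                t_feat = teacher(images)                   # (B, teacher_dim)
            # simple cosine matching loss against each teacher's features
            loss = loss + (1 - F.cosine_similarity(head(s_feat), t_feat, dim=-1)).mean()
        return loss / len(self.teachers)

In practice one would plug CLIP-, DINO-, and SAM-style encoders in as the teachers and train only the student and the projection heads.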
3 code implementations • 9 Dec 2024 • Songlin Yang, Jan Kautz, Ali Hatamizadeh
Linear Transformers have gained attention as efficient alternatives to standard Transformers, but their performance in retrieval and long-context tasks has been limited.
2 code implementations • 5 Dec 2024 • Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Vishwesh Nath, Jinyi Hu, Sifei Liu, Ranjay Krishna, Daguang Xu, Xiaolong Wang, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu
This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy.
no code implementations • 5 Dec 2024 • An-Chieh Cheng, Yandong Ji, Zhaojing Yang, Xueyan Zou, Jan Kautz, Erdem Biyik, Hongxu Yin, Sifei Liu, Xiaolong Wang
This paper proposes to solve the problem of Vision-and-Language Navigation with legged robots, which not only provides a flexible way for humans to command but also allows the robot to navigate through more challenging and cluttered scenes.
no code implementations • 20 Nov 2024 • Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, Yingyan Lin, Jan Kautz, Pavlo Molchanov
We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficiency.
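To make the "hybrid-head parallel architecture" idea concrete, here is a heavily simplified sketch of a block that runs an attention branch and a recurrent branch in parallel and fuses their outputs; the gated linear recurrence below is only a toy stand-in for the paper's SSM heads, and the dimensions and averaging fusion are assumptions rather than the Hymba design.

import torch
import torch.nn as nn

class HybridHeadBlock(nn.Module):
    # Conceptual sketch of a parallel hybrid block: attention branch + simple gated recurrence.
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.in_proj = nn.Linear(dim, 2 * dim)   # value and forget-gate for the recurrent branch
        self.out_proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                         # x: (B, T, D)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        v, g = self.in_proj(h).chunk(2, dim=-1)
        g = torch.sigmoid(g)
        state = torch.zeros_like(v[:, 0])
        rec = []
        for t in range(v.shape[1]):               # toy sequential scan; a real SSM computes this efficiently
            state = g[:, t] * state + (1 - g[:, t]) * v[:, t]
            rec.append(state)
        rec_out = self.out_proj(torch.stack(rec, dim=1))
        return x + 0.5 * (attn_out + rec_out)     # parallel fusion of the two branches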
no code implementations • 28 Oct 2024 • Shih-Yang Liu, Huck Yang, Chien-Yi Wang, Nai Chit Fung, Hongxu Yin, Charbel Sakr, Saurav Muralidharan, Kwang-Ting Cheng, Jan Kautz, Yu-Chiang Frank Wang, Pavlo Molchanov, Min-Hung Chen
In this work, we reformulate the model compression problem as a customized compensation problem: given a compressed model, we introduce residual low-rank paths to compensate for compression errors under customized user requirements (e.g., tasks, compression ratios), yielding greater flexibility in adjusting overall capacity without being constrained by specific compression formats.
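A minimal sketch of the residual low-rank compensation idea stated above, assuming the compressed layer is a frozen nn.Linear and using a small illustrative rank; this is not the paper's implementation.

import torch
import torch.nn as nn

class LowRankCompensatedLinear(nn.Module):
    # Wrap a frozen compressed linear layer with a trainable rank-r residual path:
    # y = W_compressed x + B(A x). Only A and B are trained to absorb the compression error.
    def __init__(self, compressed_linear: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = compressed_linear
        for p in self.base.parameters():
            p.requires_grad_(False)
        in_f, out_f = compressed_linear.in_features, compressed_linear.out_features
        self.A = nn.Linear(in_f, rank, bias=False)
        self.B = nn.Linear(rank, out_f, bias=False)
        nn.init.zeros_(self.B.weight)   # start as an exact no-op

    def forward(self, x):
        return self.base(x) + self.B(self.A(x))

Because B is initialized to zero, the wrapper leaves the compressed model's outputs untouched at the start, and only the low-rank path moves during training.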
1 code implementation • 15 Oct 2024 • Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, Orazio Gallo
We introduce nvTorchCam, an open-source library under the Apache 2.0 license, designed to make deep learning algorithms camera model-independent.
no code implementations • 9 Oct 2024 • Shoaib Ahmed Siddiqui, Jean Kossaifi, Boris Bonev, Christopher Choy, Jan Kautz, David Krueger, Kamyar Azizzadenesheli
Interestingly, we see a strong positive effect of using a larger dataset when training a smaller model as compared to training on a smaller dataset for longer.
1 code implementation • 26 Sep 2024 • Gongfan Fang, Hongxu Yin, Saurav Muralidharan, Greg Heinrich, Jeff Pool, Jan Kautz, Pavlo Molchanov, Xinchao Wang
This approach facilitates end-to-end training on large-scale datasets and offers two notable advantages: 1) High-quality Masks - our method effectively scales to large datasets and learns accurate masks; 2) Transferability - the probabilistic modeling of mask distribution enables the transfer learning of sparsity across domains or tasks.
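A toy illustration of "probabilistic modeling of the mask distribution": each weight gets keep/drop logits that are sampled with a Gumbel-softmax relaxation, so the mask can be learned end-to-end by backpropagation. This is a generic sketch of differentiable mask learning, not necessarily the paper's exact parameterization.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableMaskedLinear(nn.Module):
    # Toy end-to-end-learnable sparsity: every weight gets [keep, drop] logits,
    # sampled with Gumbel-softmax so gradients flow into the mask distribution.
    def __init__(self, in_f, out_f, tau=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.logits = nn.Parameter(torch.zeros(out_f, in_f, 2))  # [keep, drop] logits per weight
        self.tau = tau

    def forward(self, x):
        probs = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)  # (out, in, 2), one-hot samples
        mask = probs[..., 0]                                            # 1 = keep, 0 = drop
        return F.linear(x, self.weight * mask)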
no code implementations • 29 Aug 2024 • Jiefeng Li, Ye Yuan, Davis Rempe, Haotian Zhang, Pavlo Molchanov, Cewu Lu, Jan Kautz, Umar Iqbal
Experiments on three challenging benchmarks demonstrate the effectiveness of COIN, which outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
2 code implementations • 28 Aug 2024 • Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu
We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies.
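The claim above, that simply concatenating visual tokens from complementary encoders is competitive with more elaborate fusion, can be sketched as follows; the encoders, per-encoder projections, and the assumption that all encoders emit the same number of tokens are placeholders.

import torch

def fuse_visual_tokens(encoders, projections, image):
    # Run several frozen vision encoders and concatenate their token features along the
    # channel dimension before handing them to the language model's projector.
    feats = []
    for enc, proj in zip(encoders, projections):
        with torch.no_grad():
            tok = enc(image)          # (B, N, C_i) tokens from encoder i (same N assumed)
        feats.append(proj(tok))       # e.g. nn.Linear(C_i, C) to align channel widths
    return torch.cat(feats, dim=-1)   # (B, N, len(encoders) * C): plain channel concatenation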
no code implementations • 21 Aug 2024 • Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Ameya Sunil Mahabaleshwarkar, Gerald Shen, Jiaqi Zeng, Zijia Chen, Yoshi Suhara, Shizhe Diao, Chenhan Yu, Wei-Chun Chen, Hayley Ross, Oluwatobi Olabiyi, Ashwath Aithal, Oleksii Kuchaiev, Daniel Korzekwa, Pavlo Molchanov, Mostofa Patwary, Mohammad Shoeybi, Jan Kautz, Bryan Catanzaro
We present a comprehensive report on compressing the Llama 3.1 8B and Mistral NeMo 12B models to 4B and 8B parameters, respectively, using pruning and distillation.
1 code implementation • 19 Aug 2024 • Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing.
no code implementations • 24 Jul 2024 • Yunhao Fang, Ligeng Zhu, Yao Lu, Yan Wang, Pavlo Molchanov, Jan Kautz, Jang Hyun Cho, Marco Pavone, Song Han, Hongxu Yin
In the self-augment step, the instruction-finetuned VLM recaptions its pretraining caption datasets and then retrains from scratch leveraging refined data.
Ranked #63 on Visual Question Answering on MM-Vet
1 code implementation • 23 Jul 2024 • Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas Breuel, Jan Kautz, David Krueger, Pavlo Molchanov
Large Language Models (LLMs) are not only resource-intensive to train but even more costly to deploy in production.
1 code implementation • 19 Jul 2024 • Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov
Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive.
3 code implementations • 10 Jul 2024 • Ali Hatamizadeh, Jan Kautz
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Ranked #246 on Image Classification on ImageNet
1 code implementation • 12 Jun 2024 • Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro
On an additional 23 long-context tasks, the hybrid model continues to closely match or exceed the Transformer on average.
2 code implementations • 11 Jun 2024 • Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez
We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model.
no code implementations • 11 Jun 2024 • Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov
Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical.
no code implementations • 4 Jun 2024 • Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat
Recently video diffusion models have emerged as expressive generative tools for high-quality video content creation readily available to general users.
1 code implementation • 3 Jun 2024 • An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, Sifei Liu
Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks.
no code implementations • 29 May 2024 • Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin
We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.
1 code implementation • 2 May 2024 • Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez
We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning.
1 code implementation • 27 Mar 2024 • De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz
In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task.
no code implementations • 24 Jan 2024 • Daniel Lichy, Hang Su, Abhishek Badki, Jan Kautz, Orazio Gallo
Unfortunately, most of the ground-truth (GT) data is for pinhole cameras, making it impossible to properly train depth estimation models for large-FoV cameras.
1 code implementation • CVPR 2024 • Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov
A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.
no code implementations • CVPR 2024 • Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal
Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations.
1 code implementation • CVPR 2024 • Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield
We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups.
1 code implementation • CVPR 2024 • Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. Efros, Xiaolong Wang
While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses.
3 code implementations • CVPR 2024 • Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han
Visual language models (VLMs) rapidly progressed with the recent success of large language models.
Ranked #4 on Zero-Shot Video Question Answer on MSVD-QA
2 code implementations • 10 Dec 2023 • Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov
A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks.
1 code implementation • CVPR 2024 • Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez
We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.
1 code implementation • 4 Dec 2023 • Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat
In this paper, we study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT).
Ranked #17 on Image Generation on ImageNet 256x256
1 code implementation • 30 Oct 2023 • Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz
Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance.
no code implementations • 20 Oct 2023 • Muhammed Kocabas, Ye Yuan, Pavlo Molchanov, Yunrong Guo, Michael J. Black, Otmar Hilliges, Jan Kautz, Umar Iqbal
This design combines the strengths of SLAM and motion priors, which leads to significant improvements in human and camera motion estimation.
2 code implementations • 3 Oct 2023 • Batu Ozturkler, Chao Liu, Benjamin Eckart, Morteza Mardani, Jiaming Song, Jan Kautz
However, diffusion models require careful tuning of inference hyperparameters on a validation set and are still sensitive to distribution shifts during testing.
no code implementations • 26 Sep 2023 • Yang Fu, Shalini De Mello, Xueting Li, Amey Kulkarni, Jan Kautz, Xiaolong Wang, Sifei Liu
NFP not only demonstrates SOTA scene reconstruction performance and efficiency, but it also supports single-image novel-view synthesis, which is underexplored in neural fields.
no code implementations • 24 Sep 2023 • Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Mohammad Amin Nabian, Tao Ge, Akshay Subramaniam, Karthik Kashinath, Jan Kautz, Mike Pritchard
The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
no code implementations • 29 Aug 2023 • Yazhou Xing, Amrita Mazumdar, Anjul Patney, Chao Liu, Hongxu Yin, Qifeng Chen, Jan Kautz, Iuri Frosio
We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging.
1 code implementation • 4 Jul 2023 • Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez
This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving.
Ranked #2 on Prediction Of Occupancy Grid Maps on Occ3D-nuScenes
no code implementations • CVPR 2023 • Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov
We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures.
no code implementations • 14 Jun 2023 • Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
2 code implementations • 9 Jun 2023 • Ali Hatamizadeh, Greg Heinrich, Hongxu Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov
At a high level, global self-attention enables efficient cross-window communication at lower cost.
Ranked #197 on Image Classification on ImageNet
1 code implementation • CVPR 2023 • Jiashun Wang, Xueting Li, Sifei Liu, Shalini De Mello, Orazio Gallo, Xiaolong Wang, Jan Kautz
We present a zero-shot approach that requires only the widely available deformed non-stylized avatars in training, and deforms stylized characters of significantly different shapes at inference.
1 code implementation • CVPR 2023 • Iuri Frosio, Jan Kautz
Many defenses against adversarial attacks (e.g., robust classifiers, randomization, or image purification) use countermeasures put to work only after the attack has been crafted.
1 code implementation • 7 May 2023 • Morteza Mardani, Jiaming Song, Jan Kautz, Arash Vahdat
To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution.
no code implementations • 4 May 2023 • Connor Z. Lin, Koki Nagano, Jan Kautz, Eric R. Chan, Umar Iqbal, Leonidas Guibas, Gordon Wetzstein, Sameh Khamis
To tackle this problem, we propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
1 code implementation • CVPR 2023 • Paul Micaelli, Arash Vahdat, Hongxu Yin, Jan Kautz, Pavlo Molchanov
Our Landmark DEQ (LDEQ) achieves state-of-the-art performance on the challenging WFLW facial landmark dataset, reaching $3.92$ NME with fewer parameters and a training memory cost of $\mathcal{O}(1)$ in the number of recurrent modules.
Ranked #2 on Face Alignment on WFLW
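For context on the $\mathcal{O}(1)$ training-memory figure in the LDEQ entry above: deep-equilibrium models iterate a layer to a fixed point without storing intermediate activations and differentiate only around the equilibrium. The sketch below uses the simplest one-step gradient approximation rather than full implicit differentiation, so it illustrates the general recipe, not the paper's solver.

import torch

def deq_forward(f, x, n_iters=30):
    # Fixed-point forward pass z* = f(z*, x), assuming the latent has the same shape as x.
    # Memory stays O(1) in the number of iterations because the solve runs under no_grad;
    # only the final step is differentiated (a crude stand-in for implicit differentiation).
    z = torch.zeros_like(x)
    with torch.no_grad():
        for _ in range(n_iters):
            z = f(z, x)
    return f(z, x)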
1 code implementation • CVPR 2023 • Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Muller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object.
no code implementations • 14 Feb 2023 • Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar
They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising.
no code implementations • ICCV 2023 • Umar Iqbal, Akin Caliskan, Koki Nagano, Sameh Khamis, Pavlo Molchanov, Jan Kautz
We propose RANA, a relightable and articulated neural avatar for the photorealistic synthesis of humans under arbitrary viewpoints, body poses, and lighting.
no code implementations • ICCV 2023 • Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, Jan Kautz
Specifically, we propose a physics-based motion projection module that uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically-plausible motion.
no code implementations • 21 Sep 2022 • Yu-Ying Yeh, Koki Nagano, Sameh Khamis, Jan Kautz, Ming-Yu Liu, Ting-Chun Wang
An effective approach is to supervise the training of deep neural networks with a high-fidelity dataset of desired input-output pairs, captured with a light stage.
no code implementations • 19 Aug 2022 • Zian Wang, Wenzheng Chen, David Acuna, Jan Kautz, Sanja Fidler
In this work, we propose a neural approach that estimates the 5D HDR light field from a single image, and a differentiable object insertion formulation that enables end-to-end training with image-based losses that encourage realism.
8 code implementations • 20 Jun 2022 • Ali Hatamizadeh, Hongxu Yin, Greg Heinrich, Jan Kautz, Pavlo Molchanov
Pre-trained GC ViT backbones in downstream tasks of object detection, instance segmentation, and semantic segmentation using MS COCO and ADE20K datasets outperform prior work consistently.
Ranked #135 on Semantic Segmentation on ADE20K
no code implementations • 14 May 2022 • Jonathan Tremblay, Moustafa Meshry, Alex Evans, Jan Kautz, Alexander Keller, Sameh Khamis, Thomas Müller, Charles Loop, Nathan Morrical, Koki Nagano, Towaki Takikawa, Stan Birchfield
We present a large-scale synthetic dataset for novel view synthesis consisting of ~300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 x 1600 pixels).
Ranked #1 on Novel View Synthesis on RTMV
1 code implementation • CVPR 2022 • Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu
We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation.
no code implementations • 29 Mar 2022 • Amit Raj, Umar Iqbal, Koki Nagano, Sameh Khamis, Pavlo Molchanov, James Hays, Jan Kautz
In this work, we present, DRaCoN, a framework for learning full-body volumetric avatars which exploits the advantages of both the 2D and 3D neural rendering techniques.
no code implementations • CVPR 2022 • Ali Hatamizadeh, Hongxu Yin, Holger Roth, Wenqi Li, Jan Kautz, Daguang Xu, Pavlo Molchanov
In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez
FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.
no code implementations • 24 Feb 2022 • Benjamin Wu, Oliver Hennigh, Jan Kautz, Sanjay Choudhry, Wonmin Byeon
This efficiently and flexibly produces a compressed representation which is used for additional conditioning of physics-informed models.
5 code implementations • CVPR 2022 • Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang
With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.
no code implementations • 14 Feb 2022 • Ali Hatamizadeh, Hongxu Yin, Pavlo Molchanov, Andriy Myronenko, Wenqi Li, Prerna Dogra, Andrew Feng, Mona G. Flores, Jan Kautz, Daguang Xu, Holger R. Roth
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
no code implementations • 20 Jan 2022 • Or Litany, Haggai Maron, David Acuna, Jan Kautz, Gal Chechik, Sanja Fidler
Standard Federated Learning (FL) techniques are limited to clients with identical network architectures.
1 code implementation • CVPR 2022 • Hongxu Yin, Arash Vahdat, Jose Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov
A-ViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
Ranked #34 on Efficient ViTs on ImageNet-1K (with DeiT-S)
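A schematic of the token-reduction mechanism described in the A-ViT entry above: each block produces a per-token halting score, and tokens whose accumulated score passes a threshold are frozen out of subsequent blocks. The halting heads, threshold, and masking scheme below are simplified assumptions, not the paper's exact formulation.

import torch

def run_with_token_halting(blocks, halting_heads, tokens, threshold=1.0):
    # tokens: (B, N, D). Each halting head maps token features to a score in (0, 1);
    # once a token's cumulative score exceeds the threshold it stops being updated.
    B, N, _ = tokens.shape
    cum_score = torch.zeros(B, N, device=tokens.device)
    active = torch.ones(B, N, device=tokens.device)                      # 1 = still processed
    for block, head in zip(blocks, halting_heads):
        out = block(tokens)
        tokens = torch.where(active.unsqueeze(-1).bool(), out, tokens)   # freeze halted tokens
        cum_score = cum_score + torch.sigmoid(head(tokens)).squeeze(-1) * active
        active = (cum_score < threshold).float()
    return tokens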
1 code implementation • CVPR 2022 • Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, Jan Kautz
Since the joint reconstruction of human motions and camera poses is underconstrained, we propose a global trajectory predictor that generates global human trajectories based on local body movements.
Ranked #1 on Global 3D Human Pose Estimation on EMDB
no code implementations • NeurIPS 2021 • Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
It is therefore interesting to study how these two tasks can be coupled to benefit each other.
no code implementations • ICLR 2022 • Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu
We propose a novel scene representation that encodes reaching distance -- the distance between any position in the scene to a goal along a feasible trajectory.
no code implementations • ICCV 2021 • Siva Karthik Mustikovela, Shalini De Mello, Aayush Prakash, Umar Iqbal, Sifei Liu, Thu Nguyen-Phuoc, Carsten Rother, Jan Kautz
We present SSOD, the first end-to-end analysis-by synthesis framework with controllable GANs for the task of self-supervised object detection.
1 code implementation • CVPR 2023 • Huanrui Yang, Hongxu Yin, Maying Shen, Pavlo Molchanov, Hai Li, Jan Kautz
This work challenges the common design philosophy of the Vision Transformer (ViT) model, which uses a uniform dimension across all stacked blocks in a model stage: we redistribute the parameters both across transformer blocks and between different structures within each block, via the first systematic attempt at global structural pruning.
no code implementations • 29 Sep 2021 • Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat
In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization approach.
no code implementations • 22 Sep 2021 • Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang
Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale.
no code implementations • ICCV 2021 • Zian Wang, Jonah Philion, Sanja Fidler, Jan Kautz
In this paper, we propose a unified, learning-based inverse rendering framework that formulates 3D spatially-varying lighting.
no code implementations • 13 Jul 2021 • Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung
Prior works usually assume that SC offers privacy benefits as only intermediate features, instead of private data, are shared from devices to the cloud.
no code implementations • 12 Jul 2021 • Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat
We analyze three popular network architectures: EfficientNetV1, EfficientNetV2 and ResNeST, and achieve accuracy improvement for all models (up to $3.0\%$) when compressing larger models to the latency level of smaller models.
no code implementations • CVPR 2021 • Benjamin Eckart, Wentao Yuan, Chao Liu, Jan Kautz
In this work, we introduce a general method for 3D self-supervised representation learning that 1) remains agnostic to the underlying neural network architecture, and 2) specifically leverages the geometric nature of 3D point cloud data.
no code implementations • 10 Jun 2021 • Adrian Spurr, Pavlo Molchanov, Umar Iqbal, Jan Kautz, Otmar Hilliges
Hand pose estimation is difficult due to different environmental conditions, object- and self-occlusion as well as diversity in hand shape and appearance.
1 code implementation • NeurIPS 2021 • Arash Vahdat, Karsten Kreis, Jan Kautz
Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space, resulting in fewer network evaluations and faster sampling.
Ranked #4 on Image Generation on CelebA 256x256
2 code implementations • CVPR 2021 • Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, Jan Kautz
A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios.
Ranked #3 on Gaze Estimation on Gaze360
no code implementations • 27 Apr 2021 • Umar Iqbal, Kevin Xie, Yunrong Guo, Jan Kautz, Pavlo Molchanov
We present KAMA, a 3D Keypoint Aware Mesh Articulation approach that allows us to estimate a human body mesh from the positions of 3D body keypoints.
Ranked #57 on 3D Human Pose Estimation on 3DPW
1 code implementation • CVPR 2021 • Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov
In this work, we introduce GradInversion, with which input images from a larger batch (8 - 48 images) can also be recovered for large networks such as ResNets (50 layers), on complex datasets such as ImageNet (1000 classes, 224x224 px).
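A bare-bones version of the gradient-matching optimization underlying this line of work; GradInversion itself adds image priors and batch-level regularizers that are omitted here, and all names below are illustrative.

import torch
import torch.nn.functional as F

def invert_from_gradients(model, target_grads, labels, shape, steps=2000, lr=0.1):
    # Recover inputs by optimizing a dummy batch so that the gradients it induces match
    # the observed gradients. Real methods add priors (e.g., total variation, BN statistics).
    dummy = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([dummy], lr=lr)
    params = [p for p in model.parameters() if p.requires_grad]   # same order as target_grads
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(dummy), labels)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        match.backward()
        opt.step()
    return dummy.detach()

Matching observed gradients in this way is what makes sharing raw gradients a potential privacy risk.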
2 code implementations • CVPR 2021 • Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, Dieter Fox
We introduce DexYCB, a new dataset for capturing hand grasping of objects.
no code implementations • CVPR 2021 • Yang Fu, Sifei Liu, Umar Iqbal, Shalini De Mello, Humphrey Shi, Jan Kautz
Tracking segmentation masks of multiple instances has been intensively studied, but still faces two fundamental challenges: 1) the requirement of large-scale, frame-wise annotation, and 2) the complexity of two-stage approaches.
no code implementations • 29 Jan 2021 • Jae Shin Yoon, Kihwan Kim, Jan Kautz, Hyun Soo Park
In this paper, we present a method for clothes retargeting: generating the potential poses and deformations of a given 3D clothing template model to fit onto a person in a single RGB image.
1 code implementation • CVPR 2021 • Abhishek Badki, Orazio Gallo, Jan Kautz, Pradeep Sen
Time-to-contact (TTC), the time for an object to collide with the observer's plane, is a powerful tool for path planning: it is potentially more informative than the depth, velocity, and acceleration of objects in the scene -- even for humans.
no code implementations • ICLR 2021 • Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song
The recent success of Transformers in the language domain has motivated adapting it to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model.
no code implementations • NeurIPS 2020 • Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz
This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.
no code implementations • 1 Dec 2020 • Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai, Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz
A common way to speed up the computation is to downsample the feature volume, but this loses high-frequency details.
no code implementations • 21 Oct 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz
Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.
no code implementations • NeurIPS 2021 • Jyoti Aneja, Alexander Schwing, Jan Kautz, Arash Vahdat
To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
Ranked #6 on Image Generation on CelebA 256x256 (FID metric)
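Written out as a formula (the exponential-of-negative-energy form of the reweighting factor is an assumption made for concreteness), the prior described above is

$$p(z) \;\propto\; p_{\text{base}}(z)\, r(z), \qquad r(z) = \exp\!\big(-E_\theta(z)\big),$$

where the reweighting factor $r(z)$ is trained so that the product moves closer to the aggregate posterior $q(z) = \mathbb{E}_{x \sim p_{\text{data}}}\big[q(z \mid x)\big]$.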
1 code implementation • ICLR 2021 • Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat
VAEBM captures the overall mode structure of the data distribution using a state-of-the-art VAE and it relies on its EBM component to explicitly exclude non-data-like regions from the model and refine the image samples.
Ranked #1 on Image Generation on Stacked MNIST
no code implementations • 28 Sep 2020 • Jyoti Aneja, Alex Schwing, Jan Kautz, Arash Vahdat
To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
no code implementations • 25 Aug 2020 • Jialiang Wang, Varun Jampani, Deqing Sun, Charles Loop, Stan Birchfield, Jan Kautz
End-to-end deep learning methods have advanced stereo vision in recent years and obtained excellent results when the training and test data are similar.
2 code implementations • ECCV 2020 • Wentao Yuan, Ben Eckart, Kihwan Kim, Varun Jampani, Dieter Fox, Jan Kautz
Point cloud registration is a fundamental problem in 3D computer vision, graphics and robotics.
1 code implementation • ECCV 2020 • Yang Zou, Xiaodong Yang, Zhiding Yu, B. V. K. Vijaya Kumar, Jan Kautz
To this end, we propose a joint learning framework that disentangles id-related/unrelated features and enforces adaptation to work on the id-related feature space exclusively.
Ranked #7 on Unsupervised Domain Adaptation on Market to MSMT
no code implementations • 20 Jul 2020 • Xitong Yang, Xiaodong Yang, Sifei Liu, Deqing Sun, Larry Davis, Jan Kautz
Thus, the motion features at higher levels are trained to gradually capture semantic dynamics and become more discriminative for action recognition.
9 code implementations • NeurIPS 2020 • Arash Vahdat, Jan Kautz
For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ.
Ranked #3 on Image Generation on FFHQ 256 x 256 (bits/dimension metric)
1 code implementation • ECCV 2020 • Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.
1 code implementation • CVPR 2020 • Abhishek Badki, Alejandro Troccoli, Kihwan Kim, Jan Kautz, Pradeep Sen, Orazio Gallo
Given a strict time budget, Bi3D can detect objects closer than a given distance in as little as a few milliseconds, or estimate depth with arbitrarily coarse quantization, with complexity linear with the number of quantization levels.
no code implementations • 14 May 2020 • Mengyuan Yan, Qingyun Sun, Iuri Frosio, Stephen Tyree, Jan Kautz
Combining the control policy learned from simulation with the perception model, we achieve an impressive $\bf{88\%}$ success rate in grasping a tiny sphere with a real robot.
Robotics
2 code implementations • CVPR 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Yong Jae Lee, Alexander G. Schwing, Jan Kautz
Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training.
Ranked #1 on Weakly Supervised Object Detection on COCO test-dev
2 code implementations • CVPR 2020 • Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Carsten Rother, Jan Kautz
Training deep neural networks to estimate the viewpoint of objects requires large labeled training datasets.
no code implementations • CVPR 2020 • Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, Jan Kautz
Our insight is that although its scale and quality are inconsistent with other views, the depth estimation from a single view can be used to reason about the globally coherent geometry of dynamic contents.
1 code implementation • CVPR 2020 • Mark Boss, Varun Jampani, Kihwan Kim, Hendrik P. A. Lensch, Jan Kautz
Extensive experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
no code implementations • ECCV 2020 • Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Otmar Hilliges, Jan Kautz
Estimating 3D hand pose from 2D images is a difficult, inverse problem due to the inherent scale and depth ambiguities.
Ranked #10 on 3D Hand Pose Estimation on DexYCB
no code implementations • 18 Mar 2020 • Kaan Akşit, Jan Kautz, David Luebke
We introduce a new gaze tracker for Head Mounted Displays (HMDs).
no code implementations • CVPR 2020 • Umar Iqbal, Pavlo Molchanov, Jan Kautz
One major challenge for monocular 3D human pose estimation in-the-wild is the acquisition of training data that contains unconstrained images annotated with accurate 3D poses.
Monocular 3D Human Pose Estimation • Weakly-supervised 3D Human Pose Estimation
1 code implementation • ECCV 2020 • Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz
To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.
2 code implementations • NeurIPS 2020 • Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Animashree Anandkumar
Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting.
Ranked #1 on Video Prediction on KTH (Cond metric)
no code implementations • WS 2020 • Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz
Specifically, we decompose the latent representation of the input sentence to a style code that captures the language style variation and a content code that encodes the language style-independent content.
no code implementations • 19 Jan 2020 • Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang
Specifically, we first use a coarse deblurring network to reduce the motion blur on the input face image.
1 code implementation • 7 Jan 2020 • Tailin Wu, Thomas Breuel, Michael Skuhersky, Jan Kautz
Identifying the underlying directional relations from observational time series with nonlinear interactions and complex relational structures is key to a wide range of applications, yet remains a hard problem.
1 code implementation • CVPR 2020 • Abhishek Badki, Orazio Gallo, Jan Kautz, Pradeep Sen
Meshlets act as a dictionary of local features and thus allow to use learned priors to reconstruct object meshes in any pose and from unseen classes, even when the noise is large and the samples sparse.
2 code implementations • CVPR 2020 • Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz
We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network.
1 code implementation • CVPR 2020 • Arash Vahdat, Arun Mallya, Ming-Yu Liu, Jan Kautz
Our framework brings the best of both worlds, and it enables us to search for architectures with both differentiable and non-differentiable criteria in one unified framework while maintaining a low search cost.
no code implementations • ICML 2020 • Beidi Chen, Weiyang Liu, Zhiding Yu, Jan Kautz, Anshumali Shrivastava, Animesh Garg, Anima Anandkumar
We also find that AVH has a statistically significant correlation with human visual hardness.
2 code implementations • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz
In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.
Ranked #3 on Motion Synthesis on BRACE
6 code implementations • NeurIPS 2019 • Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro
To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging few example images of the target at test time.
Ranked #1 on Video-to-Video Synthesis on YouTube Dancing
1 code implementation • ICCV 2019 • Huaizu Jiang, Deqing Sun, Varun Jampani, Zhaoyang Lv, Erik Learned-Miller, Jan Kautz
We introduce a compact network for holistic scene flow estimation, called SENSE, which shares common encoder features among four closely-related tasks: optical flow estimation, disparity estimation from stereo, occlusion estimation, and semantic segmentation.
2 code implementations • NeurIPS 2019 • Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang
Our learning process integrates two highly related tasks: tracking large image regions \emph{and} establishing fine-grained pixel-level associations between consecutive video frames.
no code implementations • 25 Sep 2019 • Wonmin Byeon, Jan Kautz
While video prediction approaches have advanced considerably in recent years, learning to predict long-term future is challenging — ambiguous future or error propagation over time yield blurry predictions.
no code implementations • 25 Sep 2019 • Jiahao Su, Wonmin Byeon, Furong Huang, Jan Kautz, Animashree Anandkumar
Long-term video prediction is highly challenging since it entails simultaneously capturing spatial and temporal information across a long range of image frames. Standard recurrent models are ineffective since they are prone to error propagation and cannot effectively capture higher-order correlations.
no code implementations • ICCV 2019 • Sifei Liu, Xueting Li, Varun Jampani, Shalini De Mello, Jan Kautz
We experiment with semantic segmentation networks, where we use our propagation module to jointly train on different data -- images, superpixels and point clouds.
no code implementations • 31 Jul 2019 • Wei-Sheng Lai, Orazio Gallo, Jinwei Gu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
Despite the long history of image and video stitching research, existing academic and commercial solutions still produce strong artifacts.
3 code implementations • CVPR 2019 • Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz
On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0.02% in the top-1 accuracy on ImageNet.
1 code implementation • ICCV 2019 • Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model.
Ranked #1 on Video Frame Interpolation on UCF101 (PSNR (sRGB) metric)
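A hedged sketch of the pseudo-supervised term described above: the model being trained is pushed toward the output of a frozen, pre-trained interpolation network on unlabeled frame pairs. Function names and the L1 penalty are placeholder choices.

import torch
import torch.nn.functional as F

def pseudo_supervised_loss(student, frozen_teacher, frame0, frame1, t=0.5):
    # Unsupervised consistency term: the student's interpolated frame at time t should
    # agree with the prediction of a pre-trained (frozen) interpolation model.
    with torch.no_grad():
        target = frozen_teacher(frame0, frame1, t)   # pseudo ground-truth middle frame
    pred = student(frame0, frame1, t)
    return F.l1_loss(pred, target)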
no code implementations • 13 May 2019 • Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Stan Birchfield, Ming-Hsuan Yang, Jan Kautz
Through extensive experimentation on the ObjectNet3D and Pascal3D+ benchmark datasets, we demonstrate that our framework, which we call MetaView, significantly outperforms fine-tuning the state-of-the-art models with few examples, and that the specific architectural innovations of our method are crucial to achieving good performance.
1 code implementation • ICCV 2019 • Seonwook Park, Shalini De Mello, Pavlo Molchanov, Umar Iqbal, Otmar Hilliges, Jan Kautz
Inter-personal anatomical differences limit the accuracy of person-independent gaze estimation networks.
Ranked #1 on Gaze Estimation on MPII Gaze (using extra training data)
10 code implementations • ICCV 2019 • Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz
Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images.
1 code implementation • CVPR 2019 • Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz
Parts provide a good intermediate representation of objects that is robust with respect to the camera, pose and appearance variations.
Ranked #4 on Unsupervised Keypoint Estimation on CUB
no code implementations • ICLR 2019 • Tailin Wu, Thomas Breuel, Jan Kautz
Learning causal relations from observational time series with nonlinear interactions and complex causal structures is a key component of human intelligence, and has a wide range of applications.
1 code implementation • CVPR 2019 • Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz
In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector: a progressive learning framework for spatio-temporal action detection in videos.
Ranked #7 on Action Detection on UCF101-24
12 code implementations • CVPR 2019 • Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, Jan Kautz
To this end, we propose a joint learning framework that couples re-id learning and data generation end-to-end.