8 code implementations • NeurIPS 2020 • Arash Vahdat, Jan Kautz
For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ.
Ranked #3 on Image Generation on FFHQ 256x256 (bits/dimension metric)
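As background for the bits/dimension metric reported above: it is the model's total negative log-likelihood in nats, divided by the number of data dimensions times ln 2. A minimal sketch (the NLL value below is illustrative, not a reported figure):

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood (in nats) to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

# A CIFAR-10 image has 3 * 32 * 32 = 3072 dimensions; a total NLL of roughly
# 6196 nats corresponds to about 2.91 bits/dim.
print(round(bits_per_dim(6196.0, 3 * 32 * 32), 2))  # 2.91
```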
1 code implementation • CVPR 2023 • Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.
Ranked #2 on Open-World Instance Segmentation on UVO (using extra training data)
2 code implementations • 12 Oct 2022 • Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis
To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes.
Ranked #1 on Point Cloud Generation on ShapeNet Airplane
5 code implementations • ICLR 2022 • Zhisheng Xiao, Karsten Kreis, Arash Vahdat
To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively.
Ranked #9 on Image Generation on CelebA-HQ 256x256
2 code implementations • 2 Nov 2022 • Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu
Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages.
Ranked #14 on Text-to-Image Generation on MS COCO
1 code implementation • NeurIPS 2021 • Arash Vahdat, Karsten Kreis, Jan Kautz
Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space, resulting in fewer network evaluations and faster sampling.
Ranked #3 on Image Generation on CIFAR-10 (FD metric)
2 code implementations • CVPR 2021 • Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov
In this work, we introduce GradInversion, with which input images from a larger batch (8-48 images) can also be recovered for large networks such as ResNets (50 layers), on complex datasets such as ImageNet (1000 classes, 224x224 px).
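Why gradients leak inputs at all can be seen in closed form for a single fully-connected layer, where the weight gradient factors as an outer product with the input. The sketch below is a one-layer illustration, not the GradInversion algorithm itself (which optimizes dummy inputs to match gradients of deep networks):

```python
import numpy as np

# Toy setup: a linear layer y = W x + b with a quadratic loss on y.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 8)), rng.normal(size=4)
x_true = rng.normal(size=8)

delta = (W @ x_true + b) - 1.0        # dL/dy for L = 0.5 * ||y - 1||^2
grad_W = np.outer(delta, x_true)      # "leaked" gradient dL/dW = delta x^T
grad_b = delta                        # dL/db = delta
# Closed-form inversion: each row of dL/dW is the input scaled by dL/db.
i = int(np.argmax(np.abs(grad_b)))    # pick a row with a well-separated scale
x_recovered = grad_W[i] / grad_b[i]
print(np.allclose(x_recovered, x_true))  # True
```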
1 code implementation • 4 Dec 2023 • Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat
In this paper, we study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT).
Ranked #4 on Image Generation on ImageNet 256x256
1 code implementation • 15 Jun 2023 • Hongkai Zheng, Weili Nie, Arash Vahdat, Anima Anandkumar
For masked training, we introduce an asymmetric encoder-decoder architecture consisting of a transformer encoder that operates only on unmasked patches and a lightweight transformer decoder on full patches.
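A rough NumPy sketch of that asymmetric split (shapes, the mask ratio, and the random-matrix "encoder" are illustrative stand-ins, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_patches(patches: np.ndarray, mask_ratio: float):
    """Randomly mask a fraction of patches; the encoder sees only the rest."""
    n = patches.shape[0]
    n_masked = int(n * mask_ratio)
    perm = rng.permutation(n)
    return perm[n_masked:], perm[:n_masked]   # visible indices, masked indices

# 196 patches of dimension 64, as for a 224x224 image with 16x16 patches.
patches = rng.normal(size=(196, 64))
visible_idx, masked_idx = split_patches(patches, mask_ratio=0.75)
# The (heavy) encoder processes only the visible 25% of patches ...
encoded = patches[visible_idx] @ rng.normal(size=(64, 64))  # encoder stand-in
# ... while a lightweight decoder operates on the full set, with a learned
# mask token (a constant here) scattered into the masked positions.
full = np.zeros((196, 64))
full[visible_idx] = encoded
full[masked_idx] = 1.0                                      # mask-token stand-in
print(len(visible_idx), full.shape)  # 49 (196, 64)
```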
2 code implementations • 16 May 2022 • Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
1 code implementation • 12 Feb 2023 • Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, Anima Anandkumar
We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions.
2 code implementations • NeurIPS 2020 • Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.
1 code implementation • ICLR 2022 • Tim Dockhorn, Arash Vahdat, Karsten Kreis
SGMs rely on a diffusion process that gradually perturbs the data towards a tractable distribution, while the generative model learns to denoise.
Ranked #24 on Image Generation on CIFAR-10
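The forward perturbation can be sketched with the standard variance-preserving Gaussian kernel (a generic illustration; the paper itself studies a critically-damped variant with auxiliary velocity variables, which is not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x0: np.ndarray, alpha_bar: float) -> np.ndarray:
    """Variance-preserving kernel: q(x_t | x_0) = N(sqrt(a) x_0, (1 - a) I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

x0 = rng.standard_normal(10_000)
# As alpha_bar -> 0 the data is pushed towards a standard normal, which is
# exactly the "tractable distribution" the snippet refers to.
xt = perturb(x0, alpha_bar=0.01)
print(round(float(xt.std()), 1))  # ~1.0: nearly pure noise
```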
1 code implementation • 9 Jul 2016 • Mostafa S. Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori
In order to model both person-level and group-level dynamics, we present a 2-stage deep temporal model for the group activity recognition problem.
1 code implementation • CVPR 2016 • Mostafa S. Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori
In group activity recognition, the temporal dynamics of the whole activity can be inferred based on the dynamics of the individual people representing the activity.
1 code implementation • 30 Sep 2022 • Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III, Anima Anandkumar
The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life.
1 code implementation • CVPR 2022 • Hongxu Yin, Arash Vahdat, Jose Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov
A-ViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
Ranked #34 on Efficient ViTs on ImageNet-1K (with DeiT-S)
1 code implementation • 11 Oct 2022 • Tim Dockhorn, Arash Vahdat, Karsten Kreis
Synthesis amounts to solving a differential equation (DE) defined by the learnt model.
Ranked #5 on Image Generation on AFHQV2
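"Solving a DE defined by the learnt model" can be made concrete with the simplest possible solver, fixed-step Euler (a baseline sketch; the paper's contribution is a higher-order solver, and the linear drift below is a toy stand-in for a learned one):

```python
import numpy as np

def euler_sample(x, drift, t0=0.0, t1=1.0, n_steps=100):
    """Integrate dx/dt = drift(x, t) from t0 to t1 with fixed-step Euler."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + drift(x, t) * dt
        t += dt
    return x

# Toy linear drift pulling the state towards the origin, standing in for the
# learned score-based drift of a generative ODE.
drift = lambda x, t: -x
x_end = euler_sample(np.full(5, 2.0), drift)
print(np.round(x_end, 2))  # ~2 * exp(-1) ≈ 0.73 per coordinate
```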
1 code implementation • ECCV 2020 • Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.
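The objective described above belongs to the family of symmetric contrastive losses that score corresponding image-caption pairs against non-corresponding ones. A simplified sketch (a deliberate reduction: pooled embeddings stand in for the attention-weighted regions and words, and the temperature is illustrative):

```python
import numpy as np

def contrastive_loss(img_emb: np.ndarray, txt_emb: np.ndarray, temperature=0.1):
    """InfoNCE-style loss: matching pairs (the diagonal) should out-score
    non-corresponding pairs (the off-diagonals)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # all pairwise similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))    # cross-entropy on diagonal

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = contrastive_loss(emb, emb)              # perfectly matching pairs
mismatched = contrastive_loss(emb, rng.normal(size=(8, 16)))
print(aligned < mismatched)  # True: alignment lowers the loss
```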
1 code implementation • NeurIPS 2021 • Weili Nie, Arash Vahdat, Anima Anandkumar
In compositional generation, our method excels at zero-shot generation of unseen attribute combinations.
1 code implementation • 18 Oct 2022 • Tim Dockhorn, Tianshi Cao, Arash Vahdat, Karsten Kreis
While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains.
1 code implementation • CVPR 2020 • Arash Vahdat, Arun Mallya, Ming-Yu Liu, Jan Kautz
Our framework brings the best of both worlds, and it enables us to search for architectures with both differentiable and non-differentiable criteria in one unified framework while maintaining a low search cost.
1 code implementation • ICLR 2021 • Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat
VAEBM captures the overall mode structure of the data distribution using a state-of-the-art VAE and it relies on its EBM component to explicitly exclude non-data-like regions from the model and refine the image samples.
Ranked #1 on Image Generation on Stacked MNIST
1 code implementation • 7 May 2023 • Morteza Mardani, Jiaming Song, Jan Kautz, Arash Vahdat
To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution.
1 code implementation • CVPR 2023 • Paul Micaelli, Arash Vahdat, Hongxu Yin, Jan Kautz, Pavlo Molchanov
Our Landmark DEQ (LDEQ) achieves state-of-the-art performance on the challenging WFLW facial landmark dataset, reaching $3.92$ NME with fewer parameters and a training memory cost of $\mathcal{O}(1)$ in the number of recurrent modules.
Ranked #2 on Face Alignment on WFLW
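A deep equilibrium model's forward pass solves for a fixed point z* = f(z*, x) of a single layer instead of stacking many layers, which is why training memory can be O(1) in depth (gradients come from implicit differentiation rather than stored activations). A minimal sketch with a toy contractive layer (the architecture and landmark-specific parts of LDEQ are not shown):

```python
import numpy as np

def deq_forward(f, x, z0, tol=1e-8, max_iter=200):
    """Find a fixed point z* = f(z*, x) by simple forward iteration."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# A toy layer z <- tanh(A z + x); the small spectral norm of A makes it a
# contraction, so the fixed point exists and the iteration converges.
rng = np.random.default_rng(0)
A = 0.1 * rng.normal(size=(5, 5))
f = lambda z, x: np.tanh(A @ z + x)
x = rng.normal(size=5)
z_star = deq_forward(f, x, np.zeros(5))
print(np.allclose(z_star, f(z_star, x)))  # True: z* is a fixed point
```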
1 code implementation • NeurIPS 2017 • Arash Vahdat
Collecting large training datasets, annotated with high-quality labels, is costly and time-consuming.
1 code implementation • ICCV 2019 • Mehran Khodabandeh, Arash Vahdat, Mani Ranjbar, William G. Macready
To adapt to the domain shift, the model is trained on the target domain using a set of noisy object bounding boxes that are obtained by a detection model trained only in the source domain.
1 code implementation • 24 Nov 2022 • Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, Anima Anandkumar
Diffusion models have found widespread adoption in various areas.
1 code implementation • 1 Nov 2021 • Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis
Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.
3 code implementations • ICML 2020 • Arash Vahdat, Evgeny Andriyash, William G. Macready
We extend the class of posterior models that may be learned by using undirected graphical models.
no code implementations • ICML 2018 • Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash
Training of discrete latent variable models remains challenging because passing gradient information through discrete units is difficult.
Ranked #53 on Image Generation on CIFAR-10 (bits/dimension metric)
no code implementations • NeurIPS 2018 • Arash Vahdat, Evgeny Andriyash, William G. Macready
Experiments on the MNIST and OMNIGLOT datasets show that these relaxations outperform previous discrete VAEs with Boltzmann priors.
no code implementations • CVPR 2016 • Zhiwei Deng, Arash Vahdat, Hexiang Hu, Greg Mori
As a concrete example, group activity recognition involves the interactions and relative spatial relations of a set of people in a scene.
Ranked #5 on Group Activity Recognition on Collective Activity
no code implementations • CVPR 2015 • Hossein Hajimirsadeghi, Wang Yan, Arash Vahdat, Greg Mori
Many visual recognition problems can be approached by counting instances.
no code implementations • 12 Feb 2015 • Mehran Khodabandeh, Arash Vahdat, Guang-Tong Zhou, Hossein Hajimirsadeghi, Mehrsan Javan Roshtkhari, Greg Mori, Stephen Se
We present a novel approach for discovering human interactions in videos.
no code implementations • 29 Sep 2018 • Evgeny Andriyash, Arash Vahdat, Bill Macready
In many applications we seek to maximize an expectation with respect to a distribution over discrete variables.
no code implementations • CVPR 2020 • Mostafa S. Ibrahim, Arash Vahdat, Mani Ranjbar, William G. Macready
Building a large image dataset with high-quality object masks for semantic segmentation is costly and time-consuming.
no code implementations • NeurIPS 2013 • Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
We present a maximum margin framework that clusters data using latent variables.
no code implementations • NeurIPS 2012 • Weilong Yang, Yang Wang, Arash Vahdat, Greg Mori
Latent SVMs (LSVMs) are a class of powerful tools that have been successfully applied to many applications in computer vision.
no code implementations • NeurIPS 2021 • Jyoti Aneja, Alexander Schwing, Jan Kautz, Arash Vahdat
To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
Ranked #6 on Image Generation on CelebA 256x256 (FID metric)
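Sampling from a product prior p(z) ∝ p_base(z) · r(z) has a simple rejection-sampling illustration when the reweighting factor is bounded in [0, 1] (a generic sketch; the sigmoid below is an arbitrary stand-in, not the paper's learned reweighting factor):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_reweighted(base_sampler, reweight, n: int) -> np.ndarray:
    """Rejection-sample z ~ p(z) proportional to p_base(z) * r(z), 0 <= r <= 1."""
    out = []
    while len(out) < n:
        z = base_sampler()
        if rng.uniform() < reweight(z):  # accept with probability r(z)
            out.append(z)
    return np.array(out)

# N(0, 1) base prior reweighted towards positive values by a sigmoid factor.
base = lambda: float(rng.standard_normal())
r = lambda z: 1.0 / (1.0 + np.exp(-4.0 * z))
samples = sample_reweighted(base, r, 5000)
print(samples.mean() > 0)  # True: mass is shifted to the right
```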
no code implementations • 7 Jun 2021 • Sina Mohseni, Arash Vahdat, Jay Yadawa
In this paper, we propose a simple framework that leverages a shifting transformation learning setting for learning multiple shifted representations of the training set for improved OOD detection.
Ranked #9 on Anomaly Detection on Unlabeled CIFAR-10 vs CIFAR-100
no code implementations • 12 Jul 2021 • Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat
We analyze three popular network architectures: EfficientNetV1, EfficientNetV2 and ResNeSt, and achieve accuracy improvements for all models (up to $3.0\%$) when compressing larger models to the latency level of smaller models.
no code implementations • 29 Sep 2021 • Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat
In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization approach.
no code implementations • NeurIPS 2021 • Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis
Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.
no code implementations • 27 Sep 2018 • Evgeny Andriyash, Arash Vahdat, Bill Macready
In many applications we seek to optimize an expectation with respect to a distribution over discrete variables.
no code implementations • 28 Sep 2020 • Jyoti Aneja, Alex Schwing, Jan Kautz, Arash Vahdat
To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
no code implementations • ICCV 2023 • Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, Jan Kautz
Specifically, we propose a physics-based motion projection module that uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically-plausible motion.
no code implementations • 14 Feb 2023 • Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar
They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising.
no code implementations • 24 Sep 2023 • Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Karthik Kashinath, Jan Kautz, Mike Pritchard
Predictions of weather hazards require expensive km-scale simulations driven by coarser global inputs.
no code implementations • 6 Oct 2023 • Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens
In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences.
no code implementations • 8 Jan 2024 • Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat
To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization.