1 code implementation • 6 Mar 2025 • Hannes Stark, Bowen Jing, Tomas Geffner, Jason Yim, Tommi Jaakkola, Arash Vahdat, Karsten Kreis
We develop ProtComposer to generate protein structures conditioned on spatial protein layouts that are specified via a set of 3D ellipsoids capturing substructure shapes and semantics.
1 code implementation • 2 Mar 2025 • Tomas Geffner, Kieran Didi, Zuobai Zhang, Danny Reidenbach, Zhonglin Cao, Jason Yim, Mario Geiger, Christian Dallago, Emine Kucukbenli, Arash Vahdat, Karsten Kreis
Here, we develop Proteina, a new large-scale flow-based protein backbone generator that utilizes hierarchical fold class labels for conditioning and relies on a tailored scalable transformer architecture with up to 5x as many parameters as previous models.
no code implementations • 21 Feb 2025 • Yilun Xu, Weili Nie, Arash Vahdat
This weighting function naturally emphasizes samples with higher density in the teacher distribution when using a less mode-seeking divergence.
no code implementations • 20 Jan 2025 • Zhifeng Kong, Kevin J Shih, Weili Nie, Arash Vahdat, Sang-gil Lee, Joao Felipe Santos, Ante Jukic, Rafael Valle, Bryan Catanzaro
Audio in the real world may be perturbed due to numerous factors, causing the audio quality to be degraded.
no code implementations • 13 Jan 2025 • Weixi Feng, Chao Liu, Sifei Liu, William Yang Wang, Arash Vahdat, Weili Nie
In addition, we introduce a learnable module to interpolate text embeddings so that users can control semantics in specific frames and obtain smooth object transitions.
no code implementations • 10 Jan 2025 • Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Yuxing Peng, Saee Paliwal, Weili Nie, Arash Vahdat
Drug discovery is a complex process that involves multiple scenarios and stages, such as fragment-constrained molecule generation, hit generation and lead optimization.
no code implementations • 18 Nov 2024 • Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Arash Vahdat, Weili Nie
Fragment-based drug discovery, in which molecular fragments are assembled into new molecules with desirable biochemical properties, has achieved great success.
no code implementations • 28 Oct 2024 • Minkai Xu, Tomas Geffner, Karsten Kreis, Weili Nie, Yilun Xu, Jure Leskovec, Stefano Ermon, Arash Vahdat
Unfortunately, these models still underperform the autoregressive counterparts, with the performance gap increasing when reducing the number of sampling steps.
no code implementations • 21 Oct 2024 • Giannis Daras, Weili Nie, Karsten Kreis, Alex Dimakis, Morteza Mardani, Nikola Borislavov Kovachki, Arash Vahdat
This perspective allows us to train function space diffusion models only on images and utilize them to solve temporally correlated inverse problems.
no code implementations • 18 Oct 2024 • Sangyun Lee, Yilun Xu, Tomas Geffner, Giulia Fanti, Karsten Kreis, Arash Vahdat, Weili Nie
Consistency models have recently been introduced to accelerate sampling from diffusion models by directly predicting the solution (i.e., the data) of the probability flow ODE (PF ODE) from initial noise.
Ranked #13 on Image Generation on ImageNet 64x64
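The one-step idea can be made concrete in a toy setting (a hedged sketch, not the authors' model; `f`, `sample_one_step`, and the Gaussian data assumption are all illustrative): for Gaussian data the ideal consistency function is known in closed form, so single-step generation can be written out exactly.

```python
import math
import random

# Toy sketch of the consistency-model idea (illustrative, not the paper's
# code). For data x0 ~ N(0, S^2) under the noising x_t = x0 + t * eps, the
# marginal at noise level t is N(0, S^2 + t^2), and the ideal consistency
# function mapping any point on a PF-ODE trajectory back to x0 is exact.

S = 2.0  # assumed data standard deviation for this Gaussian toy case

def f(x_t: float, t: float) -> float:
    """Ideal consistency function for the Gaussian toy case."""
    return x_t * S / math.sqrt(S * S + t * t)

def sample_one_step(t_max: float = 80.0) -> float:
    """Single-step generation: draw pure noise at t_max, apply f once."""
    x_T = random.gauss(0.0, math.sqrt(S * S + t_max * t_max))
    return f(x_T, t_max)
```

Self-consistency is visible directly: every point on one trajectory, x_t = x0 * sqrt(S^2 + t^2) / S, maps back to the same x0, which is exactly the property a learned consistency model is trained to satisfy.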
no code implementations • 18 Oct 2024 • Kushagra Pandey, Jaideep Pathak, Yilun Xu, Stephan Mandt, Michael Pritchard, Arash Vahdat, Morteza Mardani
We address this by repurposing the diffusion framework for heavy-tail estimation using multivariate Student-t distributions.
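The key ingredient named above, heavy-tailed noise, can be sketched with the standard library alone (the function name and identity scale are illustrative assumptions): a Student-t variate is a Gaussian divided by sqrt(chi2(nu)/nu), and small nu produces the heavy tails that Gaussian diffusion cannot represent.

```python
import math
import random

# Illustrative sketch: draw multivariate Student-t noise (identity scale)
# by dividing a Gaussian vector by sqrt(chi2(nu) / nu). As nu -> infinity
# this recovers Gaussian noise; small nu gives heavy tails.

def student_t_noise(dim: int, nu: float, rng: random.Random) -> list:
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(nu/2, 2)
    return [zi / math.sqrt(v / nu) for zi in z]
```

Swapping this draw in for the usual Gaussian noise is what lets a diffusion-style model place realistic probability mass on extreme events.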
no code implementations • 17 Oct 2024 • Stathi Fotiadis, Noah Brenowitz, Tomas Geffner, Yair Cohen, Michael Pritchard, Arash Vahdat, Morteza Mardani
Conditional diffusion and flow models have proven effective for super-resolving small-scale details in natural images. However, in physical sciences such as weather, super-resolving small-scale details poses significant challenges due to: (i) misalignment between input and output distributions (i.e., solutions to distinct partial differential equations (PDEs) follow different trajectories), (ii) multi-scale dynamics that are deterministic at large scales but stochastic at small scales, and (iii) limited data, which increases the risk of overfitting.
no code implementations • 9 Sep 2024 • Rohit Jena, Ali Taghibakhshi, Sahil Jain, Gerald Shen, Nima Tajbakhsh, Arash Vahdat
To address this, recent approaches have incorporated human preference datasets to fine-tune T2I models or to optimize reward functions that capture these preferences.
no code implementations • 20 Aug 2024 • Jaideep Pathak, Yair Cohen, Piyush Garg, Peter Harrington, Noah Brenowitz, Dale Durran, Morteza Mardani, Arash Vahdat, Shaoming Xu, Karthik Kashinath, Michael Pritchard
Storm-scale convection-allowing models (CAMs) are an important tool for predicting the evolution of thunderstorms and mesoscale convective systems that result in damaging extreme weather.
1 code implementation • 3 Jul 2024 • Yilun Xu, Gabriele Corso, Tommi Jaakkola, Arash Vahdat, Karsten Kreis
The discrete latents significantly simplify learning the DM's complex noise-to-data mapping by reducing the curvature of the DM's generative ODE.
Ranked #3 on Image Generation on ImageNet 128x128
1 code implementation • 1 Jul 2024 • Siyi Gu, Minkai Xu, Alexander Powers, Weili Nie, Tomas Geffner, Karsten Kreis, Jure Leskovec, Arash Vahdat, Stefano Ermon
AliDiff shifts the target-conditioned chemical distribution towards regions with higher binding affinity and structural rationality, specified by user-defined reward functions, via the preference optimization approach.
no code implementations • 4 Jun 2024 • Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat
Recently, video diffusion models have emerged as expressive generative tools for high-quality video content creation that are readily available to general users.
no code implementations • 3 Jun 2024 • Omri Avrahami, Rinon Gal, Gal Chechik, Ohad Fried, Dani Lischinski, Arash Vahdat, Weili Nie
In this work, we propose a training-free method, dubbed DiffUHaul, that harnesses the spatial understanding of a localized text-to-image model for the object dragging task.
no code implementations • 14 May 2024 • Weili Nie, Sifei Liu, Morteza Mardani, Chao Liu, Benjamin Eckart, Arash Vahdat
To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts.
no code implementations • 8 Jan 2024 • Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat
To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization.
1 code implementation • 4 Dec 2023 • Ali Hatamizadeh, Jiaming Song, Guilin Liu, Jan Kautz, Arash Vahdat
In this paper, we study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT).
Ranked #31 on Image Generation on ImageNet 256x256
no code implementations • 6 Oct 2023 • Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens
In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences.
no code implementations • 24 Sep 2023 • Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Mohammad Amin Nabian, Tao Ge, Akshay Subramaniam, Karthik Kashinath, Jan Kautz, Mike Pritchard
The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
1 code implementation • 15 Jun 2023 • Hongkai Zheng, Weili Nie, Arash Vahdat, Anima Anandkumar
For masked training, we introduce an asymmetric encoder-decoder architecture consisting of a transformer encoder that operates only on unmasked patches and a lightweight transformer decoder on full patches.
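The masked split described above can be sketched as follows (hedged: `split_patches` and the toy patch list are illustrative, not the paper's API); the transformer encoder would consume only the returned visible patches, while the lightweight decoder operates on the full grid with mask tokens at the dropped positions.

```python
import random

# Illustrative sketch of masked training's patch split: keep a random
# subset of patches for the encoder and record their indices so the
# decoder can reassemble the full grid with mask tokens elsewhere.

def split_patches(patches, mask_ratio, rng):
    idx = list(range(len(patches)))
    rng.shuffle(idx)
    n_keep = int(len(patches) * (1.0 - mask_ratio))
    keep = sorted(idx[:n_keep])              # indices the encoder sees
    return [patches[i] for i in keep], keep
```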
1 code implementation • 7 May 2023 • Morteza Mardani, Jiaming Song, Jan Kautz, Arash Vahdat
To cope with this challenge, we propose a variational approach that by design seeks to approximate the true posterior distribution.
1 code implementation • CVPR 2023 • Paul Micaelli, Arash Vahdat, Hongxu Yin, Jan Kautz, Pavlo Molchanov
Our Landmark DEQ (LDEQ) achieves state-of-the-art performance on the challenging WFLW facial landmark dataset, reaching $3.92$ NME with fewer parameters and a training memory cost of $\mathcal{O}(1)$ in the number of recurrent modules.
Ranked #2 on Face Alignment on WFLW
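The $\mathcal{O}(1)$ memory claim rests on the deep-equilibrium construction, which a toy sketch can illustrate (the contractive layer below is illustrative, not LDEQ itself): the "infinitely deep" output is the fixed point z* = f(z*, x), found by iteration, and gradients flow through z* via the implicit function theorem rather than through stored iterates.

```python
# Illustrative deep-equilibrium sketch: a contractive toy layer and its
# fixed-point forward pass. Because |d layer / dz| = 0.5 < 1, iteration
# converges, and memory does not grow with the number of iterations.

def layer(z: float, x: float) -> float:
    return 0.5 * z + x

def fixed_point(x: float, iters: int = 60) -> float:
    z = 0.0
    for _ in range(iters):
        z = layer(z, x)
    return z  # z* solves z* = 0.5 * z* + x, i.e. z* = 2 * x
```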
1 code implementation • CVPR 2023 • Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.
Ranked #2 on Open-World Instance Segmentation on UVO (using extra training data); also evaluated on Open Vocabulary Panoptic Segmentation, Open Vocabulary Semantic Segmentation, and 4 more tasks.
no code implementations • 14 Feb 2023 • Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar
They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising.
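The two processes named above can be made concrete in one dimension (a hedged toy, not the paper's function-space construction): the forward process adds Gaussian white noise, and for Gaussian toy data the score of the perturbed marginal, which the reverse process otherwise has to learn, is available in closed form.

```python
import math
import random

# Illustrative 1-D sketch: forward Gaussian perturbation, plus the exact
# score of the noised marginal when data ~ N(0, data_std^2). A trained
# score network approximates this quantity for real data distributions.

def forward_perturb(x0: float, sigma: float, rng: random.Random) -> float:
    return x0 + sigma * rng.gauss(0.0, 1.0)

def true_score(x: float, sigma: float, data_std: float = 1.0) -> float:
    # grad_x log p_sigma(x) for the marginal N(0, data_std^2 + sigma^2)
    return -x / (data_std ** 2 + sigma ** 2)
```

Reverse-process sampling then amounts to repeatedly nudging a noisy sample along this score (plus fresh noise), which is what "generating by denoising" means.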
1 code implementation • 12 Feb 2023 • Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, Anima Anandkumar
We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions.
no code implementations • ICCV 2023 • Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, Jan Kautz
Specifically, we propose a physics-based motion projection module that uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically-plausible motion.
1 code implementation • 24 Nov 2022 • Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, Anima Anandkumar
Diffusion models have found widespread adoption in various areas.
2 code implementations • 2 Nov 2022 • Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu
Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages.
Ranked #14 on Text-to-Image Generation on MS COCO
1 code implementation • 18 Oct 2022 • Tim Dockhorn, Tianshi Cao, Arash Vahdat, Karsten Kreis
While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains.
2 code implementations • 12 Oct 2022 • Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis
To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes.
Ranked #1 on Point Cloud Generation on ShapeNet Airplane
1 code implementation • 11 Oct 2022 • Tim Dockhorn, Arash Vahdat, Karsten Kreis
Synthesis amounts to solving a differential equation (DE) defined by the learnt model.
Ranked #5 on Image Generation on AFHQV2
1 code implementation • 30 Sep 2022 • Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III, Anima Anandkumar
The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life.
2 code implementations • 16 May 2022 • Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, Anima Anandkumar
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
5 code implementations • ICLR 2022 • Zhisheng Xiao, Karsten Kreis, Arash Vahdat
To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively.
Ranked #9 on Image Generation on CelebA-HQ 256x256
1 code implementation • ICLR 2022 • Tim Dockhorn, Arash Vahdat, Karsten Kreis
SGMs rely on a diffusion process that gradually perturbs the data towards a tractable distribution, while the generative model learns to denoise.
1 code implementation • CVPR 2022 • Hongxu Yin, Arash Vahdat, Jose Alvarez, Arun Mallya, Jan Kautz, Pavlo Molchanov
A-ViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
Ranked #34 on Efficient ViTs on ImageNet-1K (with DeiT-S)
no code implementations • NeurIPS 2021 • Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis
Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.
1 code implementation • 1 Nov 2021 • Tianshi Cao, Alex Bie, Arash Vahdat, Sanja Fidler, Karsten Kreis
Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead.
1 code implementation • NeurIPS 2021 • Weili Nie, Arash Vahdat, Anima Anandkumar
In compositional generation, our method excels at zero-shot generation of unseen attribute combinations.
no code implementations • 29 Sep 2021 • Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat
In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization approach.
no code implementations • 12 Jul 2021 • Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat
We analyze three popular network architectures: EfficientNetV1, EfficientNetV2 and ResNeST, and achieve accuracy improvement for all models (up to $3.0\%$) when compressing larger models to the latency level of smaller models.
1 code implementation • NeurIPS 2021 • Arash Vahdat, Karsten Kreis, Jan Kautz
Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space, resulting in fewer network evaluations and faster sampling.
Ranked #4 on Image Generation on CelebA 256x256
no code implementations • 7 Jun 2021 • Sina Mohseni, Arash Vahdat, Jay Yadawa
In this paper, we propose a simple framework that leverages a shifting transformation learning setting for learning multiple shifted representations of the training set for improved OOD detection.
Ranked #9 on Anomaly Detection on Unlabeled CIFAR-10 vs CIFAR-100
1 code implementation • CVPR 2021 • Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov
In this work, we introduce GradInversion, with which input images from a larger batch (8-48 images) can also be recovered for large networks such as ResNets (50 layers), on complex datasets such as ImageNet (1000 classes, 224x224 px).
no code implementations • NeurIPS 2021 • Jyoti Aneja, Alexander Schwing, Jan Kautz, Arash Vahdat
To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
Ranked #6 on Image Generation on CelebA 256x256 (FID metric)
1 code implementation • ICLR 2021 • Zhisheng Xiao, Karsten Kreis, Jan Kautz, Arash Vahdat
VAEBM captures the overall mode structure of the data distribution using a state-of-the-art VAE and it relies on its EBM component to explicitly exclude non-data-like regions from the model and refine the image samples.
Ranked #1 on Image Generation on Stacked MNIST
no code implementations • 28 Sep 2020 • Jyoti Aneja, Alex Schwing, Jan Kautz, Arash Vahdat
To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior.
10 code implementations • NeurIPS 2020 • Arash Vahdat, Jan Kautz
For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ.
Ranked #3 on Image Generation on FFHQ 256 x 256 (bits/dimension metric)
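For readers unfamiliar with the metric, numbers like 2.98 and 2.91 convert to and from a model's negative log-likelihood as follows (the standard conversion, not NVAE-specific code):

```python
import math

# Standard likelihood-to-bits conversion: total NLL in nats, divided by
# the number of data dimensions and by ln 2, gives bits per dimension.

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    return nll_nats / (num_dims * math.log(2.0))

# A 32x32 RGB image has 3 * 32 * 32 = 3072 dimensions, so 2.91 bits/dim
# corresponds to an NLL of about 3072 * 2.91 * ln 2 ≈ 6196 nats per image.
```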
1 code implementation • ECCV 2020 • Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem
Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.
2 code implementations • NeurIPS 2020 • Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.
1 code implementation • CVPR 2020 • Arash Vahdat, Arun Mallya, Ming-Yu Liu, Jan Kautz
Our framework brings the best of both worlds, and it enables us to search for architectures with both differentiable and non-differentiable criteria in one unified framework while maintaining a low search cost.
1 code implementation • ICCV 2019 • Mehran Khodabandeh, Arash Vahdat, Mani Ranjbar, William G. Macready
To adapt to the domain shift, the model is trained on the target domain using a set of noisy object bounding boxes that are obtained by a detection model trained only in the source domain.
3 code implementations • ICML 2020 • Arash Vahdat, Evgeny Andriyash, William G. Macready
We extend the class of posterior models that may be learned by using undirected graphical models.
no code implementations • CVPR 2020 • Mostafa S. Ibrahim, Arash Vahdat, Mani Ranjbar, William G. Macready
Building a large image dataset with high-quality object masks for semantic segmentation is costly and time consuming.
no code implementations • 29 Sep 2018 • Evgeny Andriyash, Arash Vahdat, Bill Macready
In many applications we seek to maximize an expectation with respect to a distribution over discrete variables.
no code implementations • 27 Sep 2018 • Evgeny Andriyash, Arash Vahdat, Bill Macready
In many applications we seek to optimize an expectation with respect to a distribution over discrete variables.
no code implementations • NeurIPS 2018 • Arash Vahdat, Evgeny Andriyash, William G. Macready
Experiments on the MNIST and OMNIGLOT datasets show that these relaxations outperform previous discrete VAEs with Boltzmann priors.
no code implementations • ICML 2018 • Arash Vahdat, William G. Macready, Zhengbing Bian, Amir Khoshaman, Evgeny Andriyash
Training of discrete latent variable models remains challenging because passing gradient information through discrete units is difficult.
1 code implementation • NeurIPS 2017 • Arash Vahdat
Collecting large training datasets, annotated with high-quality labels, is costly and time-consuming.
1 code implementation • 9 Jul 2016 • Mostafa S. Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori
In order to model both person-level and group-level dynamics, we present a 2-stage deep temporal model for the group activity recognition problem.
1 code implementation • CVPR 2016 • Moustafa Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori
In group activity recognition, the temporal dynamics of the whole activity can be inferred based on the dynamics of the individual people representing the activity.
no code implementations • CVPR 2016 • Zhiwei Deng, Arash Vahdat, Hexiang Hu, Greg Mori
As a concrete example, group activity recognition involves the interactions and relative spatial relations of a set of people in a scene.
Ranked #6 on Group Activity Recognition on Collective Activity
no code implementations • 12 Feb 2015 • Mehran Khodabandeh, Arash Vahdat, Guang-Tong Zhou, Hossein Hajimirsadeghi, Mehrsan Javan Roshtkhari, Greg Mori, Stephen Se
We present a novel approach for discovering human interactions in videos.
no code implementations • CVPR 2015 • Hossein Hajimirsadeghi, Wang Yan, Arash Vahdat, Greg Mori
Many visual recognition problems can be approached by counting instances.
no code implementations • NeurIPS 2013 • Guang-Tong Zhou, Tian Lan, Arash Vahdat, Greg Mori
We present a maximum margin framework that clusters data using latent variables.
no code implementations • NeurIPS 2012 • Weilong Yang, Yang Wang, Arash Vahdat, Greg Mori
Latent SVMs (LSVMs) are a class of powerful tools that have been successfully applied to many applications in computer vision.