1 code implementation • 19 Oct 2024 • Mingyuan Zhou, Huangjie Zheng, Yi Gu, Zhendong Wang, Hai Huang
Score identity Distillation (SiD) is a data-free method that has achieved SOTA performance in image generation by leveraging only a pretrained diffusion model, without requiring any training data.
no code implementations • 12 Jun 2024 • Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie
For discriminative models like CLIP, we observe enhanced zero-shot performance in cross-modal retrieval tasks.
Ranked #96 on Visual Question Answering on MM-Vet
2 code implementations • 3 Jun 2024 • Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang
Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and an FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2.
Ranked #22 on Text-to-Image Generation on MS COCO
2 code implementations • 5 Apr 2024 • Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang
This achievement redefines the benchmarks for efficiency and effectiveness not only in diffusion distillation but also in the broader field of diffusion-based generation.
Ranked #9 on Image Generation on CIFAR-10
no code implementations • 20 Nov 2023 • Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang
To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.
1 code implementation • 10 Oct 2023 • Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou
Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling.
Ranked #13 on Image Generation on CelebA 64x64
1 code implementation • NeurIPS 2023 • Mingyuan Zhou, Tianqi Chen, Zhendong Wang, Huangjie Zheng
We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges.
1 code implementation • CVPR 2023 • Yiming Qin, Huangjie Zheng, Jiangchao Yao, Mingyuan Zhou, Ya Zhang
To tackle this problem, we start from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM), which are trained with a distribution adjustment regularizer as a solution.
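A minimal sketch of how such a distribution adjustment regularizer could sit alongside the standard denoising loss; the function names, the uniformly sampled reference class, and the weight `tau` are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def cbdm_style_loss(eps_model, x_t, t, eps, y, num_classes, tau=1.0):
    """Illustrative diffusion loss with a class-balancing style regularizer.

    eps_model(x_t, t, y) predicts the noise added at step t for class label y.
    The regularizer nudges the class-conditional prediction toward the
    prediction for a uniformly sampled reference class, counteracting head
    classes dominating the learned conditional distribution. (Names, the
    weighting, and the exact form are assumptions for illustration.)
    """
    pred = eps_model(x_t, t, y)
    denoise_loss = F.mse_loss(pred, eps)

    # Reference prediction under a uniformly drawn class label.
    y_ref = torch.randint(0, num_classes, y.shape, device=y.device)
    with torch.no_grad():
        pred_ref = eps_model(x_t, t, y_ref)
    reg = F.mse_loss(pred, pred_ref)

    return denoise_loss + tau * reg
```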
1 code implementation • 29 Apr 2023 • Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou
Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years.
1 code implementation • NeurIPS 2023 • Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou
Meanwhile, Patch Diffusion improves the performance of diffusion models trained on relatively small datasets, e.g., with as few as 5,000 images to train from scratch.
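A rough sketch of the patch-wise training idea: crop random patches and hand the denoiser normalized per-patch coordinate channels so it knows where each patch sits in the full image. The function name and the coordinate encoding are assumptions for illustration.

```python
import torch

def sample_conditioned_patch(images, patch_size):
    """Crop a random patch per image and return normalized coordinate maps.

    Conditioning the denoiser on per-pixel (y, x) coordinates of the patch
    within the full image is the idea sketched here; the encoding is an
    illustrative assumption, not the paper's exact scheme.
    """
    b, c, h, w = images.shape
    top = torch.randint(0, h - patch_size + 1, (b,))
    left = torch.randint(0, w - patch_size + 1, (b,))

    patches, coords = [], []
    for i in range(b):
        t, l = top[i].item(), left[i].item()
        patches.append(images[i, :, t:t + patch_size, l:l + patch_size])
        ys = torch.linspace(t, t + patch_size - 1, patch_size) / (h - 1)
        xs = torch.linspace(l, l + patch_size - 1, patch_size) / (w - 1)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        coords.append(torch.stack([yy, xx]))  # (2, p, p) coordinate channels
    return torch.stack(patches), torch.stack(coords)
```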
1 code implementation • 11 Apr 2023 • Mohammadreza Armandpour, Ali Sadeghian, Huangjie Zheng, Amir Sadeghian, Mingyuan Zhou
Although text-to-image diffusion models have made significant strides in generating images from text, they are sometimes more inclined to generate images resembling the data on which the model was trained rather than the provided text.
1 code implementation • CVPR 2023 • Zhixin Wang, Xiaoyun Zhang, Ziying Zhang, Huangjie Zheng, Mingyuan Zhou, Ya Zhang, Yanfeng Wang
However, it is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
2 code implementations • 15 Jun 2022 • Xizewen Han, Huangjie Zheng, Mingyuan Zhou
In this paper, we introduce classification and regression diffusion (CARD) models, which combine a denoising diffusion-based conditional generative model and a pre-trained conditional mean estimator, to accurately predict the distribution of $\boldsymbol y$ given $\boldsymbol x$.
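A hedged sketch of how the two components could be combined in the forward process: the noisy regression target drifts toward the pretrained mean estimator's output $f(\boldsymbol x)$ as the noise level grows, so the reverse chain starts from the point estimate and recovers the residual uncertainty. The exact schedule and parameterization below are assumptions.

```python
import torch

def card_style_forward(y0, f_x, alpha_bar_t):
    """Noisy regression target that interpolates toward a pretrained mean.

    y0:          clean targets, shape (B, d)
    f_x:         output of a pretrained conditional mean estimator f(x), (B, d)
    alpha_bar_t: cumulative noise-schedule term in (0, 1], shape (B, 1)

    As alpha_bar_t -> 0 the sample collapses onto N(f(x), I), so reverse
    diffusion starts from the point estimate and adds back the residual
    uncertainty. The parameterization is an assumption for illustration.
    """
    noise = torch.randn_like(y0)
    mean = alpha_bar_t.sqrt() * y0 + (1.0 - alpha_bar_t.sqrt()) * f_x
    return mean + (1.0 - alpha_bar_t).sqrt() * noise, noise
```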
3 code implementations • 5 Jun 2022 • Zhendong Wang, Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou
Both the observed and generated data are diffused by the same adaptive diffusion process.
Ranked #1 on Image Generation on LSUN Bedroom 256 x 256
2 code implementations • ICLR 2022 • Dongsheng Wang, Dandan Guo, He Zhao, Huangjie Zheng, Korawat Tanwisuth, Bo Chen, Mingyuan Zhou
This paper introduces a new topic-modeling framework where each document is viewed as a set of word embedding vectors and each topic is modeled as an embedding vector in the same embedding space.
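A toy sketch of scoring a document against topic embeddings that live in the same space as its word embeddings; the cosine cost and the softmax-style soft assignment are simplifications assumed here for illustration, not the paper's exact transport objective.

```python
import torch
import torch.nn.functional as F

def doc_topic_transport_cost(word_vecs, topic_vecs, temperature=0.1):
    """Soft assignment of a document's word embeddings to topic embeddings.

    word_vecs:  (n_words, d) embeddings of the words in one document
    topic_vecs: (n_topics, d) learnable topic embeddings in the same space
    Returns a scalar transport-style cost; cost and plan are illustrative.
    """
    cost = 1.0 - F.cosine_similarity(
        word_vecs.unsqueeze(1), topic_vecs.unsqueeze(0), dim=-1
    )  # (n_words, n_topics) pairwise cost
    plan = F.softmax(-cost / temperature, dim=1)  # each word softly picks topics
    return (plan * cost).sum(dim=1).mean()
```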
1 code implementation • 19 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou
Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain.
Ranked #2 on Text-to-Image Generation on CUB
no code implementations • 19 Feb 2022 • Shentao Yang, Zhendong Wang, Huangjie Zheng, Yihao Feng, Mingyuan Zhou
For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
2 code implementations • 14 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou
In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP), in which the size of the local receptive field used for mixing increases with the amount of spatial shifting.
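A minimal sketch of a mix-and-shift token mixer in this spirit: channel groups are rolled spatially by increasing offsets before a pointwise mixing layer, so channel mixing sees a receptive field that grows with the shift. The module below is a simplification under assumed names, not the paper's exact block.

```python
import torch
import torch.nn as nn

class MixShift(nn.Module):
    """Illustrative mix-and-shift token mixer (not the paper's exact block).

    Channels are split into groups; group g is spatially shifted by g pixels,
    so the pointwise channel mixing afterwards aggregates information from a
    neighborhood whose size grows with the shift amount, without attention.
    """
    def __init__(self, dim, num_groups=4):
        super().__init__()
        assert dim % num_groups == 0
        self.num_groups = num_groups
        self.mix = nn.Conv2d(dim, dim, kernel_size=1)  # pointwise channel mixing

    def forward(self, x):  # x: (B, C, H, W)
        groups = x.chunk(self.num_groups, dim=1)
        shifted = [torch.roll(g, shifts=(i, i), dims=(2, 3))
                   for i, g in enumerate(groups)]
        return self.mix(torch.cat(shifted, dim=1))
```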
1 code implementation • NeurIPS 2021 • Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou
The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains.
1 code implementation • NeurIPS 2021 • Korawat Tanwisuth, Xinjie Fan, Huangjie Zheng, Shujian Zhang, Hao Zhang, Bo Chen, Mingyuan Zhou
Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space.
no code implementations • 29 Sep 2021 • Shujian Zhang, Zhibin Duan, Huangjie Zheng, Pengcheng He, Bo Chen, Weizhu Chen, Mingyuan Zhou
Crossformer with states sharing not only provides the desired cross-layer guidance and regularization but also reduces the memory requirement.
no code implementations • 29 Sep 2021 • Shentao Yang, Zhendong Wang, Huangjie Zheng, Mingyuan Zhou
For training more effective agents, we propose a framework that supports learning a flexible and well-regularized policy, which consists of a fully implicit policy and a regularization based on the state-action visitation frequencies induced by the current policy and by the data-collecting behavior policy.
1 code implementation • 8 May 2021 • Huangjie Zheng, Xu Chen, Jiangchao Yao, Hongxia Yang, Chunyuan Li, Ya Zhang, Hao Zhang, Ivor W. Tsang, Jingren Zhou, Mingyuan Zhou
We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query exert a greater force not only to attract more distant positive samples but also to repel closer negative samples.
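A hedged sketch of such distance-reweighted attraction and repulsion for a single query; the softmax weighting and the temperature are illustrative assumptions, and the paper's exact weighting differs.

```python
import torch
import torch.nn.functional as F

def cacr_style_loss(query, positives, negatives, temperature=0.5):
    """Distance-reweighted attraction/repulsion for one query (illustrative).

    query:     (d,)   normalized embedding
    positives: (P, d) normalized positive embeddings
    negatives: (N, d) normalized negative embeddings
    Farther positives get larger attraction weights; closer negatives get
    larger repulsion weights.
    """
    pos_dist = 1.0 - positives @ query  # (P,) cosine distances to the query
    neg_dist = 1.0 - negatives @ query  # (N,)

    attract_w = F.softmax(pos_dist / temperature, dim=0)   # harder on far positives
    repel_w = F.softmax(-neg_dist / temperature, dim=0)    # harder on near negatives

    attraction = (attract_w * pos_dist).sum()
    repulsion = (repel_w * neg_dist).sum()
    return attraction - repulsion
```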
1 code implementation • NeurIPS 2021 • Huangjie Zheng, Mingyuan Zhou
The forward CT is the expected cost of moving a source data point to a target one, with their joint distribution defined by the product of the source probability density function (PDF) and a source-dependent conditional distribution, which is related to the target PDF via Bayes' theorem.
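Under notation assumed here (source density $p_s$, target density $p_t$, point-to-point cost $c$, and a learnable weighting function $\phi$ defining the source-dependent conditional), the forward CT cost described above can be sketched as

$$\mathcal{C}_{\rightarrow} = \mathbb{E}_{x \sim p_s}\, \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[c(x, y)\big], \qquad \pi(y \mid x) = \frac{p_t(y)\, e^{\phi(x, y)}}{\int p_t(y')\, e^{\phi(x, y')}\, dy'},$$

so $\pi(y \mid x)$ is obtained from the target PDF by reweighting with $e^{\phi(x, y)}$ and normalizing, mirroring the Bayes'-theorem relation mentioned above; the backward cost swaps the roles of the two distributions.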
no code implementations • 9 Dec 2020 • Fei Ye, Huangjie Zheng, Chaoqin Huang, Ya Zhang
Based on this objective function, we introduce a novel information-theoretic framework for unsupervised image anomaly detection.
Ranked #9 on Anomaly Detection on One-class CIFAR-100
3 code implementations • 3 Nov 2020 • Xu Chen, Siheng Chen, Jiangchao Yao, Huangjie Zheng, Ya Zhang, Ivor W. Tsang
Therefore, designing a new GNN for these graphs is a pressing issue for the graph learning community.
no code implementations • 2 Oct 2020 • Quan Zhang, Huangjie Zheng, Mingyuan Zhou
Leveraging well-established MCMC strategies, we propose MCMC-interactive variational inference (MIVI) to not only estimate the posterior in a time-constrained manner, but also facilitate the design of MCMC transitions.
no code implementations • 28 Sep 2020 • Huangjie Zheng, Mingyuan Zhou
We propose conditional transport (CT) as a new divergence to measure the difference between two probability distributions.
3 code implementations • 23 Jul 2019 • Xu Chen, Siheng Chen, Huangjie Zheng, Jiangchao Yao, Kenan Cui, Ya Zhang, Ivor W. Tsang
NANG learns a unifying latent representation which is shared by both node attributes and graph structures and can be translated to different modalities.
2 code implementations • CVPR 2019 • Tianwei Ni, Lingxi Xie, Huangjie Zheng, Elliot K. Fishman, Alan L. Yuille
The key observation is that, although the object is a 3D volume, what we really need in segmentation is to find its boundary which is a 2D surface.
no code implementations • 28 Nov 2018 • Huangjie Zheng, Lingxi Xie, Tianwei Ni, Ya Zhang, Yan-Feng Wang, Qi Tian, Elliot K. Fishman, Alan L. Yuille
However, in medical image analysis, fusing predictions from two phases is often difficult, because (i) there is a domain gap between the two phases, and (ii) the semantic labels do not correspond pixel-wise, even for images scanned from the same patient.
no code implementations • 10 Jul 2018 • Huangjie Zheng, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Jia Wang
In information theory, Fisher information and Shannon information (entropy) are used to quantify, respectively, the uncertainty associated with distribution modeling and the uncertainty in specifying the outcome of given variables.
no code implementations • 19 Feb 2018 • Huangjie Zheng, Jiangchao Yao, Ya Zhang, Ivor W. Tsang
While enormous progress has been made on Variational Autoencoders (VAEs) in recent years, VAEs with deep networks, like other deep models, suffer from degeneration, which seriously weakens the correlation between the input and the corresponding latent codes and deviates from the goal of representation learning.