To tackle this problem, we start from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM), which are trained with a distribution adjustment regularizer as a solution.
Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years.
Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch.
Although text-to-image diffusion models have made significant strides in generating images from text, they are sometimes more inclined to generate images like the data on which the model was trained rather than the provided text.
However, it is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
In this paper, we introduce classification and regression diffusion (CARD) models, which combine a denoising diffusion-based conditional generative model and a pre-trained conditional mean estimator, to accurately predict the distribution of $\boldsymbol y$ given $\boldsymbol x$.
Both the observed and generated data are diffused by the same adaptive diffusion process.
Ranked #1 on Image Generation on LSUN Bedroom 256 x 256
This paper introduces a new topic-modeling framework where each document is viewed as a set of word embedding vectors and each topic is modeled as an embedding vector in the same embedding space.
For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain.
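The forward chain described above can be illustrated with a minimal sketch. It assumes the common Gaussian (DDPM-style) parameterization with a linear noise schedule, which the snippet itself does not specify; the names `forward_diffuse` and `betas` are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0), assuming a Gaussian forward chain."""
    # Cumulative fraction of the original signal retained up to each step.
    alpha_bar = np.cumprod(1.0 - betas)
    a = alpha_bar[t]
    noise = rng.standard_normal(x0.shape)
    # Closed-form marginal: scaled data plus scaled Gaussian noise.
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

# Assumed linear schedule with 1000 steps (illustrative values only).
betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
x0 = np.ones((4, 2))

x_early = forward_diffuse(x0, 0, betas, rng)    # still close to the data
x_late = forward_diffuse(x0, 999, betas, rng)   # close to pure noise
```

Under these assumptions, early steps barely perturb the data while the final step retains almost none of it; a reverse chain would be trained to undo this corruption step by step.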
Ranked #1 on Text-to-Image Generation on CUB
In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting.
The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains.
Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space.
For training more effective agents, we propose a framework that supports learning a flexible and well-regularized policy, which consists of a fully implicit policy and a regularization through the state-action visitation frequency induced by the current policy and that induced by the data-collecting behavior policy.
Crossformer with states sharing not only provides the desired cross-layer guidance and regularization but also reduces the memory requirement.
We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also do so to repel closer negative samples.
The forward CT is the expected cost of moving a source data point to a target one, with their joint distribution defined by the product of the source probability density function (PDF) and a source-dependent conditional distribution, which is related to the target PDF via Bayes' theorem.
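A rough empirical version of the forward CT can be sketched as follows. This is only an illustration under stated assumptions: the point-to-point cost is taken to be squared Euclidean distance, and the source-dependent conditional is assumed to weight each target sample in proportion to `exp(-distance)`, normalized over the target samples (a Bayes-style posterior with the empirical target distribution as the prior). Neither choice is stated in the snippet, and `forward_ct` is a hypothetical name.

```python
import numpy as np

def forward_ct(x, y):
    """Empirical forward CT between source samples x (n, d) and targets y (m, d)."""
    # Pairwise squared Euclidean distances, shape (n, m); assumed cost.
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Assumed conditional pi(y_j | x_i) proportional to exp(-d_ij),
    # computed as a numerically stable softmax over the targets.
    logits = -d
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)
    # Expected cost: average over sources of the conditional-weighted cost.
    return float((w * d).sum(axis=1).mean())

rng = np.random.default_rng(0)
source = rng.standard_normal((8, 2))
target = rng.standard_normal((16, 2)) + 3.0
cost = forward_ct(source, target)
```

The softmax over targets plays the role of the conditional distribution, so each source point is moved mostly toward nearby target points rather than to a single optimal match.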
Based on this objective function, we introduce a novel information-theoretic framework for unsupervised image anomaly detection.
Ranked #8 on Anomaly Detection on One-class CIFAR-100
Leveraging well-established MCMC strategies, we propose MCMC-interactive variational inference (MIVI) to not only estimate the posterior in a time-constrained manner, but also facilitate the design of MCMC transitions.
NANG learns a unifying latent representation which is shared by both node attributes and graph structures and can be translated to different modalities.
The key observation is that, although the object is a 3D volume, what we really need in segmentation is to find its boundary, which is a 2D surface.
However, in medical image analysis, fusing predictions from two phases is often difficult, because (i) there is a domain gap between the two phases, and (ii) the semantic labels do not correspond pixel-wise, even for images scanned from the same patient.
In information theory, Fisher information and Shannon information (entropy) are respectively used to quantify the uncertainty associated with the distribution modeling and the uncertainty in specifying the outcome of given variables.
While enormous progress has been made on the Variational Autoencoder (VAE) in recent years, VAEs built on deep networks, like other deep models, suffer from degeneration, which seriously weakens the correlation between the input and the corresponding latent codes and thus deviates from the goal of representation learning.