In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date.
We present neural fixed-point acceleration, a framework to automatically learn to accelerate convex fixed-point problems that are drawn from a distribution, using ideas from meta-learning and classical acceleration algorithms.
Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256$\times$256 and 3.85 on ImageNet 512$\times$512.
Ranked #1 on Image Generation on ImageNet 64x64 (FID metric)
Self-supervised learning (SSL), including mainstream contrastive learning, has achieved great success in learning visual representations without data annotations.
We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision.
Blind face restoration usually relies on facial priors, such as a facial geometry prior or a reference prior, to restore realistic and faithful details.
Ranked #1 on Blind Face Restoration on CelebA-Test
Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results.
Ranked #1 on Panoptic Segmentation on COCO minival
In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering, making use of automatic cross-modal supervision.
Ranked #1 on Visual Question Answering on MSVD-QA (using extra training data)