Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

shi-labs/versatile-diffusion 15 Nov 2022

Through our experiments, we demonstrate that VD and its underlying framework have the following merits: a) VD handles all subtasks with competitive quality; b) VD initiates novel extensions and applications such as disentanglement of style and semantic, image-text dual-guided generation, etc.

Disentanglement, Image Captioning (+4 more)

ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech

ubisoft/ubisoft-laforge-ZeroEGGS 15 Sep 2022

In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles.

Gesture Generation

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

yuqinie98/patchtst 27 Nov 2022

Our channel-independent patch time series Transformer (PatchTST) can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models.

Multivariate Time Series Forecasting, Representation Learning

Conffusion: Confidence Intervals for Diffusion Models

eliahuhorwitz/conffusion 17 Nov 2022

Diffusion models have become the go-to method for many generative tasks, particularly for image-to-image generation tasks such as super-resolution and inpainting.

Conformal Prediction, Facial Inpainting (+2 more)

H3WB: Human3.6M 3D WholeBody Dataset and Benchmark

wholebody3d/wholebody3d 28 Nov 2022

We also report several baselines from popular methods for these tasks.

Pose Estimation

DiffusionDet: Diffusion Model for Object Detection

shoufachen/diffusiondet 17 Nov 2022

In inference, the model refines a set of randomly generated boxes to the output results in a progressive way.

Denoising, Object Detection (+1 more)

Language-driven Open-Vocabulary 3D Scene Understanding

cvmi-lab/pla 29 Nov 2022

Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.

Contrastive Learning, Instance Segmentation (+3 more)

Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces

dome272/paella 14 Nov 2022

Conditional text-to-image generation has seen countless recent improvements in terms of quality, diversity and fidelity.

Conditional Image Generation, Denoising (+2 more)

MetaFormer Baselines for Vision

facebookresearch/xformers 24 Oct 2022

By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Image Classification

VeLO: Training Versatile Learned Optimizers by Scaling Up

google/learned_optimization 17 Nov 2022

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers.

