VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions.

Denoising, Image Generation

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

In this paper, we address this challenge and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information that is both highly accurate and highly efficient.

Language Modelling, Model Compression

Planning-oriented Autonomous Driving

Oriented at this, we revisit the key components within perception and prediction, and prioritize the tasks such that all these tasks contribute to planning.

Autonomous Driving, Philosophy

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

As a pioneering work, PointContrast conducts unsupervised 3D representation learning by leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on various downstream tasks.

Contrastive Learning, Data Augmentation

Ablating Concepts in Text-to-Image Diffusion Models

To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept.

MaskSketch: Unpaired Structure-guided Masked Image Generation

We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation.

Conditional Image Generation, Image-to-Image Translation

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER.

Code Generation, Language Modelling

Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild

Hence, interacting hands of MoCap datasets are brought to the 2D scale space of single hands of ITW datasets.

High Fidelity Image Synthesis With Deep VAEs In Latent Space

With this method, the VAE avoids modeling the fine-grained details that constitute the majority of the image's code length, allowing it to focus on learning its structural components.

Image Generation

Robust Speech Recognition via Large-Scale Weak Supervision

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition, speech-recognition

