We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
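The abstract names W8A8 quantization without showing the mechanics, so here is a minimal sketch of the generic W8A8 fake-quantization recipe: symmetric per-tensor INT8 scaling of both weights and activations. The function names and the scaling convention are illustrative and this is not SmoothQuant's actual implementation (which additionally migrates quantization difficulty from activations to weights).

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: x ~= scale * x_q."""
    scale = x.abs().max() / 127.0
    x_q = torch.clamp(torch.round(x / scale), -127, 127)
    return x_q, scale

def w8a8_linear(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Emulate a W8A8 linear layer: quantize activations and weights to 8 bits,
    multiply, and rescale (real kernels run INT8 GEMM with INT32 accumulation)."""
    x_q, s_x = quantize_int8(x)
    w_q, s_w = quantize_int8(w)
    return (x_q @ w_q.t()) * (s_x * s_w)

# Usage: the result approximates the full-precision matmul.
x = torch.randn(4, 64)
w = torch.randn(32, 64)
print((w8a8_linear(x, w) - x @ w.t()).abs().max())  # small quantization error
```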
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
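A minimal sketch of the idea as stated above: the pre-trained weight stays frozen while a trainable rank-r update B·A is added on top. The class name, rank, initialization, and scaling here are illustrative assumptions, not the reference LoRA implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pre-trained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

# Usage: wrap a Transformer projection and train only A and B.
layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 768 parameters instead of 768 * 768
```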
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt.
Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties of the relation prompt: 1) The relation prompt should capture the interaction between objects, enforced by the preposition prior.
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
Ranked #1 on Question Answering on PIQA.
To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.
In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes.
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions.
Ranked #4 on Video Generation on UCF-101.
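Since the DPM description above names both the forward noising process and the learned reverse denoiser, here is a short sketch of the standard DDPM formulation of those two pieces. The linear noise schedule and the noise-prediction loss are the common choices, assumed here for illustration; `model` is a hypothetical denoising network.

```python
import torch

# Forward diffusion: gradually add Gaussian noise to a clean sample x0.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # illustrative linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product \bar{alpha}_t

def q_sample(x0: torch.Tensor, t: int, noise: torch.Tensor) -> torch.Tensor:
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alphas_bar[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def denoising_loss(model, x0: torch.Tensor) -> torch.Tensor:
    """Reverse process is learned by predicting the noise added at a random step t."""
    t = torch.randint(0, T, ())
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, int(t), noise)
    return ((model(x_t, t) - noise) ** 2).mean()
```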