We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.
To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
In this work, we investigate the problem of creating high-fidelity 3D content from only a single image.
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.
In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes.
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
Ranked #1 on Question Answering on PIQA
Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties of the relation prompt: 1) The relation prompt should capture the interaction between objects, enforced by the preposition prior.
To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision making step of autonomous vehicular control and action.