SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

BlinkDL/RWKV-LM 18 Nov 2022

We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
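To make the W8A8 term concrete, here is a minimal sketch of a matrix multiply in which both weights and activations are quantized to 8-bit integers. It uses plain symmetric per-tensor quantization, which is an assumption chosen for illustration; it is not SmoothQuant's own procedure, and the helper names `quantize_int8` and `w8a8_matmul` are hypothetical.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor quantization to int8 (illustrative assumption)."""
    max_abs = float(np.max(np.abs(x)))
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def w8a8_matmul(activations, weights):
    """Multiply 8-bit activations by 8-bit weights, then rescale to float."""
    qa, sa = quantize_int8(activations)
    qw, sw = quantize_int8(weights)
    # Accumulate in int32 so the sums of int8 products cannot overflow.
    acc = qa.astype(np.int32) @ qw.astype(np.int32)
    return acc.astype(np.float32) * (sa * sw)


rng = np.random.default_rng(0)
a = rng.normal(size=(4, 16))
w = rng.normal(size=(16, 8))
approx = w8a8_matmul(a, w)
exact = a @ w
print("max abs error:", float(np.max(np.abs(approx - exact))))
```

The printed error shows how closely the integer path tracks the floating-point result on well-behaved random inputs; activations with large outliers would stretch the per-tensor scale and lose precision.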


Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

picsart-ai-research/text2video-zero 23 Mar 2023

Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.

Image Generation Text-to-Video Generation

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

lukashoel/text2room 21 Mar 2023

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Monocular Depth Estimation

Zero-1-to-3: Zero-shot One Image to 3D Object

cvlab-columbia/zero123 20 Mar 2023

We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.

3D Reconstruction Novel View Synthesis

LoRA: Low-Rank Adaptation of Large Language Models

microsoft/LoRA ICLR 2022

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling
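The LoRA idea described above (a frozen pre-trained weight plus a trainable low-rank update) can be sketched in a few lines. This is a forward-pass illustration only, not the microsoft/LoRA implementation: the class name `LoRALinear`, the `alpha / rank` scaling, and the initialization (one factor small random, the other zero, so the update starts at zero) are assumptions made for the example.

```python
import numpy as np


class LoRALinear:
    """Linear layer computing x @ W + scale * (x @ A) @ B, with W frozen.

    Hypothetical helper for illustration; only A and B would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = weight.shape
        self.weight = weight  # frozen pre-trained weight, never updated
        # Low-rank factors: A is small random, B is zero, so A @ B starts at 0.
        self.A = rng.normal(0.0, 0.01, size=(d_in, rank))
        self.B = np.zeros((rank, d_out))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the low-rank update path.
        return x @ self.weight + (x @ self.A) @ self.B * self.scale

    def trainable_params(self):
        return self.A.size + self.B.size


layer = LoRALinear(np.ones((64, 32)), rank=4)
print("trainable:", layer.trainable_params(), "of", layer.weight.size)
```

With a 64 x 32 weight and rank 4, the two factors hold 64*4 + 4*32 = 384 trainable values against 2048 frozen ones, which is the parameter reduction the abstract refers to.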

ReVersion: Diffusion-Based Relation Inversion from Images

ziqihuangg/reversion 23 Mar 2023

Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties of the relation prompt: 1) The relation prompt should capture the interaction between objects, enforced by the preposition prior.

Contrastive Learning

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

dvlab-research/VoxelNeXt 20 Mar 2023

Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.

3D Object Detection Object Detection

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

winfredy/sadtalker 22 Nov 2022

We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.

Talking Head Generation

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

megvii-research/CVPR2023-DMVFN 17 Mar 2023

The performance of video prediction has been greatly boosted by advanced deep neural networks.

Video Prediction

