SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

winfredy/sadtalker 22 Nov 2022

We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.

Talking Head Generation

567
1.39 stars / hour

Learning Context-aware Classifier for Semantic Segmentation

Pointcept/Pointcept 21 Mar 2023

Semantic segmentation remains a challenging task because it must parse diverse contexts across different scenes, so a fixed classifier may not adequately handle the varying feature distributions encountered during testing.

Semantic Segmentation

115
1.34 stars / hour

ReVersion: Diffusion-Based Relation Inversion from Images

ziqihuangg/reversion 23 Mar 2023

Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties of the relation prompt: 1) The relation prompt should capture the interaction between objects, enforced by the preposition prior.

Contrastive Learning

91
1.33 stars / hour

Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws

wassimtenachi/physo 6 Mar 2023

Here we present $\Phi$-SO, a Physical Symbolic Optimization framework for recovering analytical symbolic expressions from physics data using deep reinforcement learning techniques by learning units constraints.

Symbolic Regression

1,212
1.05 stars / hour

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

ist-daslab/sparsegpt 2 Jan 2023

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy.

Ranked #1 on Language Modelling on WikiText-2 (using extra training data)

Common Sense Reasoning, Language Modelling, +2

52
1.04 stars / hour
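
To make the "50% sparsity in one-shot" claim in the SparseGPT entry concrete, here is a minimal sketch of what pruning a weight vector to a target sparsity without retraining looks like. It uses plain magnitude pruning as a stand-in, which is not SparseGPT's own weight-selection procedure; the function name `magnitude_prune` is hypothetical.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights in a single pass.

    Assumption: simple magnitude pruning, used only to illustrate what a
    sparsity target means. It is NOT the SparseGPT algorithm.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.9, -0.1, 0.4, -0.05, 0.7, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
sparsity = sum(1 for w in pruned if w == 0.0) / len(pruned)
print(pruned, sparsity)
```

Half of the entries are set to zero in one pass and no further training step follows, which is the sense of "one-shot" used in the entry above.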

CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition

deeptibhegde/clip-goes-3d 20 Mar 2023

Attempting to train the visual and text encoder to account for this shift results in catastrophic forgetting and a notable decrease in performance.

Retrieval, Scene Understanding

81
0.99 stars / hour

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

chenyangqiqi/fatezero 16 Mar 2023

We also have a better zero-shot shape-aware editing ability based on the text-to-video model.

Video Editing

374
0.98 stars / hour

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

BlinkDL/RWKV-LM 18 Nov 2022

We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.

Quantization

3,452
0.93 stars / hour
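
As a rough illustration of what "8-bit weight, 8-bit activation (W8A8)" means in the SmoothQuant entry, the sketch below quantizes both operands of a dot product to signed 8-bit integers, accumulates in integers, and rescales the result. It assumes symmetric per-tensor quantization and does not include SmoothQuant's own method; the helper names `quantize_int8` and `int8_dot` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the integer range [-127, 127].

    Assumption: one scale per tensor, chosen from the largest magnitude.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(weights, activations):
    """Dot product with both operands quantized to 8 bits (W8A8)."""
    qw, sw = quantize_int8(weights)
    qa, sa = quantize_int8(activations)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer accumulation
    return acc * sw * sa  # rescale back to floating point


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(exact, approx)
```

With a single scale per tensor, one large activation value stretches the scale and costs precision for every other value, which is one reason quantizing activations tends to be harder than quantizing weights.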

GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

6,374
0.90 stars / hour

LoRA: Low-Rank Adaptation of Large Language Models

microsoft/LoRA ICLR 2022

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling

1,303
0.86 stars / hour
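
The mechanism described in the LoRA entry, a frozen weight matrix plus trainable rank decomposition matrices, can be sketched in a few lines. This is a minimal pure-Python illustration rather than the microsoft/LoRA implementation: the class name `LoRALinear`, the `alpha / rank` scaling, and the initialization (small Gaussian `A`, zero `B`) are assumptions following common convention.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (a sketch)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen: never updated during fine-tuning
        d_out, d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank  # assumed scaling convention
        # A starts as small random values and B as zeros, so the update is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        return self.rank * (len(self.weight) + len(self.weight[0]))

    def frozen_parameters(self):
        return len(self.weight) * len(self.weight[0])


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layer = LoRALinear(weight, rank=1, alpha=2.0)
print(layer.forward([1.0, -1.0]))
print(layer.trainable_parameters(), layer.frozen_parameters())
```

Only `A` and `B` would receive gradient updates, so the trainable count grows with `rank * (d_in + d_out)` rather than `d_in * d_out`; the saving becomes large when the layer dimensions are much bigger than the rank.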