SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

ZiqiaoPeng/SyncTalk 29 Nov 2023

A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses.

Talking Face Generation, Talking Head Generation

159
0.88 stars / hour

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

wzzheng/occworld 27 Nov 2023

In this paper, we explore a new framework for learning a world model, OccWorld, in the 3D occupancy space to simultaneously predict the movement of the ego car and the evolution of the surrounding scenes.

Autonomous Driving

139
0.83 stars / hour

Initializing Models with Larger Ones

oscarxzq/weight-selection 30 Nov 2023

Weight selection offers a new approach to leverage the power of pretrained models in resource-constrained settings, and we hope it can be a useful tool for training small models in the large-model era.

Knowledge Distillation

61
0.80 stars / hour

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

ailab-cvc/unireplknet 27 Nov 2023

We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristic that distinguishes large kernels from small ones: they can see wide without going deep.

Ranked #1 on Object Detection on COCO 2017 (mAP metric)

Image Classification, Object Detection, +3

137
0.75 stars / hour

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

dvlab-research/llama-vid 28 Nov 2023

Current VLMs, while proficient in tasks like image captioning and visual question answering, face heavy computational burdens when processing long videos because of the excessive number of visual tokens.

Image Captioning, Video-based Generative Performance Benchmarking, +2

110
0.75 stars / hour

Adversarial Diffusion Distillation

stability-ai/generative-models 28 Nov 2023

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality.

Image Generation

16,218
0.66 stars / hour

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

whwu95/GPT4Vis 27 Nov 2023

Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.

Zero-Shot Learning

84
0.62 stars / hour

RETVec: Resilient and Efficient Text Vectorizer

google-research/retvec NeurIPS 2023

The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial attacks.

Adversarial Text, Metric Learning, +1

73
0.58 stars / hour

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

lizhe00/animatablegaussians 27 Nov 2023

Overall, our method can create lifelike avatars with dynamic, realistic and generalized appearances.

187
0.56 stars / hour

Instruction Tuning with Human Curriculum

imoneoi/openchat 14 Oct 2023

The dominant paradigm for instruction tuning is training on randomly shuffled, maximally diverse instruction-response pairs.

3,569
0.56 stars / hour