Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving

bravegroup/drive-wm 29 Nov 2023

In autonomous driving, predicting future events in advance and evaluating the foreseeable risks empowers autonomous vehicles to better plan their actions, enhancing safety and efficiency on the road.

Autonomous Driving

58
1.04 stars / hour

Adversarial Diffusion Distillation

stability-ai/generative-models 28 Nov 2023

We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps while maintaining high image quality.

Image Generation

15,849
0.95 stars / hour

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

yule-buaa/mergelm 6 Nov 2023

Based on this observation, we further sparsify delta parameters of multiple SFT homologous models with DARE and subsequently merge them into a single model by parameter averaging.

GSM8K Instruction Following

354
0.94 stars / hour
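
The merging step in the excerpt above (sparsify the delta parameters of several SFT models, then average) can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes DARE means randomly dropping each delta parameter (fine-tuned weight minus base weight) with probability `drop_rate` and rescaling the survivors by `1 / (1 - drop_rate)`. The flat parameter lists and the names `dare_sparsify` and `merge_by_averaging` are hypothetical.

```python
import random


def dare_sparsify(base, finetuned, drop_rate, rng):
    """Drop each delta parameter with probability drop_rate, rescale the rest.

    Assumption: the rescale factor is 1 / (1 - drop_rate), which keeps the
    expected value of every delta unchanged.
    """
    keep_scale = 1.0 / (1.0 - drop_rate)
    deltas = []
    for b, f in zip(base, finetuned):
        delta = f - b  # delta parameter: fine-tuned minus base
        deltas.append(0.0 if rng.random() < drop_rate else delta * keep_scale)
    return deltas


def merge_by_averaging(base, finetuned_models, drop_rate=0.9, seed=0):
    """Average the sparsified models, i.e. base plus the mean sparse delta."""
    rng = random.Random(seed)
    sparse = [dare_sparsify(base, m, drop_rate, rng) for m in finetuned_models]
    n = len(sparse)
    return [b + sum(d[i] for d in sparse) / n for i, b in enumerate(base)]


# Hypothetical flat parameter vectors for one base model and two SFT models.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, 1.0, 1.0, 1.0]
model_b = [3.0, 3.0, 3.0, 3.0]
merged = merge_by_averaging(base, [model_a, model_b], drop_rate=0.5)
print(merged)
```

With `drop_rate=0.0` nothing is dropped and the result reduces to plain parameter averaging of the fine-tuned models.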

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

envision-research/luciddreamer 19 Nov 2023

The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios.

Text to 3D

388
0.90 stars / hour

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

king159/pair-net 17 Jul 2023

Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.

Graph Generation Panoptic Scene Graph Generation +1

85
0.84 stars / hour

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

ZiqiaoPeng/SyncTalk 29 Nov 2023

A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses.

87
0.83 stars / hour

Sketch Video Synthesis

yudianzheng/sketchvideo 26 Nov 2023

Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and this challenge becomes even more formidable when applied to the domain of videos.

Video Editing

65
0.83 stars / hour

White-Box Transformers via Sparse Rate Reduction

ma-lab-berkeley/crate NeurIPS 2023

Particularly, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step to compress the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens.

867
0.83 stars / hour

LoRA: Low-Rank Adaptation of Large Language Models

longyuewangdcu/chinese-llama-2 ICLR 2022

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling

412
0.81 stars / hour
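
The idea in the excerpt above (a frozen weight matrix plus trainable rank decomposition matrices) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the paper's code or any library's API: the class `LoRALinear`, the `alpha / rank` scaling, and the initialization (small random `A`, zero `B`) are choices made for this example.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (illustrative)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pre-trained weights, never updated
        # Assumed initialization: A small random, B zero, so the low-rank
        # update contributes nothing before training starts.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank  # assumed scaling of the update

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))  # B @ (A @ x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Count entries of A and B, the only matrices that would be trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


d_in, d_out, rank = 64, 64, 4
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
layer = LoRALinear(W, rank)
x = [1.0] * d_in
print(layer.trainable_params(), "trainable vs", d_in * d_out, "frozen")
```

For a `d_out x d_in` layer, the trainable count is `rank * (d_in + d_out)` instead of `d_in * d_out`, which is where the parameter reduction comes from when `rank` is small.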

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

PKU-YuanGroup/Video-LLaVA 27 Nov 2023

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

1,431
0.77 stars / hour