Generalizable and Animatable Gaussian Head Avatar

xg-chu/gagavatar 10 Oct 2024

In this paper, we propose Generalizable and Animatable Gaussian Head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction.

117
0.80 stars / hour

Fast Feedforward 3D Gaussian Splatting Compression

yihangchen-ee/fcgs 10 Oct 2024

With 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for its widespread adoption.

Novel View Synthesis

56
0.71 stars / hour

Libra: Building Decoupled Vision System on Large Language Models

yifanxu74/libra 16 May 2024

Specifically, we incorporate a routed visual expert with a cross-modal bridge module into a pretrained LLM to route the vision and language flows during attention computation, enabling different attention patterns for inner-modal modeling and cross-modal interaction.

Language Modelling, Large Language Model

117
0.65 stars / hour

AgentKit: Structured LLM Reasoning with Dynamic Graphs

holmeswww/agentkit 17 Apr 2024

The chains of nodes can be designed to explicitly enforce a naturally structured "thought process".

384
0.64 stars / hour

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

yh-hust/pdf-wukong 8 Oct 2024

In this paper, we introduce PDF-WuKong, a multimodal large language model (MLLM) designed to enhance multimodal question-answering (QA) for long PDF documents.

document understanding, Language Modelling (+3 more)

71
0.56 stars / hour

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

g-u-n/rectified-diffusion 9 Oct 2024

Building on this insight, we propose Rectified Diffusion, which generalizes the design space and application scope of rectification to encompass the broader category of diffusion models, rather than being restricted to flow-matching models.

69
0.55 stars / hour

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

microsoft/vptq 25 Sep 2024

Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low-bit (even down to 2 bits).

Quantization

387
0.53 stars / hour

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

shikiw/modality-integration-rate 9 Oct 2024

We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs).

cross-modal alignment

55
0.52 stars / hour

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

yangling0818/itercomp 9 Oct 2024

IterComp opens new research avenues in reward feedback learning for diffusion models and compositional generation.

Attribute, Text-to-Image Generation

64
0.52 stars / hour

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

microsoft/windowsagentarena 12 Sep 2024

To demonstrate Windows Agent Arena's capabilities, we also introduce a new multi-modal agent, Navi.

311
0.50 stars / hour