HAC++: Towards 100X Compression of 3D Gaussian Splatting

yihangchen-ee/hac-plus 21 Jan 2025

3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.

Attribute, Novel View Synthesis, +1

52
0.70 stars / hour

VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment

VITA-Group/VideoLifter 3 Jan 2025

Efficiently reconstructing accurate 3D models from monocular video is a key challenge in computer vision, critical for advancing applications in virtual reality, robotics, and scene understanding.

Computational Efficiency, Scene Understanding

81
0.70 stars / hour

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

bytedance/agent-r 20 Jan 2025

To further explore the scalability of this self-improvement paradigm, we investigate iterative refinement of both error correction capabilities and dataset construction.

Language Modeling

67
0.68 stars / hour

GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing

mbzuai-oryx/geopixel 23 Jan 2025

GeoPixel supports up to 4K HD resolution in any aspect ratio, ideal for high-precision RS image analysis.

4k

27
0.67 stars / hour

UnCommon Objects in 3D

facebookresearch/uco3d 13 Jan 2025

We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI.

Object

375
0.63 stars / hour

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

deepseek-ai/DeepSeek-Coder 25 Jan 2024

The rapid development of large language models has revolutionized code intelligence in software development.

Code Generation, Language Modeling, +2

11,339
0.63 stars / hour

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models

volcengine/verl 28 Oct 2024

Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains.

Diversity, Math

1,004
0.63 stars / hour

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

deepseek-ai/deepseek-vl2 13 Dec 2024

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two major upgrades.

Chart Understanding, Optical Character Recognition, +4

984
0.62 stars / hour

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

magic-research/Sa2VA 7 Jan 2025

This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.

2k, Language Modeling, +7

796
0.61 stars / hour

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

VARGPT-family/VARGPT 21 Jan 2025

We present VARGPT, a novel multimodal large language model (MLLM) that unifies visual understanding and generation within a single autoregressive framework.

Image Generation, Instruction Following, +6

55
0.61 stars / hour