3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.
Efficiently reconstructing accurate 3D models from monocular video is a key challenge in computer vision, critical for advancing applications in virtual reality, robotics, and scene understanding.
To further explore the scalability of this self-improvement paradigm, we investigate iterative refinement of both error correction capabilities and dataset construction.
GeoPixel supports up to 4K HD resolution in any aspect ratio, ideal for high-precision RS image analysis.
We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI.
The rapid development of large language models has revolutionized code intelligence in software development.
Ranked #3 on Code Generation on APPS
Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains.
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades.
Ranked #1 on Referring Expression Comprehension on RefCOCOg-test
This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.
We present VARGPT, a novel multimodal large language model (MLLM) that unifies visual understanding and generation within a single autoregressive framework.