FoundationStereo: Zero-Shot Stereo Matching

NVlabs/FoundationStereo 17 Jan 2025

However, achieving strong zero-shot generalization - a hallmark of foundation models in other computer vision tasks - remains challenging for stereo matching.

Diversity Stereo Depth Estimation +2

250
1.18 stars / hour

X-Dyna: Expressive Dynamic Human Image Animation

bytedance/x-dyna 17 Jan 2025

At the core of our approach is the Dynamics-Adapter, a lightweight module that effectively integrates reference appearance context into the spatial attentions of the diffusion backbone while preserving the capacity of motion modules in synthesizing fluid and intricate dynamic details.

Image Animation

106
1.10 stars / hour

Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization

Alibaba-NLP/CHRONOS 1 Jan 2025

In the fast-changing realm of information, the capacity to construct coherent timelines from extensive event-related content has become increasingly significant and challenging.

News Retrieval Retrieval +1

171
0.99 stars / hour

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

pku-alignment/align-anything 20 Dec 2024

In this work, we make the first attempt to fine-tune all-modality models (i.e., input and output with any modality, also named any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring its behavior aligns with human intentions.

Instruction Following

819
0.93 stars / hour

HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

caiyuanhao1998/hdr-gs 24 May 2024

In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user input exposure time.

Novel View Synthesis

301
0.92 stars / hour

CameraHMR: Aligning People with Perspective

pixelite1201/CameraHMR 12 Nov 2024

We use the estimated intrinsics to enhance the 4D-Humans dataset by incorporating a full perspective camera model during SMPLify fitting.

3D human pose and shape estimation

86
0.86 stars / hour

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

asinghcsu/agenticrag-survey 15 Jan 2025

Large Language Models (LLMs) have revolutionized artificial intelligence (AI) by enabling human-like text generation and natural language understanding.

Natural Language Understanding RAG +3

155
0.82 stars / hour

WebWalker: Benchmarking LLMs in Web Traversal

alibaba-nlp/webwalker 13 Jan 2025

Extensive experimental results show that WebWalkerQA is challenging and demonstrates the effectiveness of RAG combined with WebWalker, through the horizontal and vertical integration in real-world scenarios.

Benchmarking Open-Domain Question Answering +2

247
0.79 stars / hour

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

deepseek-ai/deepseek-coder-v2 17 Jun 2024

Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

16k Language Modeling +3

3,075
0.74 stars / hour

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

fudan-generative-vision/hallo3 1 Dec 2024

Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds.

Image Animation Portrait Animation

810
0.72 stars / hour