Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

magic-research/Sa2VA 7 Jan 2025

This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.

2k Language Modeling +7

585 stars
1.22 stars / hour

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks

fudan-generative-vision/hallo3 1 Dec 2024

Existing methodologies for animating portrait images face significant challenges, particularly in handling non-frontal perspectives, rendering dynamic objects around the portrait, and generating immersive, realistic backgrounds.

Image Animation Portrait Animation

533 stars
1.18 stars / hour

Search-o1: Agentic Search-Enhanced Large Reasoning Models

sunnynexus/search-o1 9 Jan 2025

To address the knowledge insufficiency that large reasoning models (LRMs) often encounter during extended reasoning, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents.

Code Generation +4

378 stars
1.14 stars / hour

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

icip-cas/pptagent 7 Jan 2025

Automatically generating presentations from documents is a challenging task that requires balancing content quality, visual design, and structural coherence.

256 stars
1.06 stars / hour

LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

bytedance/LatentSync 12 Dec 2024

Since we did not change the overall training framework of SyncNet, our experience can also be applied to other lip sync and audio-driven portrait animation methods that utilize SyncNet.

Portrait Animation

1,872 stars
1.03 stars / hour

3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud or Mesh

lewis-stuart-11/3dgs-to-pc 13 Jan 2025

The result is a point cloud that closely represents the shape encoded into the 3D Gaussian scene.

Surface Reconstruction

162 stars
0.86 stars / hour

OASIS: Open Agent Social Interaction Simulations with One Million Agents

camel-ai/oasis 18 Nov 2024

There has been growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (e.g., X, Reddit) with more realistic large language model (LLM) agents, allowing for a more nuanced study of complex systems.

Large Language Model Recommendation Systems

478 stars
0.86 stars / hour

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

automl/tabpfn 5 Jul 2022

We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods.

AutoML Bayesian Inference +5

2,105 stars
0.81 stars / hour

Lifelong Learning of Large Language Model based Agents: A Roadmap

qianlima-lab/awesome-lifelong-llm-agent 13 Jan 2025

This survey is the first to systematically summarize the potential techniques for incorporating lifelong learning into LLM-based agents.

Incremental Learning Language Modeling +3

75 stars
0.81 stars / hour

Cosmos World Foundation Model Platform for Physical AI

nvidia/cosmos 7 Jan 2025

We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.

Position

6,983 stars
0.78 stars / hour