Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

tencent/hunyuan3d-2 21 Jan 2025

This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint.

Texture Synthesis

3,250
11.23 stars / hour

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

bytedance/ui-tars 21 Jan 2025

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations).

1,336
7.56 stars / hour

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

vgenai-netflix-eyeline-research/go-with-the-flow 14 Jan 2025

The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer.

Optical Flow Estimation

495
2.52 stars / hour

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

damo-nlp-sg/videollama3 22 Jan 2025

The key insight of our vision-centric training paradigm is that high-quality image-text data is crucial for both image and video understanding.

Philosophy Video Understanding

187
2.47 stars / hour

DeepSeek-V3 Technical Report

deepseek-ai/deepseek-v3 27 Dec 2024

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

Language Modeling

25,744
2.35 stars / hour

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

deepseek-ai/deepseek-llm 5 Jan 2024

The rapid development of open-source large language models (LLMs) has been truly remarkable.

2,408
2.24 stars / hour

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

plurai-ai/intellagent 19 Jan 2025

IntellAgent represents a paradigm shift in evaluating conversational AI.

Navigate

202
2.19 stars / hour

PaSa: An LLM Agent for Comprehensive Academic Paper Search

bytedance/pasa 17 Jan 2025

Notably, PaSa-7B surpasses the best Google-based baseline, Google with GPT-4o, by 37.78% in recall@20 and 39.90% in recall@50.

409
2.08 stars / hour

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

ziyuguo99/image-generation-cot 23 Jan 2025

We hope our study provides unique insights and paves a new path for integrating CoT reasoning with autoregressive image generation.

Image Generation

105
1.37 stars / hour

DiffuEraser: A Diffusion Model for Video Inpainting

lixiaowen-xw/diffueraser 17 Jan 2025

Recent video inpainting algorithms integrate flow-based pixel propagation with transformer-based generation to leverage optical flow for restoring textures and objects using information from neighboring frames, while completing masked regions through visual Transformers.

model Optical Flow Estimation

148
1.18 stars / hour