UI-TARS: Pioneering Automated GUI Interaction with Native Agents

bytedance/ui-tars 21 Jan 2025

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations).

4,872
0.82 stars / hour

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

sakanaai/ai-scientist-v2 10 Apr 2025

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made.

scientific discovery

909
0.79 stars / hour

LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion

robfiras/loco-mujoco 4 Nov 2023

Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents.

Benchmarking, Imitation Learning

873
0.73 stars / hour

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

foundationagents/awesome-foundation-agents 31 Mar 2025

The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains.

Ranked #1 on Continual Learning on AIDS (using extra training data)

AutoML, Continual Learning

964
0.72 stars / hour

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

End2End-Diffusion/REPA-E 14 Apr 2025

We show that while diffusion loss is ineffective, end-to-end training can be unlocked through the representation-alignment (REPA) loss -- allowing both VAE and diffusion model to be jointly tuned during the training process.

163
0.72 stars / hour

Event-Enhanced Blurry Video Super-Resolution

dachunkai/ev-deblurvsr 17 Apr 2025

In this paper, we tackle the task of blurry video super-resolution (BVSR), aiming to generate high-resolution (HR) videos from low-resolution (LR) and blurry inputs.

Deblurring, Motion Estimation

80
0.67 stars / hour

LTX-Video: Realtime Video Latent Diffusion

Lightricks/LTX-Video 30 Dec 2024

To address this, our VAE decoder is tasked with both latent-to-pixel conversion and the final denoising step, producing the clean result directly in pixel space.

Denoising, Image to Video Generation

3,521
0.62 stars / hour

Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution

dlmaria/syzygy-of-thoughts 13 Apr 2025

Chain-of-Thought (CoT) prompting enhances the reasoning of large language models (LLMs) by decomposing problems into sequential steps, mimicking human logic and reducing errors.

GSM8K, Math

208
0.60 stars / hour

SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL

wdrink/simplear 15 Apr 2025

This work presents SimpleAR, a vanilla autoregressive visual generation framework without complex architecture modifications.

Inference Optimization

256
0.58 stars / hour