TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools

mims-harvard/TxAgent 14 Mar 2025

TxAgent selects tools based on task objectives and executes structured function calls to solve therapeutic tasks that require clinical reasoning and cross-source validation.

AI Agent, Decision Making

225
2.73 stars / hour

Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution

prs-eth/thera 29 Nov 2023

We present a novel way to design neural fields such that points can be queried with an adaptive Gaussian PSF, so as to guarantee correct anti-aliasing at any desired output resolution.

Image Super-Resolution

397
2.64 stars / hour

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

xiaomi-research/r1-aqa 14 Mar 2025

Recently, reinforcement learning (RL) has been shown to greatly enhance the reasoning capabilities of large language models (LLMs), and RL-based approaches have been progressively applied to visual multimodal tasks.

Audio Question Answering, Question Answering, +1

145
2.21 stars / hour

Agent S: An Open Agentic Framework that Uses Computers Like a Human

simular-ai/agent-s 10 Oct 2024

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

AI Agent, Task Planning

1,276
1.85 stars / hour

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

sparkaudio/spark-tts 3 Mar 2025

Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.

Attribute, Text to Speech, +1

4,992
1.73 stars / hour

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

kuleshov-group/bd3lms 12 Mar 2025

Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation.

Denoising, Language Modeling, +1

340
1.51 stars / hour

YOLOE: Real-Time Seeing Anything

THU-MIG/yoloe 10 Mar 2025

Object detection and segmentation are widely employed in computer vision applications, yet conventional models like the YOLO series, while efficient and accurate, are limited by predefined categories, which hinders adaptability in open scenarios.

10-shot image generation

750
1.50 stars / hour

FoundationStereo: Zero-Shot Stereo Matching

NVlabs/FoundationStereo 17 Jan 2025

Achieving strong zero-shot generalization, a hallmark of foundation models in other computer vision tasks, remains challenging for stereo matching.

Diversity, Stereo Depth Estimation, +2

981
1.38 stars / hour

ReasonGraph: Visualisation of Reasoning Paths

ZongqianLi/ReasonGraph 6 Mar 2025

The reasoning processes of Large Language Models (LLMs) are challenging to analyze due to their complexity and the lack of organized visualization tools.

293
1.31 stars / hour

Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

hpcaitech/open-sora 12 Mar 2025

With Open-Sora 2.0, we demonstrate that the cost of training a top-performing video generation model is highly controllable.

Video Generation

25,514
1.12 stars / hour