UFO2: The Desktop AgentOS

microsoft/UFO 20 Apr 2025

Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows through natural language.

7,190
0.25 stars / hour

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

index-tts/index-tts 8 Feb 2025

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities. Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise models.

Decoder Language Modeling +5

1,663
0.24 stars / hour

MonSter: Marry Monodepth to Stereo Unleashes Power

junda24/monster 15 Jan 2025

The refined monodepth in turn effectively guides stereo in ill-posed regions.

Monocular Depth Estimation Stereo Matching +1

525
0.24 stars / hour

MMHCL: Multi-Modal Hypergraph Contrastive Learning for Recommendation

xu107/mmhcl 23 Apr 2025

For a comprehensive information exploration from user-product relations, we construct two hypergraphs, i.e., a user-to-user (u2u) hypergraph and an item-to-item (i2i) hypergraph, to mine shared preferences among users and intricate multimodal semantic resemblance among items, respectively.

Contrastive Learning Hypergraph Contrastive Learning +1

18
0.24 stars / hour

Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews

bandas-center/atrain 18 Oct 2023

If an entry-level graphics card is available, transcription time drops to 20% of the audio duration.

Speaker Recognition

548
0.23 stars / hour

Deep Industrial Image Anomaly Detection: A Survey

m-3lab/awesome-industrial-anomaly-detection 27 Jan 2023

In this paper, we provide a comprehensive review of deep learning-based image anomaly detection techniques, from the perspectives of neural network architectures, levels of supervision, loss functions, metrics and datasets.

Anomaly Detection Deep Learning +1

2,223
0.23 stars / hour

Learning to Reason for Long-Form Story Generation

Alex-Gurung/ReasoningNCP 28 Mar 2025

Generating high-quality stories spanning thousands of tokens requires competency across a variety of skills, from tracking plot and character arcs to keeping a consistent and engaging style.

Form Math +1

50
0.22 stars / hour

PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

lizonghang/prima.cpp 7 Apr 2025

The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (LLMs) on home devices.

Quantization

866
0.21 stars / hour

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

linkangheng/pr1 10 Apr 2025

In this work, we return to the fundamentals and explore the effects of RL on different perception tasks.

reinforcement-learning Reinforcement Learning +1

167
0.21 stars / hour

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

zihanwang314/ragen 24 Apr 2025

Training large language models (LLMs) as interactive agents presents unique challenges including long-horizon decision making and interacting with stochastic environment feedback.

Decision Making Reinforcement Learning (RL)

1,795
0.21 stars / hour