OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

aslp-lab/osum 23 Jan 2025

Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions.

Event Detection Gender Classification +3

116
0.88 stars / hour

Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research

theworldofagents/agentic-reasoning 7 Feb 2025

We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents.

Decision Making Language Modeling +3

197
0.83 stars / hour

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

seal-rg/recurrent-pretraining 7 Feb 2025

We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens.

Language Modeling Language Modelling

581
0.82 stars / hour

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

foundationvision/flashvideo 7 Feb 2025

DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale.

Computational Efficiency Text-to-Video Generation +1

324
0.79 stars / hour

SCoralDet: Efficient real-time underwater soft coral detection with YOLO

RDXiaoLu/SCoralDet-Dataset journal 2024

To address these challenges, we propose SCoralDet, a soft coral detection model based on the YOLO architecture.

object-detection Object Detection

111
0.75 stars / hour

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

pku-alignment/align-anything 20 Dec 2024

In this work, we make the first attempt to fine-tune all-modality models (i.e., models whose inputs and outputs can be in any modality, also called any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring that their behavior aligns with human intentions.

Instruction Following

2,083
0.74 stars / hour

Cut Your Losses in Large-Vocabulary Language Models

unslothai/unsloth 13 Nov 2024

We implement a custom kernel that performs the matrix multiplications and the log-sum-exp reduction over the vocabulary in flash memory, making global memory consumption for the cross-entropy computation negligible.

30,665
0.73 stars / hour
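The core idea in the Cut Your Losses blurb is that cross-entropy only needs the log-sum-exp over the vocabulary plus the single target logit, so the full logit vector never has to be stored. The sketch below is a minimal pure-Python illustration of that reduction, assuming one token, a small toy classifier matrix, and a hypothetical `chunked_cross_entropy` helper; it is not the authors' GPU kernel, which fuses this work into on-chip memory and also handles gradients.

```python
import math

def chunked_cross_entropy(hidden, classifier, target, chunk_size=2):
    """Loss for one token: logsumexp_v(c_v . h) - c_target . h.

    Hypothetical illustration: logits are computed one vocabulary chunk at a
    time and folded into a running (max, sum) pair, so at most chunk_size
    logits exist at once.
    """
    running_max = -math.inf
    running_sum = 0.0  # sum of exp(logit - running_max) over chunks seen so far
    target_logit = None
    for start in range(0, len(classifier), chunk_size):
        chunk = classifier[start:start + chunk_size]
        # Matrix-vector product restricted to this slice of the vocabulary.
        logits = [sum(w * x for w, x in zip(row, hidden)) for row in chunk]
        new_max = max(running_max, max(logits))
        # Rescale the old partial sum to the new max, then add this chunk.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(l - new_max) for l in logits
        )
        running_max = new_max
        if start <= target < start + len(chunk):
            target_logit = logits[target - start]
    return running_max + math.log(running_sum) - target_logit

def naive_cross_entropy(hidden, classifier, target):
    """Reference version that materializes every logit at once."""
    logits = [sum(w * x for w, x in zip(row, hidden)) for row in classifier]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[target]
```

Both functions should agree for any chunk size; the chunked version trades one pass over the vocabulary for memory that no longer grows with vocabulary size.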

Zep: A Temporal Knowledge Graph Architecture for Agent Memory

getzep/graphiti 20 Jan 2025

We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.

RAG Retrieval

1,960
0.72 stars / hour

Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

xid32/naacl_2025_twm 9 Feb 2025

To overcome these challenges, we introduce a specialized cognitive module, temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs).

Image Captioning Image-text Retrieval +5

149
0.66 stars / hour

MedRAX: Medical Reasoning Agent for Chest X-ray

bowang-lab/medrax 4 Feb 2025

Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care.

AI Agent Management

440
0.61 stars / hour