Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions.
We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents.
We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens.
DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale.
To address these challenges, we propose SCoralDet, a soft coral detection model based on the YOLO architecture.
Ranked #1 on 2D Object Detection on the SCoralDet Dataset.
In this work, we make the first attempt to fine-tune all-modality models (i.e., models whose inputs and outputs may be of any modality, also named any-to-any models) using human preference data across all modalities (including text, image, audio, and video), ensuring their behavior aligns with human intentions.
We implement a custom kernel that performs the matrix multiplications and the log-sum-exp reduction over the vocabulary in flash memory, making global memory consumption for the cross-entropy computation negligible.
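The memory saving described above comes from never materializing the full logit vector: logits are produced chunk by chunk and folded into a running log-sum-exp. A minimal NumPy sketch of that idea (illustrative only; the function name, shapes, and chunk size are assumptions, and the actual method is a fused GPU kernel, not Python):

```python
import numpy as np

def chunked_cross_entropy(h, W, target, chunk=4):
    """Cross-entropy over a large vocabulary without materializing all logits.

    h: (d,) hidden state; W: (V, d) output projection; target: int class id.
    Logits are computed chunk-by-chunk and reduced with a streaming
    log-sum-exp, so peak memory is O(chunk) instead of O(V).
    Hypothetical sketch of the fused-kernel idea, not the real kernel.
    """
    V = W.shape[0]
    m = -np.inf      # running max of logits seen so far
    s = 0.0          # running sum of exp(logit - m)
    target_logit = None
    for start in range(0, V, chunk):
        logits = W[start:start + chunk] @ h   # partial logits for this chunk
        if start <= target < start + logits.shape[0]:
            target_logit = logits[target - start]
        m_new = max(m, logits.max())
        # rescale the running sum to the new max, then fold in this chunk
        s = s * np.exp(m - m_new) + np.exp(logits - m_new).sum()
        m = m_new
    # loss = log-sum-exp(all logits) - logit[target]
    return (m + np.log(s)) - target_logit
```

The streaming rescaling (`s * exp(m - m_new)`) is the same trick used in online-softmax/flash-attention-style kernels and keeps the reduction numerically stable.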
We introduce Zep, a novel memory layer service for AI agents that outperforms the current state-of-the-art system, MemGPT, in the Deep Memory Retrieval (DMR) benchmark.
To overcome these challenges, we introduce a specialized cognitive module, temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of MFMs.
Chest X-rays (CXRs) play an integral role in driving critical decisions in disease management and patient care.