The inter-device communication of a MoE layer can occupy 47% of the total model execution time with popular models and frameworks.
We study self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback.
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs).
We present Attentive Reasoning Queries (ARQs), a novel structured reasoning approach that significantly improves instruction-following in Large Language Models through domain-specialized reasoning blueprints.
Enhancing reasoning in Large Multimodal Models (LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter models, where architectural constraints limit reasoning capacity and modality alignment.
The entire stack is open source, so that any user of a Unix-like OS can run the world's first chatbot on the world's first time-sharing system.
To train HealthGPT effectively, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health.
Our modified model architecture and training recipe achieve both better training stability and improved per-token efficiency.
Recent developments in genomic language models have underscored the potential of LLMs in deciphering DNA sequences.
Video inpainting, which aims to restore corrupted video content, has experienced substantial progress.