A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

deepseek-ai/DeepEP 10 Mar 2025

Artificial intelligence (AI) has achieved astonishing successes in many domains, especially with the recent breakthroughs in the development of foundational large models.

Continual Learning Meta-Learning +2

8,186
0.47 stars / hour

Multi-head Temporal Latent Attention

d-keqi/mlta 19 May 2025

While Transformer self-attention offers strong parallelism, the Key-Value (KV) cache grows linearly with sequence length and becomes a bottleneck for inference efficiency.

speech-recognition Speech Recognition +1

295
0.47 stars / hour

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

ucla-mobility/AutoVLA 16 Jun 2025

Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities.

Action Generation Bench2Drive +3

67
0.45 stars / hour

Protoformer: Embedding Prototypes for Transformers

codelion/adaptive-classifier PAKDD 2022: Advances in Knowledge Discovery and Data Mining 2022

This paper proposes Protoformer, a novel self-learning framework for Transformers that can leverage problematic samples for text classification.

General Classification Language Modelling +5

280
0.44 stars / hour

Overcoming catastrophic forgetting in neural networks

codelion/adaptive-classifier 2 Dec 2016

The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence.

Atari Games class-incremental learning +3

280
0.44 stars / hour

AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis

codelion/adaptive-classifier 20 May 2025

This paper presents AquaSignal, a modular and scalable pipeline for preprocessing, denoising, classification, and novelty detection of underwater acoustic signals.

Denoising Novelty Detection

280
0.44 stars / hour

Logits-Based Finetuning

dvlab-research/logits-based-finetuning 30 May 2025

We deeply explore the main contributors of OOD detection and find that reconstruction-based pretext tasks have the potential to provide a generally applicable and efficacious prior, which benefits the model in learning intrinsic data distributions of the ID dataset.

Out of Distribution (OOD) Detection

85
0.38 stars / hour

Emerging Properties in Unified Multimodal Pretraining

ByteDance-Seed/Bagel 20 May 2025

Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems.

Image Editing +3

4,265
0.37 stars / hour

Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression

transdiff/transdiff 11 Jun 2025

We introduce TransDiff, the first image generation model that marries Autoregressive (AR) Transformer with diffusion models.

Image Generation

73
0.37 stars / hour

Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach

scut-dlvclab/dolphin 16 Dec 2024

Second, to address data deficit, we introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals.

Representation Learning Retrieval +1

36
0.35 stars / hour