Artificial intelligence (AI) has achieved astonishing successes in many domains, especially with recent breakthroughs in large foundation models.
While Transformer self-attention offers strong parallelism, the Key-Value (KV) cache grows linearly with sequence length and becomes a bottleneck for inference efficiency.
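To make that growth concrete, here is a minimal NumPy sketch of single-head autoregressive decoding with a KV cache; the shapes and random tensors are illustrative stand-ins, not any particular model's implementation:

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for one new query over all cached keys/values.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d = 64                      # head dimension (illustrative)
K_cache = np.empty((0, d))  # keys accumulated so far
V_cache = np.empty((0, d))  # values accumulated so far

for step in range(1024):    # autoregressive decoding steps
    q = np.random.randn(d)  # stand-in for the new token's query
    k = np.random.randn(d)  # stand-in for its key
    v = np.random.randn(d)  # stand-in for its value
    K_cache = np.vstack([K_cache, k])  # cache grows by one row per token,
    V_cache = np.vstack([V_cache, v])  # i.e. linearly in sequence length
    out = attend(q, K_cache, V_cache)

# Memory footprint is O(sequence_length * d) per head and per layer,
# which is why the KV cache dominates long-context inference cost.
```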
Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities.
Ranked #6 on Bench2Drive.
This paper proposes Protoformer, a novel self-learning framework for Transformers that can leverage problematic samples for text classification.
Ranked #1 on Text Classification on arXiv-10.
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence.
Ranked #1 on Class-Incremental Learning on CIFAR-100.
This paper presents AquaSignal, a modular and scalable pipeline for preprocessing, denoising, classification, and novelty detection of underwater acoustic signals.
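The abstract names four stages; the skeleton below is a hypothetical sketch of such a modular pipeline, where every function name, signature, and stub body is an illustrative assumption rather than AquaSignal's actual API:

```python
import numpy as np

# Hypothetical stage interfaces for an underwater-acoustics pipeline.
# Names, signatures, and stub logic are illustrative only.

def preprocess(signal: np.ndarray, sample_rate: int) -> np.ndarray:
    """Normalize the raw hydrophone signal (resampling elided)."""
    return (signal - signal.mean()) / (signal.std() + 1e-8)

def denoise(signal: np.ndarray) -> np.ndarray:
    """Placeholder denoiser: a simple moving-average filter."""
    kernel = np.ones(5) / 5
    return np.convolve(signal, kernel, mode="same")

def classify(signal: np.ndarray) -> str:
    """Stub classifier; a real pipeline would call a trained model."""
    return "vessel" if np.abs(signal).mean() > 0.5 else "ambient"

def novelty_score(signal: np.ndarray) -> float:
    """Stub novelty detector, e.g. distance to known-class embeddings."""
    return float(np.abs(signal).max())

def run_pipeline(raw: np.ndarray, sample_rate: int = 16_000):
    x = preprocess(raw, sample_rate)
    x = denoise(x)
    return classify(x), novelty_score(x)

label, score = run_pipeline(np.random.randn(16_000))
```

Keeping each stage behind its own function boundary is what makes this kind of pipeline modular: any stage can be swapped for a learned model without touching the others.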
We investigate the main contributors to OOD detection performance and find that reconstruction-based pretext tasks can provide a generally applicable and effective prior, helping the model learn the intrinsic data distribution of the in-distribution (ID) dataset.
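A minimal sketch of that general idea uses an autoencoder's reconstruction error as the OOD score; the architecture, sizes, and threshold below are illustrative assumptions, not the paper's method:

```python
import torch
import torch.nn as nn

# Sketch: a reconstruction pretext task repurposed for OOD scoring.
# Architecture, dimensions, and threshold are illustrative assumptions.

class AutoEncoder(nn.Module):
    def __init__(self, dim: int = 784, hidden: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

model = AutoEncoder()
# ... train `model` to reconstruct in-distribution (ID) data ...

@torch.no_grad()
def ood_score(x: torch.Tensor) -> torch.Tensor:
    # Higher reconstruction error suggests x lies off the ID manifold.
    recon = model(x)
    return ((x - recon) ** 2).mean(dim=-1)

x = torch.randn(8, 784)       # stand-in batch
is_ood = ood_score(x) > 0.1   # threshold tuned on held-out ID data
```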
Unified multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems.
Ranked #2 on Image Generation on WISE.
We introduce TransDiff, the first image generation model to marry an Autoregressive (AR) Transformer with diffusion models.
Second, to address the data deficit, we introduce OLIWER, a large-scale online writer retrieval dataset encompassing over 670,000 Chinese handwritten phrases from 1,731 individuals.