For the change decoder, which is shared by all three architectures, we propose three spatio-temporal relationship modeling mechanisms that combine naturally with the Mamba architecture and fully exploit its properties to achieve spatio-temporal interaction of multi-temporal features, thereby extracting accurate change information.
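As a rough illustration of the idea (not the paper's actual mechanisms), the sketch below arranges bi-temporal token sequences either sequentially or interleaved before handing them to a Mamba-style sequence block; the function name, modes, and shapes are hypothetical assumptions.

```python
import torch

def arrange_spatiotemporal_tokens(feat_t1, feat_t2, mode="sequential"):
    """Arrange bi-temporal feature maps into one token sequence so a
    state-space (Mamba-style) block can model their interaction.

    feat_t1, feat_t2: (B, L, C) token sequences from the two dates.
    mode: 'sequential' scans all of T1 then T2; 'cross' interleaves tokens.
    """
    if mode == "sequential":
        # [t1_1 .. t1_L, t2_1 .. t2_L]
        return torch.cat([feat_t1, feat_t2], dim=1)
    if mode == "cross":
        # [t1_1, t2_1, t1_2, t2_2, ...]
        B, L, C = feat_t1.shape
        stacked = torch.stack([feat_t1, feat_t2], dim=2)  # (B, L, 2, C)
        return stacked.reshape(B, 2 * L, C)
    raise ValueError(f"unknown mode: {mode}")
```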
For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), which is conducive to continuous training and domain adaptation.
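A minimal sketch of a WSD schedule follows, assuming linear warmup, a constant stable phase, and a linear decay tail; the decay shape and `final_lr_ratio` are illustrative choices, not necessarily the paper's exact settings.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps,
           final_lr_ratio=0.1):
    """Warmup-Stable-Decay schedule: warm up linearly to peak_lr,
    hold it constant, then anneal over the final decay_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    decay_start = total_steps - decay_steps
    if step < decay_start:
        # Stable phase: checkpoints here can seed continued training.
        return peak_lr
    # Decay phase: anneal toward final_lr_ratio * peak_lr.
    progress = (step - decay_start) / max(1, decay_steps)
    return peak_lr * (1.0 - (1.0 - final_lr_ratio) * progress)
```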
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.
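Chronos's core recipe is to scale a series and quantize its values into a fixed token vocabulary that a language model can consume. The sketch below shows that idea with mean scaling and uniform bins; the bin count and value range are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def tokenize_series(values, n_bins=4096, low=-15.0, high=15.0):
    """Chronos-style tokenization sketch: mean-scale a series, then
    quantize it into uniform bins that act as a token vocabulary."""
    values = np.asarray(values, dtype=np.float64)
    scale = np.mean(np.abs(values)) or 1.0   # mean scaling; avoid div by 0
    scaled = values / scale
    edges = np.linspace(low, high, n_bins - 1)  # uniform bin edges
    tokens = np.digitize(scaled, edges)         # token ids in [0, n_bins-1]
    return tokens, scale
```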
Inspired by these challenges, this paper presents AIOS, an LLM agent operating system, which embeds large language model into operating systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI.
Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence.
We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.
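The parameter-efficient finetuning piece can be sketched with the `peft` library; the base model and LoRA hyperparameters below are illustrative assumptions, and the sketch omits LLoCO's context compression and retrieval components.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model; LLoCO's actual setup may differ.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```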
To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution comprising a feature enhancer, a language-guided query selection module, and a cross-modality decoder.
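Language-guided query selection can be sketched as scoring each image token by its best match against the text tokens and keeping the top-k as decoder queries; the function name, shapes, and value of k below are assumptions for illustration.

```python
import torch

def language_guided_query_selection(img_feats, txt_feats, num_queries=900):
    """Select the image tokens most relevant to the text as decoder queries.

    img_feats: (B, N_img, C); txt_feats: (B, N_txt, C).
    Assumes N_img >= num_queries.
    """
    sim = img_feats @ txt_feats.transpose(1, 2)     # (B, N_img, N_txt)
    scores = sim.max(dim=-1).values                 # best text match per image token
    topk = scores.topk(num_queries, dim=1).indices  # (B, num_queries)
    batch_idx = torch.arange(img_feats.size(0)).unsqueeze(-1)
    return img_feats[batch_idx, topk]               # (B, num_queries, C)
```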
Prompt compression is an innovative method for efficiently condensing input prompts while preserving essential information.
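One common instantiation of this idea (in the spirit of perplexity-based compressors such as LLMLingua, not necessarily the method described here) scores tokens with a small causal LM and drops the most predictable ones; the model choice and keep ratio below are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def compress_prompt(prompt, keep_ratio=0.5, model_name="gpt2"):
    """Keep the most surprising tokens (highest NLL under a small LM)
    and drop the rest, preserving original token order."""
    tok = AutoTokenizer.from_pretrained(model_name)
    lm = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    # Negative log-likelihood of each token given its prefix.
    nll = torch.nn.functional.cross_entropy(
        logits[0, :-1], ids[0, 1:], reduction="none"
    )
    k = max(1, int(keep_ratio * nll.numel()))
    keep = nll.topk(k).indices.sort().values + 1   # positions of kept tokens
    kept_ids = torch.cat([ids[0, :1], ids[0][keep]])  # always keep first token
    return tok.decode(kept_ids)
```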
For incorrectly predicted samples, our method achieves gains of 81.0% and 18.4% over the HSIC-Attribution algorithm in average highest confidence and Insertion score, respectively.
Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on.