The Landsat program is the longest-running Earth observation program in history, with 50+ years of data acquisition by 8 satellites.
We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatio-temporal coverage, and physical groundings.
By enhancing the semantic ability of the codec, X-Codec significantly reduces WER in speech synthesis tasks and extends these benefits to non-speech applications, including music and sound generation.
By incorporating this dataset into model training, we successfully scale the output length of existing models to over 10, 000 words while maintaining output quality.
We present the "Law of Vision Representation" in multimodal large language models (MLLMs).
It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).
Ranked #1 on Camouflaged Object Segmentation on COD
Camouflaged Object Segmentation Dichotomous Image Segmentation +3
For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.
Furthermore, we have validated the model's fast adaptation ability and scaling law emergence, showcasing its versatility.
We find that the distilled model, termed LinFusion, achieves performance on par with or superior to the original SD after only modest training, while significantly reducing time and memory complexity.
To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i. e., using screenshots as input and keyboard and mouse actions as output.