We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.
Many deep learning algorithms for low-light image enhancement are based on Retinex theory, sketched below.
Ranked #1 on Low-Light Image Enhancement on LOL-v2-synthetic
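As background (our sketch, not a claim from this particular paper): classical Retinex theory models an observed image I as the element-wise product of a reflectance map R and an illumination map L,

```latex
I(x) = R(x) \odot L(x)
```

so a Retinex-based enhancer typically estimates the illumination L from the dark input and recovers the reflectance R (or a relit version of it) as the enhanced output.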
To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans.
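For readers unfamiliar with one half of that search space, here is a minimal PyTorch sketch of gradient checkpointing; the model and segmentation are hypothetical stand-ins, and the system above jointly chooses such checkpointing plans together with the distributed layout:

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of layers (hypothetical); with checkpointing, only the
# activations at segment boundaries are stored, and the interior
# activations are recomputed during backward, trading compute for memory.
model = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(16)])
x = torch.randn(8, 1024, requires_grad=True)

# Checkpoint the 16 layers as 4 segments of 4 layers each.
y = checkpoint_sequential(model, 4, x)
y.sum().backward()
```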
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.
In recent years, YOLO-series models have emerged as the leading approaches in the area of real-time object detection.
Ranked #3 on Object Detection on COCO 2017 val
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs).
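A minimal sketch of the underlying retrieve-then-prompt pattern; `embed`, `index`, and `llm_generate` are hypothetical stand-ins for a frozen RM and LM, and the paper's actual interface between the two will differ:

```python
# Retrieve-then-prompt with frozen models: neither the retriever nor
# the language model is fine-tuned; all adaptation is in-context.
def retrieval_augmented_answer(question, index, embed, llm_generate, k=3):
    # 1) Frozen RM: embed the query and fetch the k nearest passages.
    passages = index.search(embed(question), k=k)
    # 2) Build an in-context prompt from the retrieved evidence.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    # 3) Frozen LM: answer conditioned on the retrieved context.
    return llm_generate(prompt)
```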
Because modern deep neural networks adopt sophisticated, complex architectures in pursuit of accuracy, the distributions of their weights and activations are highly diverse.
We present a new learning paradigm in which the knowledge extracted from large pre-trained models is utilized to help models such as CNNs and ViTs learn enhanced representations and achieve better performance.
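One common instantiation of this idea (our sketch, not necessarily this paper's method) is logit distillation, where a student CNN/ViT is trained against a frozen pre-trained teacher:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets from the frozen teacher, softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term transfers the teacher's soft predictions; the T^2
    # factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * T * T
    # Blend with ordinary supervised cross-entropy on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```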
In practice, only a small portion of the original training set is required as positive examples; additional useful training examples can then be mined from the massive unlabeled data on the cloud through a PU classifier equipped with an attention-based multi-scale feature extractor.
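For context, a standard way to train such a positive-unlabeled (PU) classifier is the non-negative PU risk estimator (nnPU, Kiryo et al., 2017). The sketch below assumes a binary classifier producing logits and a known class prior; it is a generic nnPU loss, not this paper's exact objective:

```python
import torch
import torch.nn.functional as F

def nnpu_risk(pos_logits, unl_logits, prior=0.1):
    """Non-negative PU risk: estimate the classification risk from
    positive and unlabeled logits, given the positive-class prior."""
    # Loss of positives treated as positive / as negative.
    r_p_pos = F.binary_cross_entropy_with_logits(
        pos_logits, torch.ones_like(pos_logits))
    r_p_neg = F.binary_cross_entropy_with_logits(
        pos_logits, torch.zeros_like(pos_logits))
    # Loss of unlabeled data treated as negative.
    r_u_neg = F.binary_cross_entropy_with_logits(
        unl_logits, torch.zeros_like(unl_logits))
    # Negative-class risk estimated from unlabeled data minus its
    # positive component; clamping at zero is the nnPU correction.
    neg_risk = torch.clamp(r_u_neg - prior * r_p_neg, min=0.0)
    return prior * r_p_pos + neg_risk
```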
Current interacting hand (IH) datasets have relatively simple backgrounds and textures; their hand joints are annotated by a machine annotator, which may introduce inaccuracies; and their pose distributions offer limited diversity.