To exploit the abundant visual priors in off-the-shelf T2I models, a series of methods attempt to invert an image into an embedding that aligns with the semantic space of the T2I model.
Second, we propose an align-guided contrastive loss to refine the alignment of vision and text embeddings.
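The exact form of the align-guided contrastive loss is not given here; a minimal NumPy sketch of a symmetric InfoNCE-style contrastive loss between matched vision and text embeddings (the function name and the temperature value are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def contrastive_alignment_loss(vision_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss pulling matched vision/text pairs together.

    vision_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    """
    v = vision_emb / np.linalg.norm(vision_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature                   # (N, N) cosine similarities

    def xent(l):
        # cross-entropy with diagonal (matched-pair) targets
        l = l - l.max(axis=1, keepdims=True)         # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average of vision->text and text->vision directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned embeddings drive the diagonal similarities far above the off-diagonal ones, so the loss approaches zero.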
to provide additional consistency constraints, which increases GPU memory consumption and complicates the model's structure and training pipeline.
To this end, we propose to extract features corresponding to regional objects as soft prompts for the LLM, which provides a straightforward, scalable approach and eliminates the need for LLM fine-tuning.
To make the constructed volumes as close as possible to the surfaces of objects in the scene and the rendered depth more accurate, we propose to perform depth prediction and radiance field reconstruction simultaneously.
To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss to adaptively drive the NeRF's geometry to align with the shape of the sparse 3D points.
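The paper's adaptive weighting scheme is not specified in this excerpt; a minimal sketch of a point cloud guidance loss that drives rendered NeRF depth toward sparse 3D point depths, with an optional per-point confidence weight standing in for the adaptive term (all names are illustrative):

```python
import numpy as np

def point_cloud_guidance_loss(rendered_depth, point_depth, confidence=None):
    """Penalize disagreement between NeRF-rendered depth and sparse point depths.

    rendered_depth: (M,) depths rendered along rays through the sparse 3D points
    point_depth:    (M,) depths of those sparse points along the same rays
    confidence:     optional (M,) per-point weights (e.g. reconstruction confidence)
    """
    err = np.abs(rendered_depth - point_depth)   # L1 depth residual per point
    if confidence is not None:
        err = confidence * err                   # adaptively down-weight noisy points
    return err.mean()
```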
In this paper, we propose an end-to-end framework for oriented object detection, which simplifies the model pipeline and achieves superior performance.
In this work, we propose a novel and data-efficient framework for WILSS, named FMWISS.
Instead of relabeling each dataset with the unified taxonomy, a category-guided decoding module is designed to dynamically guide predictions to each dataset's taxonomy.
Specifically, MimCo takes a pre-trained contrastive learning model as the teacher model and is pre-trained with two types of learning targets: patch-level and image-level reconstruction losses.
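MimCo's exact losses are not given here; a minimal sketch of a two-target objective that combines a masked patch-level reconstruction loss with an image-level reconstruction loss against features from a frozen contrastive teacher (shapes, names, and the use of plain MSE are assumptions):

```python
import numpy as np

def mimco_style_loss(student_patch, teacher_patch, student_img, teacher_img,
                     mask, w_patch=1.0, w_img=1.0):
    """Combine patch-level and image-level reconstruction targets.

    student_patch/teacher_patch: (N, P, D) per-patch features
    student_img/teacher_img:     (N, D) global features from the frozen teacher
    mask: (N, P) boolean, True where patches were masked (loss applies there)
    """
    patch_err = ((student_patch - teacher_patch) ** 2).mean(axis=-1)  # (N, P)
    patch_loss = (patch_err * mask).sum() / max(mask.sum(), 1)        # masked only
    img_loss = ((student_img - teacher_img) ** 2).mean()
    return w_patch * patch_loss + w_img * img_loss
```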
To tackle this problem, we propose a purely angle-free framework for rotated object detection, called Point RCNN, which mainly consists of PointRPN and PointReg.
To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching$^*$.
Since collecting massive COVID-19 image samples to train deep classification models is expensive and time-consuming, transfer learning is a promising approach that transfers knowledge from abundant typical-pneumonia datasets to COVID-19 image classification.
On the COCO dataset, our simple design achieves superior performance compared to both the FCOS baseline detector with NMS post-processing and the recent end-to-end NMS-free detectors.
However, it remains challenging to determine which method is suitable for a given application, since each is built with certain priors or biases.
In this paper, we propose a novel Dynamic Adversarial Adaptation Network (DAAN) to dynamically learn domain-invariant representations while quantitatively evaluating the relative importance of global and local domain distributions.
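How DAAN estimates the dynamic weight is not described in this excerpt; a minimal sketch of blending a global (marginal) adversarial loss with class-wise local (conditional) losses under a single factor `omega` — the weighting convention (which term `omega` multiplies) is an assumption here:

```python
import numpy as np

def dynamic_domain_loss(global_loss, local_losses, omega):
    """Blend global and class-wise local domain-adversarial losses.

    global_loss:  scalar loss from the global domain discriminator
    local_losses: per-class losses from the local (class-wise) discriminators
    omega:        in [0, 1]; here, larger omega emphasizes the local losses
    """
    return (1.0 - omega) * global_loss + omega * float(np.mean(local_losses))
```

In the full method, `omega` would be re-estimated each epoch from the discriminators' errors rather than fixed by hand.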
It is able to achieve accurate and personalized healthcare without compromising privacy and security.
In this paper, we propose a unified Transfer Channel Pruning (TCP) approach for accelerating UDA models.
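TCP's transfer-aware importance criterion is not detailed in this excerpt; a minimal sketch of the generic channel-pruning step it builds on — given per-channel importance scores, keep the top channels of a convolution kernel and zero the rest (the scoring itself, e.g. a Taylor-expansion criterion, is assumed to come from elsewhere):

```python
import numpy as np

def prune_channels(weights, importance, keep_ratio=0.5):
    """Zero out the least-important output channels of a conv weight tensor.

    weights:    (C_out, C_in, k, k) convolution kernel
    importance: (C_out,) per-channel importance scores (higher = more important)
    keep_ratio: fraction of output channels to retain
    """
    k = max(1, int(round(keep_ratio * len(importance))))
    keep = np.argsort(importance)[-k:]      # indices of the k most important channels
    pruned = np.zeros_like(weights)
    pruned[keep] = weights[keep]            # retained channels keep their weights
    return pruned, keep
```

In practice the zeroed channels would then be physically removed and the network fine-tuned to recover accuracy.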