In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.
Ranked #12 on Real-Time Object Detection on COCO
(3) the data uncertainty and the model uncertainty are jointly learned in a unified network, and they serve as two fundamental criteria for the reliability assessment: if a probe is high-quality (low data uncertainty) and the model is confident in the prediction of the probe (low model uncertainty), the final ranking will be assessed as reliable.
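The reliability criterion above is a simple conjunction of two conditions. The sketch below takes the two uncertainty estimates as given scalars (in the described method they are learned jointly by a network) and applies the rule. The function name and the threshold values are hypothetical, chosen only for illustration.

```python
def assess_reliability(data_uncertainty: float,
                       model_uncertainty: float,
                       data_threshold: float = 0.5,    # hypothetical cut-off
                       model_threshold: float = 0.5) -> bool:  # hypothetical cut-off
    """Return True when the probe is high-quality AND the model is confident."""
    # Low data uncertainty -> the probe itself is high-quality.
    probe_is_high_quality = data_uncertainty < data_threshold
    # Low model uncertainty -> the model is confident about this probe.
    model_is_confident = model_uncertainty < model_threshold
    return probe_is_high_quality and model_is_confident


print(assess_reliability(0.1, 0.2))  # both low -> reliable
print(assess_reliability(0.9, 0.1))  # noisy probe -> not reliable
```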
To address this issue, we propose the Ranking-based Backward Compatible Learning (RBCL), which directly optimizes the ranking metric between new features and old features.
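To make "the ranking metric between new features and old features" concrete, the sketch below evaluates cross-model average precision: a query embedded by the new model is ranked against a gallery embedded by the old model. This only computes the (non-differentiable) metric; how RBCL turns it into a trainable objective is not shown here, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cross_model_average_precision(new_query, old_gallery, gallery_labels, query_label):
    """AP of a new-model query feature ranked against old-model gallery features."""
    scores = [cosine(new_query, g) for g in old_gallery]
    order = sorted(range(len(old_gallery)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if gallery_labels[idx] == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at each correct match
    return precision_sum / hits if hits else 0.0


gallery = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]  # features from the old model
print(cross_model_average_precision([1.0, 0.0], gallery, ["a", "b", "a"], "a"))
```

A backward-compatible new model should keep this cross-model AP high, so that old gallery features need not be re-extracted.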
In TAGPerson, we extract information from target scenes and use it to control our parameterized rendering process, generating target-aware synthetic images that have a smaller gap to the real images in the target domain.
no code implementations • 17 Nov 2021 • Ming Yan, Haiyang Xu, Chenliang Li, Junfeng Tian, Bin Bi, Wei Wang, Weihua Chen, Xianzhe Xu, Fan Wang, Zheng Cao, Zhicheng Zhang, Qiyu Zhang, Ji Zhang, Songfang Huang, Fei Huang, Luo Si, Rong Jin
The Visual Question Answering (VQA) task combines visual analysis of an image with language analysis of a question to answer a textual question about that image.
Ranked #11 on Visual Question Answering on VQA v2 test-dev
Because of discrepancies between cameras caused by illumination, background, or viewpoint, the underlying difficulty for Re-ID is the camera bias problem, which leads to a large gap between features of the same identity captured by different cameras.
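One simple, commonly used way to reduce camera bias (not necessarily the approach of the work above) is to subtract each camera's mean feature, so that a shift shared by all images from one camera is removed. A minimal sketch, with a hypothetical function name:

```python
def remove_camera_bias(features, camera_ids):
    """Subtract the per-camera mean feature vector from every feature."""
    sums, counts = {}, {}
    for f, cam in zip(features, camera_ids):
        if cam not in sums:
            sums[cam] = [0.0] * len(f)
            counts[cam] = 0
        sums[cam] = [s + x for s, x in zip(sums[cam], f)]
        counts[cam] += 1
    means = {cam: [s / counts[cam] for s in sums[cam]] for cam in sums}
    # Centre each feature on its own camera's mean.
    return [[x - m for x, m in zip(f, means[cam])] for f, cam in zip(features, camera_ids)]


feats = [[1, 2], [3, 4], [10, 10], [12, 12]]
print(remove_camera_bias(feats, [0, 0, 1, 1]))
```

After centring, the two cameras' features occupy the same region of the space, even though camera 1 originally added a large constant offset.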
Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively.
Ranked #1 on Domain Adaptation on Office-31
Recently, GAN-based methods have demonstrated strong effectiveness in generating augmentation data for person re-identification (ReID), owing to their ability to bridge the gap between domains and enrich the data variety in feature space.
We observe that these proposed schemes are capable of facilitating the learning of discriminative feature representations.
In this work, we propose a graph-based re-ranking method to improve learned features while still keeping Euclidean distance as the similarity metric.
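The idea of improving features through a graph while keeping Euclidean distance for ranking can be illustrated with a generic k-nearest-neighbour graph: each feature is replaced by the mean of itself and its neighbours, and retrieval still uses plain Euclidean distance. This uniform k-NN averaging is an assumption for illustration, not the specific graph construction of the method above; function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_smooth(features, k):
    """Replace each feature by the mean of itself and its k nearest neighbours."""
    smoothed = []
    for i, f in enumerate(features):
        others = sorted((j for j in range(len(features)) if j != i),
                        key=lambda j: euclidean(f, features[j]))
        group = [f] + [features[j] for j in others[:k]]
        smoothed.append([sum(col) / len(group) for col in zip(*group)])
    return smoothed


def rank(query_idx, features):
    """Gallery indices sorted by Euclidean distance to the query (unchanged metric)."""
    q = features[query_idx]
    return sorted((j for j in range(len(features)) if j != query_idx),
                  key=lambda j: euclidean(q, features[j]))


feats = [[0, 0], [1, 0], [10, 0], [11, 0]]
smoothed = knn_smooth(feats, k=1)
print(smoothed)
print(rank(0, smoothed))
```

Because only the features change, the improved representations remain compatible with any Euclidean nearest-neighbour index.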
In this challenge, we mainly focus on four points: training data, unsupervised domain-adaptive (UDA) training, post-processing, and model ensembling.
Multi-Target Multi-Camera Tracking has a wide range of applications and is the basis for many advanced inferences and predictions.
Considering the large gap between the source domain and the target domain, we focus on two biases that affect performance on domain-adaptive pedestrian Re-ID and propose a two-stage training procedure.
Our solution is based on a strong baseline with bag of tricks (BoT-BS) proposed in person ReID.
In particular, a quadruplet deep network with margin-based online hard negative mining is proposed, built on the quadruplet loss for person ReID.
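For reference, the widely used quadruplet loss for an anchor $a$, positive $p$, and two negatives $n_1, n_2$ of distinct identities is $[d(a,p)^2 - d(a,n_1)^2 + \alpha_1]_+ + [d(a,p)^2 - d(n_1,n_2)^2 + \alpha_2]_+$. The sketch below implements that formula together with a simple hard-negative mining rule: $n_1$ is the negative closest to the anchor, and $n_2$ is the sample closest to $n_1$ whose identity differs from both. The mining rule, the fixed margins, and the function names are assumptions; the network described above may use adaptive margins.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quadruplet_loss(anchor, positive, neg1, neg2, margin1=1.0, margin2=0.5):
    """Quadruplet loss with hypothetical fixed margins (alpha1 > alpha2)."""
    ap = sq_dist(anchor, positive)
    term1 = max(0.0, ap - sq_dist(anchor, neg1) + margin1)  # anchor-centred term
    term2 = max(0.0, ap - sq_dist(neg1, neg2) + margin2)    # negative-pair term
    return term1 + term2


def mine_hard_negatives(anchor_idx, features, labels):
    """Pick the hardest negatives; requires at least three identities in the batch."""
    anchor, a_label = features[anchor_idx], labels[anchor_idx]
    negatives = [j for j in range(len(features)) if labels[j] != a_label]
    n1 = min(negatives, key=lambda j: sq_dist(anchor, features[j]))
    candidates = [j for j in negatives if labels[j] != labels[n1]]
    n2 = min(candidates, key=lambda j: sq_dist(features[n1], features[j]))
    return n1, n2


feats = [[0, 0], [0, 1], [1, 0], [3, 0], [5, 5]]
labels = ["a", "a", "b", "c", "d"]
n1, n2 = mine_hard_negatives(0, feats, labels)
print(n1, n2, quadruplet_loss(feats[0], feats[1], feats[n1], feats[n2]))
```

Here only the first term is active: the hardest negative sits as close to the anchor as the positive does, so the margin is violated.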
Person re-identification (ReID) focuses on identifying people across different scenes in video surveillance, which is usually formulated as a binary classification task or a ranking task in current person ReID approaches.
Non-overlapping multi-camera visual object tracking typically consists of two steps: single camera object tracking and inter-camera object tracking.