Unlike these studies, we propose a novel Transformer-based Dual Relation learning framework that constructs complementary relationships by exploring two aspects of correlation, i.e., a structural relation graph and a semantic relation graph.
We study the problem of weakly supervised grounded image captioning.
Deep Neural Networks (DNNs) are vulnerable to adversarial examples (Figure 1): adding inconspicuous perturbations to the images can make DNN-based systems collapse.
However, the performance of existing methods suffers in real life since the user is likely to provide an incomplete description of an image, which often leads to results filled with false positives that fit the incomplete description.
Secondly, a BERT with locality-constrained attention is proposed to obtain representations of descriptions at different scales.
Ranked #5 on Text based Person Retrieval on CUHK-PEDES
Current training objectives of existing person Re-IDentification (ReID) models only ensure that the loss of the model decreases on the selected training batch, with no regard to the performance on samples outside the batch.
Through extensive experiments, we demonstrate that SWP is more effective than previous FP-based methods and achieves a state-of-the-art pruning ratio on the CIFAR-10 and ImageNet datasets without an obvious accuracy drop.
Specifically, we construct a positive clip and a negative clip for each video.
Secondly, the Conditional Feature Embedding requires the overall feature of a query image to be dynamically adjusted based on the gallery image it matches, while most of the existing methods ignore the reference images.
Ranked #2 on Person Re-Identification on CUHK03-C
In the conventional person Re-ID setting, it is widely assumed that cropped person images are for each individual.
Greedy-NMS inherently raises a dilemma, where a lower NMS threshold will potentially lead to a lower recall rate and a higher threshold introduces more false positives.
Ranked #4 on Object Detection on CrowdHuman (full body)
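The Greedy-NMS dilemma described above can be illustrated with a minimal sketch. This is a generic textbook version of greedy NMS in plain Python, not the method of the listed paper; the function names `iou` and `greedy_nms`, the toy boxes, and the scores are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float) -> List[int]:
    """Return indices of kept boxes, visiting boxes in descending score order.

    A box is suppressed if its IoU with any already-kept box exceeds
    iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Toy crowd scene (made-up numbers): boxes 0 and 1 are two distinct,
# heavily overlapping people; box 2 is a near-duplicate detection of box 0.
boxes = [(0.0, 0.0, 10.0, 10.0), (4.0, 0.0, 14.0, 10.0), (0.5, 0.0, 10.5, 10.0)]
scores = [0.9, 0.8, 0.7]

low = greedy_nms(boxes, scores, 0.3)    # second person is suppressed (lower recall)
mid = greedy_nms(boxes, scores, 0.5)    # both people kept, duplicate removed
high = greedy_nms(boxes, scores, 0.95)  # duplicate survives (a false positive)
```

In this toy scene a threshold of 0.5 happens to separate the two cases, but in denser crowds the IoU between two real people can exceed the IoU between a person and a duplicate box, so no single fixed threshold works for every pair.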
This paper proposes an adaptive energy management strategy for hybrid electric vehicles by combining deep reinforcement learning (DRL) and transfer learning (TL).
However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders progress on this task.
Instead of one subspace for each viewpoint, our method projects the feature from different viewpoints into a unified hypersphere and effectively models the feature distribution on both the identity-level and the viewpoint-level.
Ranked #5 on Person Re-Identification on Market-1501 (using extra training data)
This procedure encourages that the selected training samples can be both clean and miscellaneous, and that the two models can promote each other iteratively.
Ranked #8 on Unsupervised Domain Adaptation on Market to Duke
Recently, research interest in person re-identification (ReID) has gradually turned to video-based methods, which acquire a person representation by aggregating frame features of an entire video.
In this paper, we address the problem of monocular depth estimation when only a limited number of training image-depth pairs are available.
To overcome this problem, we propose a deep model for soft multilabel learning for unsupervised Re-ID.
Ranked #71 on Person Re-Identification on DukeMTMC-reID
Most existing Re-IDentification (Re-ID) methods are highly dependent on precise bounding boxes that enable images to be aligned with each other.
Ranked #3 on Person Re-Identification on CUHK03-C
While attributes have been widely used for person re-identification (Re-ID) which aims at matching the same person images across disjoint camera views, they are used either as extra features or for performing multi-task learning to assist the image-image matching task.
With the rapid increase of transnational communication and cooperation, people frequently encounter multilingual scenarios in various situations.