To exploit the progressive interactions among these regions, we represent them as a region graph, on which the parts relation reasoning is performed with graph convolutions, thus leading to our PRR branch.
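As a rough illustration of graph-convolutional reasoning over a region graph, the sketch below applies one graph-convolution layer to per-region feature vectors. The symmetric normalization with self-loops, the ReLU, and all names (`normalize_adjacency`, `graph_conv`) are assumptions chosen for illustration, not details taken from the PRR branch itself.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j).

    Assumption: this normalization is a common default, not the paper's choice.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One layer: ReLU(normalized_adjacency @ feats @ weight)."""
    out = matmul(matmul(normalize_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Two connected regions with 1-D features; each region averages itself and its neighbor.
region_adj = [[0.0, 1.0], [1.0, 0.0]]
region_feats = [[1.0], [3.0]]
identity_weight = [[1.0]]
updated = graph_conv(region_adj, region_feats, identity_weight)
```

In this toy case each node mixes its own feature with its neighbor's, which is the basic mechanism by which information propagates between regions.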
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
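For reference, the three activation functions named above can be written as scalar functions. This is a minimal sketch using only the standard library; the function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), computed so exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def tanh(x: float) -> float:
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return math.tanh(x)
```

Sigmoid and Tanh saturate for large |x|, whereas ReLU is linear for positive inputs and zero otherwise.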
Learning from the web can ease the extreme dependence of deep learning on large-scale manually labeled datasets.
To further mine the non-salient region objects, we propose to exploit the segmentation network's self-correction ability.
Due to the memorization effect in Deep Neural Networks (DNNs), training with noisy labels usually results in inferior model performance.
Then we utilize the fused prototype to guide the final segmentation of the query image.
Labeling objects at a subordinate level typically requires expert knowledge, which is not always available when using random annotators.
Specifically, we first generate N pairs (key and value) of multi-resolution query features guided by the support feature and its mask.
29 Dec 2020 • Xiu-Shen Wei, Yu-Yan Xu, Yazhou Yao, Jia Wei, Si Xi, Wenyuan Xu, Weidong Zhang, Xiaoxin Lv, Dengpan Fu, Qing Li, Baoying Chen, Haojie Guo, Taolue Xue, Haipeng Jing, Zhiheng Wang, Tianming Zhang, Mingwen Zhang
WebFG 2020 is an international challenge hosted by Nanjing University of Science and Technology, University of Edinburgh, Nanjing University, The University of Adelaide, Waseda University, etc.
We present a model that uses linear models with variance and low-rank constraints to improve generalization and reduce the number of parameters.
To this end, we propose a certainty-based reusable sample selection and correction approach, termed CRSSC, for coping with label noise in training deep FG models with web images.
To this end, we propose a data-driven meta-set-based approach to deal with noisy web images for fine-grained recognition.
Despite significant progress in applying deep learning methods to the field of content-based image retrieval, there has not been a software library that covers these methods in a unified manner.
In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation.
Ranked #6 on Unsupervised Video Object Segmentation on DAVIS 2016 (using extra training data)
Recent successes in visual recognition can be primarily attributed to feature representation, learning algorithms, and the ever-increasing size of labeled training data.
To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation.
We project the LiDAR point clouds onto the image plane to generate LiDAR images and feed them into one of the branches of the network.
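The projection step can be sketched with a pinhole camera model. The sketch assumes the points are already expressed in the camera frame (extrinsic calibration applied) and that each pixel of the resulting LiDAR image stores depth. The source states neither of these, and the intrinsics `fx`, `fy`, `cx`, `cy` and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_image(
    points: Sequence[Point],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[List[float]]:
    """Project camera-frame 3D points to a sparse depth image (0.0 = no return)."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # point lies behind the camera
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest return when several points hit the same pixel.
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


cloud = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, -1.0), (1.0, 0.0, 10.0)]
lidar_image = project_to_image(cloud, fx=100.0, fy=100.0, cx=2.0, cy=2.0, width=5, height=5)
```

The resulting image is sparse, since most pixels receive no LiDAR return. A real pipeline might store intensity or height instead of depth, or densify the image before feeding it to a network branch.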
To eliminate manual annotation, in this work, we propose a novel image dataset construction framework by employing multiple textual queries.
To tackle these problems, in this work, we exploit general corpus information to automatically select and subsequently classify web images into semantically rich (sub-)categories.
To reduce the cost of manual labelling, there has been increased research interest in automatically constructing image datasets by exploiting web images.