We propose probabilistic task modelling -- a generative probabilistic model for collections of tasks used in meta-learning.
To promote the new task of camouflaged instance segmentation in-the-wild, we introduce a new dataset, namely CAMO++, by extending our preliminary CAMO dataset (camouflaged object segmentation) in terms of quantity and diversity.
With this approach, we can learn activation quantizers that minimize the quantization errors in the majority of channels.
Different approaches have been proposed for Visual Question Answering (VQA).
Based on pseudo labels, we propose a novel unsupervised metric loss which enforces the positive concentration and negative separation of samples in the embedding space.
This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval.
Unlike existing networks for segmentation, our proposed network possesses two segmentation streams: the main stream and the mirror stream, corresponding to the original image and its flipped image, respectively.
Ranked #2 on Camouflaged Object Segmentation on CAMO
In HAR, the development of Activity Recognition models is dependent upon the data captured by these devices and the methods used to analyse them, which directly affect performance metrics.
We introduce a new and rigorously-formulated PAC-Bayes few-shot meta-learning algorithm that implicitly learns a prior distribution of the model of interest.
Traditional approaches for Visual Question Answering (VQA) require a large amount of labeled data for training.
Ranked #2 on Medical Visual Question Answering on VQA-RAD
In Visual Question Answering (VQA), answers are strongly correlated with question meaning and visual contents.
Ranked #2 on Visual Question Answering on TDIUC
Our experiments show that, compared to state-of-the-art techniques, our method has much greater potential for large-scale place recognition for autonomous driving.
We introduce a new, rigorously-formulated Bayesian meta-learning algorithm that learns a probability distribution of model parameter prior for few-shot learning.
Deep learning models have demonstrated outstanding performance on several problems, but they tend to require immense amounts of computational and human resources for training and labeling, which constrains the types of problems that can be tackled.
This global vector is then subjected to a hashing function to generate a binary hash code.
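As a rough illustration of the step above, the sketch below thresholds each entry of a real-valued vector to obtain a binary code and compares two codes by Hamming distance. The thresholding rule and the names `sign_hash` and `hamming_distance` are assumptions made for illustration; the source does not specify which hashing function is used.

```python
from typing import List, Sequence


def sign_hash(vector: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Map a real-valued vector to a binary code (assumed rule: 1 if entry > threshold)."""
    return [1 if v > threshold else 0 for v in vector]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical global vectors for two images.
code_a = sign_hash([0.7, -1.2, 0.1, -0.3])  # [1, 0, 1, 0]
code_b = sign_hash([0.5, 0.4, -0.2, -0.9])  # [1, 1, 0, 0]
distance = hamming_distance(code_a, code_b)  # 2
```

Binary codes of this kind are compact to store, and comparing them needs only a count of differing bits.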
Another approach explored in the field relies on an ad-hoc linearization (in terms of N) of the triplet loss that introduces class centroids, which must be optimized using the whole training set for each mini-batch; this means that a naive implementation of this approach has run-time complexity O(N^2).
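For reference, the standard triplet loss that this sentence builds on can be sketched as follows. This is the common hinge form with Euclidean distance, not the centroid-based linearization itself, whose details are not given here; the function name `triplet_loss` and the margin value are assumptions.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Hinge triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = math.dist(anchor, positive)  # distance to a same-class sample
    d_neg = math.dist(anchor, negative)  # distance to a different-class sample
    return max(0.0, d_pos - d_neg + margin)


# The negative is far enough away, so the loss is zero.
loss_easy = triplet_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])  # 0.0
# The negative is too close, so the loss is positive.
loss_hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [1.5, 0.0])  # 0.5
```

Evaluating this loss over every possible triplet grows quickly with the number of training samples N, which is why linearized variants are attractive.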
In particular, our work enables the use of randomized methods for point cloud registration without the need for putative correspondences.
We propose V2CNet, a new deep learning framework to automatically translate the demonstration videos to commands that can be directly used in robotic applications.
In this paper, we present a novel 6-DOF localization system that for the first time simultaneously achieves all three characteristics: significantly sub-linear storage growth, agnosticism to image descriptors, and customizability to available storage and computational resources.
As the post-processing step for object detection, non-maximum suppression (GreedyNMS) has been widely used in most detectors for many years.
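A minimal sketch of greedy non-maximum suppression, in its commonly used form, is given below. The box format `(x1, y1, x2, y2)`, the 0.5 IoU threshold, and the function names are assumptions for illustration rather than details from the source.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, visiting boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)  # [0, 2]: the second box overlaps the first
```

The procedure is greedy because each decision depends only on the boxes already kept, which always includes the highest-scoring box.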
Our approaches rely on local features with an encoding technique to represent an image as a single vector.
In this paper, we present the Predicting Media Memorability task, which is proposed as part of the MediaEval 2018 Benchmarking Initiative for Multimedia Evaluation.
This document describes G2D, software that enables capturing videos from Grand Theft Auto V (GTA V), a popular role-playing game set in an expansive virtual city.
In this paper, we present a novel open-set semantic instance segmentation approach capable of segmenting all known and unknown object classes in images, based on the output of an object detector trained on known object classes.
Detecting objects and their 6D poses from only RGB images is an important task for many robotic applications.
However, training deep hashing networks for the task is challenging due to the binary constraints on the hash codes, the similarity preserving property, and the requirement for a vast amount of labelled images.
For unsupervised data-dependent hashing, the two most important requirements are to preserve similarity in the low-dimensional feature space and to minimize the binary quantization loss.
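The second requirement can be made concrete with a small sketch. It assumes codes in {-1, +1} obtained by taking the sign of each feature and measures the quantization loss as the squared distance between the features and their codes. This is a common formulation, not necessarily the one used in the work described; `binarize` and `quantization_loss` are illustrative names.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[float]:
    """Assumed binarization: +1.0 for non-negative entries, -1.0 otherwise."""
    return [1.0 if v >= 0 else -1.0 for v in features]


def quantization_loss(features: Sequence[float]) -> float:
    """Squared Euclidean distance between real-valued features and their binary codes."""
    codes = binarize(features)
    return sum((c - v) ** 2 for c, v in zip(codes, features))


# Entries near +1 or -1 contribute little; the entry 0.2 dominates the loss.
loss = quantization_loss([0.9, -1.1, 0.2])  # approximately 0.66
```

A low loss means little information is discarded when the real-valued features are replaced by binary codes.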
In order to overcome the resource constraints of mobile devices, we propose a system design that leverages the scalability advantage of image retrieval and accuracy of 3D model-based localization.
In the large-scale image retrieval task, the two most important requirements are the discriminability of image representations and the efficiency in computation and storage of representations.
This design overcomes a challenging problem in some previous works: optimizing objective functions that are non-smooth because of binarization.
Image hashing is a popular technique applied to large scale content-based visual retrieval due to its compact and efficient binary codes.
This paper presents SceneCut, a novel approach to jointly discover previously unseen objects and non-object surfaces using a single RGB-D image.
We propose AffordanceNet, a new deep learning approach to simultaneously detect multiple objects and their affordances from RGB images.
Our method first creates the event image from a list of events that occur in a very short time interval; a Stacked Spatial LSTM Network (SP-LSTM) is then used to learn the camera pose.
This paper proposes two approaches for inferring binary codes in two-step (supervised, unsupervised) hashing.
The objective of this paper is to design an embedding method that maps local features describing an image (e.g., SIFT) to a higher dimensional representation useful for the image retrieval problem.
We address the vehicle detection and classification problems using Deep Neural Network (DNN) approaches.
This paper addresses the problem of learning binary hash codes for large scale image search by proposing a novel hashing method based on a deep neural network.
The embedded vectors resulting from the function approximation process are then aggregated to form a single representation used in the image retrieval framework.