Inspired by optimization techniques, we propose a novel meta-learning algorithm with gradient modulation to encourage fast-adaptation of neural networks in the absence of abundant data.
Inspired by the concept of preconditioning, we propose a novel method to increase adaptation speed for gradient-based meta-learning methods without incurring extra parameters.
Single image-level annotations only correctly describe an often small subset of an image's content, particularly when complex real-world scenes are depicted.
In this paper, we present and study a new image segmentation task, called Generalized Open-set Semantic Segmentation (GOSS).
We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames.
Humans have the ability to accumulate knowledge of new tasks in varying conditions, but deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Under this scheme, the 3D segmentation network learns through dual, reciprocal adversarial learning.
A principled way of achieving few-shot learning is to realize a model that can rapidly adapt to the context of a given task.
To this end, we formulate the metric as a weighted sum on the tangent bundle of the hyperbolic space and develop a mechanism to obtain the weights adaptively, based on the constellation of the points.
This paper presents a Transformer architecture for volumetric medical image segmentation.
Uncertainty estimation has been extensively studied in the recent literature; uncertainty is usually classified into aleatoric uncertainty and epistemic uncertainty.
A segmentation model cannot easily learn from prior information given in the visual tracking scenario.
Even with the luxury of having abundant data, multi-label classification is widely known to be a challenging task to address.
Our empirical evaluations show that the noise injecting operation does not degrade the performance of the NAS algorithm if the data is indeed clean.
Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks. The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing.
The proposed SLT-Net leverages both short-term dynamics and long-term temporal consistency to detect concealed objects in continuous video frames.
The core concept of GNNs is to find a representation by recursively aggregating the representations of a central node and those of its neighbors.
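As a minimal sketch of this aggregation rule, the following implements a single GCN-style layer in NumPy; the toy graph, feature matrix, and the name `gcn_layer` are illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: aggregate each node's own features
    with those of its neighbours, then apply a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])            # self-loops: include the central node
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # symmetric degree normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# toy graph: 4 nodes on a path 0-1-2-3, with 2-d features per node
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(4, 2))
W = np.eye(2)
H1 = gcn_layer(A, H, W)   # node representations after one round of aggregation
```

Stacking such layers makes each node's representation depend on progressively larger neighbourhoods, which is the recursive aggregation the sentence above describes.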
Segmentation of images is a long-standing challenge in medical AI.
In this paper, we address the problem of Semi-Supervised DML (SSDML) that tries to learn a metric using a few labeled examples, and abundantly available unlabeled examples.
Recently, adversarial attack methods have been developed to challenge the robustness of machine learning models.
In practice, this is done by minimizing, in one way or another, the dissimilarity between the current and previous responses of the network.
There exist several similarity measures for comparing SPD matrices with documented benefits.
Few-shot learning aims to correctly recognize query samples from unseen classes given a limited number of support samples, often by relying on global embeddings of images.
Neural networks suffer from catastrophic forgetting and are unable to sequentially learn new tasks without guaranteed stationarity in data distribution.
Few-shot class incremental learning (FSCIL) portrays the problem of learning new concepts gradually, where only a few examples per concept are available to the learner.
In this paper, we systematically discuss and define the two common types of label noise in medical images: disagreement label noise, arising from inconsistent expert opinions, and single-target label noise, arising from wrong diagnosis records.
Catastrophic forgetting occurs when a neural network is trained sequentially on multiple tasks: its weights are continuously modified and, as a result, the network loses its ability to solve a previous task.
However, working in hyperbolic spaces is not without difficulties, owing to their curved geometry (e.g., computing the Fréchet mean of a set of points requires an iterative algorithm).
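To make this concrete, here is a sketch of the standard iterative (Karcher) scheme for the Fréchet mean, written for the hyperboloid model of hyperbolic space; the helper names and the two-point toy example are illustrative assumptions.

```python
import numpy as np

def minkowski(u, v):
    """Minkowski (Lorentz) inner product on R^{1+n}."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def log_map(x, y):
    """Tangent vector at x pointing towards y on the hyperboloid."""
    alpha = np.clip(-minkowski(x, y), 1.0, None)   # = cosh(d(x, y))
    d = np.arccosh(alpha)
    if d < 1e-12:
        return np.zeros_like(x)
    u = y - alpha * x
    return d * u / np.sqrt(max(minkowski(u, u), 1e-18))

def exp_map(x, v):
    """Move from x along tangent vector v, staying on the hyperboloid."""
    n = np.sqrt(max(minkowski(v, v), 0.0))
    if n < 1e-12:
        return x
    return np.cosh(n) * x + np.sinh(n) * v / n

def frechet_mean(points, iters=50):
    """Karcher iteration: average the log-maps, then step along the exp-map."""
    x = points[0]
    for _ in range(iters):
        v = np.mean([log_map(x, p) for p in points], axis=0)
        x = exp_map(x, v)
    return x

def lift(z):
    """Embed a Euclidean point z onto the hyperboloid."""
    return np.concatenate([[np.sqrt(1.0 + z @ z)], z])

pts = [lift(np.array([0.5, 0.0])), lift(np.array([-0.5, 0.0]))]
m = frechet_mean(pts)   # by symmetry, the mean sits at the apex (1, 0, 0)
```

Unlike the Euclidean mean, there is no closed form here: the fixed point of the log/exp iteration is what defines the Fréchet mean on the curved space.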
In this paper, we propose addressing this problem using a mixture of subspaces.
Existing deep neural network based salient object detection (SOD) methods mainly focus on pursuing high network accuracy.
Modern video person re-identification (re-ID) machines are often trained using a metric learning approach, supervised by a triplet loss.
To reduce the human efforts in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation.
Full attention, which generates an attention value per element of the input feature maps, has been successfully demonstrated to be beneficial in visual tasks.
As obtaining class labels in all applications is not feasible, we propose an unsupervised approach that learns a metric without making use of class labels.
Learning to learn (L2L) trains a meta-learner to assist the learning of a task-specific base learner.
Deep neural networks need to make robust inference in the presence of occlusion, background clutter, pose and viewpoint variations -- to name a few -- when the task of person re-identification is considered.
We design a regularisation technique to regulate the domain adaptation.
In this paper, we revamp the forgotten classical Semi-Supervised Distance Metric Learning (SSDML) problem from a Riemannian geometric lens, to leverage stochastic optimization within an end-to-end deep framework.
This restricts their applicability to large datasets in new applications, where obtaining labels requires extensive manual effort and domain knowledge.
Generalization from limited examples, usually studied under the umbrella of meta-learning, equips learning techniques with the ability to adapt quickly in dynamical environments and proves to be an essential aspect of lifelong learning.
We introduce the Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of low-dimensional subspaces.
In this paper, we introduce a method that simultaneously learns an embedding space along with subspaces within it to minimize a notion of reconstruction error, thus addressing the problem of subspace clustering in an end-to-end learning paradigm.
To achieve robust baselines, we build on a recent approach that aligns per-class scatter matrices of the source and target CNN streams.
In this paper, we extend several popular optimization algorithms to the Riemannian (constrained) setting.
Advanced optimization algorithms such as Newton's method and AdaGrad benefit from second-order derivatives or second-order statistics to achieve better descent directions and faster convergence rates.
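A minimal AdaGrad sketch showing how accumulated squared gradients (a second-order statistic) act as a per-coordinate diagonal preconditioner; the ill-conditioned quadratic test function is an illustrative assumption.

```python
import numpy as np

def adagrad_minimize(grad, x0, lr=1.0, eps=1e-8, steps=500):
    """AdaGrad: scale each coordinate by its accumulated squared gradients,
    a diagonal proxy for second-order curvature information."""
    x = np.asarray(x0, dtype=float).copy()
    g2 = np.zeros_like(x)                  # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        g2 += g * g                        # per-coordinate second-order statistic
        x -= lr * g / (np.sqrt(g2) + eps)  # preconditioned descent step
    return x

# badly conditioned quadratic f(x) = 50*x0^2 + 0.5*x1^2
grad = lambda x: np.array([100.0 * x[0], 1.0 * x[1]])
x_star = adagrad_minimize(grad, [3.0, 3.0])
```

Because each coordinate is rescaled by its own gradient history, the steep and shallow directions of the quadratic make comparable progress, which plain gradient descent with a single step size cannot achieve.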
Such supervision, while labor-intensive and not always possible, tends to hinder the generalization ability of the learned models.
State-of-the-art neural network models estimate large-displacement optical flow in a multi-resolution fashion, using warping to propagate the estimate from one resolution to the next.
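The warping step can be sketched as a backward bilinear warp; this is a generic formulation, and the function name `warp` and the toy flow field are illustrative assumptions.

```python
import numpy as np

def warp(img, flow):
    """Backward-warp `img` by `flow` with bilinear interpolation:
    output[y, x] samples img at (y + flow[y, x, 1], x + flow[y, x, 0])."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    xq = np.clip(xs + flow[..., 0], 0, W - 1)      # sample locations, clamped
    yq = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(xq).astype(int), np.floor(yq).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = xq - x0, yq - y0                       # bilinear weights
    return ((1 - wy) * ((1 - wx) * img[y0, x0] + wx * img[y0, x1])
            + wy * ((1 - wx) * img[y1, x0] + wx * img[y1, x1]))

img = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0                # shift sampling right by one pixel
warped = warp(img, flow)
```

Warping the second frame towards the first with the coarse flow estimate leaves only a small residual displacement for the finer resolution to estimate.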
Linear Dynamical Systems (LDSs) are fundamental tools for modeling spatio-temporal data in various disciplines.
Symmetric positive definite (SPD) matrices are useful for capturing second-order statistics of visual data.
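A common way such second-order statistics arise is the region covariance descriptor; the following sketch (with random stand-in features in place of real per-pixel features) illustrates why the result is an SPD matrix.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6):
    """Stack per-pixel feature vectors (one per row) and return their
    covariance, regularised so the result is strictly positive definite."""
    F = np.asarray(features, dtype=float)
    C = np.cov(F, rowvar=False)            # d x d second-order statistics
    return C + eps * np.eye(C.shape[0])    # ensure strict positive definiteness

rng = np.random.default_rng(0)
patch = rng.normal(size=(200, 5))          # 200 pixels, 5 features each
C = covariance_descriptor(patch)

# SPD checks: symmetric, with strictly positive eigenvalues
eigvals = np.linalg.eigvalsh(C)
```

Because such descriptors live on the SPD manifold rather than in a flat vector space, comparing them requires the manifold-aware similarity measures discussed in these abstracts.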
To be tractable and robust to data noise, existing metric learning algorithms commonly rely on PCA as a pre-processing step.
Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity.
This paper introduces a learning scheme to construct a Hilbert space (i.e., a vector space along with its inner product) to address both unsupervised and semi-supervised domain adaptation problems.
This paper introduces an extension of the backpropagation algorithm that enables us to have layers with constrained weights in a deep network.
We then devise efficient algorithms to perform sparse coding and dictionary learning on the space of infinite-dimensional subspaces.
To enhance the performance of LDSs, in this paper, we address the challenging issue of performing sparse coding on the space of LDSs, where both data and dictionary atoms are LDSs.
This lets us formulate dimensionality reduction as the problem of finding a projection that yields a low-dimensional manifold either with maximum discriminative power in the supervised scenario, or with maximum variance of the data in the unsupervised one.
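In the unsupervised, maximum-variance case, this projection reduces to classical PCA; the following sketch illustrates that special case (the function name and toy data are assumptions).

```python
import numpy as np

def max_variance_projection(X, k):
    """Unsupervised case: the k-dim projection maximising data variance,
    i.e. the top-k eigenvectors of the sample covariance (classical PCA)."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)           # sample covariance
    vals, vecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]               # top-k variance directions
    return W, Xc @ W                       # projection and low-dim coordinates

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
W, Z = max_variance_projection(X, 2)       # first axis dominates the variance
```

The supervised, maximum-discrimination variant replaces the variance objective with a class-separability criterion, but the search space, projections onto a low-dimensional subspace, is the same.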
Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation.
State-of-the-art image-set matching techniques typically implicitly model each image-set with a Gaussian distribution.
Vectors of Locally Aggregated Descriptors (VLAD) have emerged as powerful image/video representations that compete with or even outperform state-of-the-art approaches on many challenging visual recognition tasks.
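For reference, a minimal VLAD encoder with signed-square-root (power) and L2 normalisation; the random descriptors and centroids stand in for real local features and a learned k-means codebook.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """VLAD: assign each local descriptor to its nearest centroid and
    aggregate the residuals (descriptor - centroid) per cluster."""
    X, C = np.asarray(descriptors), np.asarray(centroids)
    k, d = C.shape
    assign = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    V = np.zeros((k, d))
    for i, x in zip(assign, X):
        V[i] += x - C[i]                       # residual aggregation
    V = np.sign(V) * np.sqrt(np.abs(V))        # power (signed sqrt) normalisation
    return V.ravel() / (np.linalg.norm(V) + 1e-12)  # L2-normalised k*d vector

rng = np.random.default_rng(1)
desc = rng.normal(size=(100, 8))               # 100 local descriptors
cent = rng.normal(size=(4, 8))                 # 4 visual words (e.g. from k-means)
v = vlad_encode(desc, cent)                    # 4 * 8 = 32-d image representation
```

Aggregating residuals rather than counts is what distinguishes VLAD from a bag-of-words histogram and gives it its descriptive power at a fixed dimensionality.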
While sparse coding on non-flat Riemannian manifolds has recently become increasingly popular, existing solutions either are dedicated to specific manifolds, or rely on optimization problems that are difficult to solve, especially when it comes to dictionary learning.
We tackle the problem of optimizing over all possible positive definite radial kernels on Riemannian manifolds for classification.
We propose a framework for 2D shape analysis using positive definite kernels defined on Kendall's shape manifold.
To encode the geometry of the manifold in the mapping, we introduce a family of provably positive definite kernels on the Riemannian manifold of SPD matrices.
We then use the proposed framework to identify positive definite kernels on two specific manifolds commonly encountered in computer vision: the Riemannian manifold of symmetric positive definite matrices and the Grassmann manifold, i.e., the Riemannian manifold of linear subspaces of a Euclidean space.
This paper introduces sparse coding and dictionary learning for Symmetric Positive Definite (SPD) matrices, which are often used in machine learning, computer vision and related areas.
In contrast, here, we study the problem of performing coding in a high-dimensional Hilbert space, where the classes are expected to be more easily separable.
We introduce an approach to computing and comparing Covariance Descriptors (CovDs) in infinite-dimensional spaces.
The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately.
With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds.
Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry.