In this paper, we propose a novel deep learning architecture to improve word-level lip-reading.
Inspired by this insight, we propose to use class and subitizing labels as weak supervision for the SID problem.
We first extract a visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE).
This paper presents a novel person re-identification model, named Multi-Head Self-Attention Network (MHSA-Net), to prune unimportant information and capture key local information from person images.
The CBDB-Net contains two novel designs: the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL).
The purpose of this paper is to provide a comprehensive survey on deep learning-based approaches in traffic prediction from multiple perspectives.
Recently, single-image super-resolution has made great progress owing to the development of deep convolutional neural networks (CNNs).
Inspired by recent advances in image-based object reconstruction using deep learning, we present an active reconstruction model using a guided view planner.
In this paper, we propose a novel deep fully convolutional network model for accurate salient object detection.
Ranked #5 on Saliency Detection on DUT-OMRON
Under this expression, the projection base of the model is built on the tensor CANDECOMP/PARAFAC (CP) decomposition, and the number of free parameters in the model grows only linearly with the number of modes rather than exponentially.
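The linear-versus-exponential claim can be checked with a small count. The sketch below is illustrative only and is not taken from the paper: it assumes a generic rank-R CP model with one factor matrix per mode, and the function names, mode size, and rank are hypothetical choices made for the example.

```python
from math import prod


def full_tensor_params(dims):
    """Number of entries in an unconstrained tensor with the given mode sizes."""
    return prod(dims)


def cp_params(dims, rank):
    """Number of entries in a rank-`rank` CP model.

    Assumes one factor matrix of shape (mode size, rank) per mode,
    so the total is rank * (sum of the mode sizes).
    """
    return rank * sum(dims)


if __name__ == "__main__":
    size, rank = 10, 5  # hypothetical mode size and CP rank
    print("modes  full_tensor  cp_model")
    for n_modes in range(2, 7):
        dims = [size] * n_modes
        print(n_modes, full_tensor_params(dims), cp_params(dims, rank))
```

With every mode of the same size, each extra mode multiplies the full-tensor count by that size but adds only a fixed amount (size times rank) to the CP count.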
In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth.
Learning on the Grassmann manifold has become popular in many computer vision tasks, owing to its strong capability to extract discriminative information from image sets and videos.
The Restricted Boltzmann Machine (RBM) is a particular type of random neural network model that models vector data under the assumption of a Bernoulli distribution.
Partial least squares regression (PLSR) has been a popular technique to explore the linear relationship between two datasets.
In multi-camera video surveillance, it is challenging to represent videos from different cameras properly and fuse them efficiently for specific applications such as human activity recognition and clustering.
In this paper, we propose a new tensor-based sparse model, TenSR, for MD data representation, along with the corresponding MD sparse coding and MD dictionary learning algorithms.
As a significant subspace clustering method, low-rank representation (LRR) has attracted great attention in recent years.
The novelty of this paper is to generalize LRR on Euclidean space to an LRR model on the Grassmann manifold within a uniform kernelized LRR framework.
In a sparse representation based recognition scheme, it is critical to learn a desired dictionary, aiming at both good representational power and discriminative performance.
A new submodule clustering method via sparse and low-rank representation for multi-way data is proposed in this paper.
One of its successful applications is subspace clustering, in which data are clustered according to the subspaces they belong to.
In contrast to existing techniques, we propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model.