In spite of the success of pre-trained language models in many NLP tasks, the learned text representations only capture correlations among the words within a sentence and ignore implicit relationships between arbitrary tokens in the sequence.
In this work, we introduce another aspect of adaptiveness in the loss function, namely the image quality.
Ranked #1 on Surveillance-to-Single on IJB-S
Most face relighting methods are able to handle diffuse shadows, but struggle to handle hard shadows, such as those cast by the nose.
That is, a template-protected real image and its manipulated version are more easily discriminated than the original real image and its manipulated counterpart.
However, carefully crafted, tiny adversarial perturbations are difficult to recover by optimizing a unilateral RED objective.
The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities.
With complementary supervision from both 3D detection and reconstruction, we enable the 3D voxel features to be geometry- and context-preserving, benefiting both tasks. The effectiveness of our approach is demonstrated through 3D detection and reconstruction in single-object and multiple-object scenarios.
Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis.
A distinctive feature of Doppler radar is the measurement of velocity in the radial direction for radar points.
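The radial-velocity measurement mentioned above can be sketched in a few lines: the velocity a Doppler radar reports for a point is the projection of the point's full velocity vector onto the line of sight from the sensor to the point. The function below is an illustrative toy, not any particular system's implementation.

```python
import numpy as np

def radial_velocity(point, velocity, sensor=np.zeros(3)):
    """Project `velocity` onto the unit line-of-sight vector sensor -> point."""
    los = point - sensor
    return float(np.dot(velocity, los / np.linalg.norm(los)))

# A target straight ahead on the x-axis moving with velocity (3, 4, 0):
# only the radial (x) component is measured; tangential motion is invisible.
v_r = radial_velocity(np.array([10.0, 0.0, 0.0]), np.array([3.0, 4.0, 0.0]))
print(v_r)  # 3.0
```

This is why Doppler returns alone cannot recover the full motion of a crossing target: the tangential component projects to zero.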
To tackle this problem, we propose a framework with two components: a Fingerprint Estimation Network (FEN), which estimates a GM fingerprint from a generated image by training with four constraints to encourage the fingerprint to have desired properties, and a Parsing Network (PN), which predicts network architecture and loss functions from the estimated fingerprints.
Here we propose a radar-to-pixel association stage which learns a mapping from radar returns to pixels.
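Before any learned radar-to-pixel association, there is a purely geometric baseline: projecting a 3D radar return into the image with a pinhole camera model. The sketch below shows that baseline with illustrative intrinsics; the learned mapping described above would refine such associations rather than rely on calibration alone.

```python
import numpy as np

# Illustrative camera intrinsics (focal length 500 px, principal point 320x240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_to_pixel(point_cam, K):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    uvw = K @ point_cam           # homogeneous image coordinates
    return uvw[:2] / uvw[2]       # perspective division

# A radar return 10 m ahead, 1 m right, 0.5 m up (camera frame).
px = project_to_pixel(np.array([1.0, 0.5, 10.0]), K)
print(px)  # [370. 265.]
```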
This paper presents a method for riggable 3D face reconstruction from monocular images, which jointly estimates a personalized face rig and per-image parameters including expressions, poses, and illuminations.
The proposed UniFAD outperforms prevailing defense methods and their fusion with an overall TDR = 94.73% @ 0.2% FDR on a large fake face dataset consisting of 341K bona fide images and 448K attack images of 25 types across all 3 categories.
Furthermore, we introduce a method to use the shadow mask to estimate the ambient light intensity in an image, and are thus able to leverage multiple datasets during training with different global lighting intensities.
That is, for a 2D image of a generic object, we decompose it into latent representations of category, shape and albedo, lighting and camera projection matrix, decode the representations to segmented 3D shape and albedo respectively, and fuse these components to render an image well approximating the input image.
In this paper, we present and integrate GrooMeD-NMS -- a novel Grouped Mathematically Differentiable NMS for monocular 3D object detection, such that the network is trained end-to-end with a loss on the boxes after NMS.
Ranked #11 on Monocular 3D Object Detection on KITTI Cars Moderate
To defend against manipulation of image content, such as splicing, copy-move, and removal, we develop a Progressive Spatio-Channel Correlation Network (PSCC-Net) to detect and localize image manipulations.
However, using the trending graph neural networks (GNNs) as encoders is problematic: GNNs aggregate redundant information from the neighborhood and generate indistinguishable user representations, a phenomenon known as over-smoothing.
Compared to the previous state-of-the-art learning algorithms for non-rigid registration of face scans, SMF only requires the raw data to be rigidly aligned (with scaling) with a pre-defined face template.
The additive process describes spoofing as spoof material introducing extra patterns (e.g., moiré patterns), such that the live counterpart can be recovered by removing those patterns.
To tackle this research gap, we propose a novel duet representation learning framework named \sysname to fuse local information (user-item interaction data) and global information (external knowledge graph) for the top-$N$ recommendation, which is composed of two separate sub-models.
During training, FaceGuard automatically synthesizes challenging and diverse adversarial attacks, enabling a classifier to learn to distinguish them from real faces, while a purifier attempts to remove the adversarial perturbations in the image space.
Robotic apple harvesting has received much research attention in the past few years due to the growing shortage and rising cost of labor.
In this work, we propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
Ranked #9 on Monocular 3D Object Detection on KITTI Cars Moderate (using extra training data)
Prior studies show that the key to face anti-spoofing lies in the subtle image pattern, termed "spoof trace", e.g., color distortion, 3D mask edges, moiré patterns, and many others.
Our proposed group adaptive classifier mitigates bias by using adaptive convolution kernels and attention mechanisms on faces based on their demographic attributes.
In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities.
Ranked #1 on Face Alignment on MERL-RAV
As an emerging topic in face recognition, designing margin-based loss functions can increase the feature margin between different classes for enhanced discriminability.
Ranked #10 on Face Verification on IJB-C (TAR @ FAR=1e-4 metric)
In this work we study the mutual benefits of two common computer vision tasks, self-supervised depth estimation and semantic segmentation from images.
Printed photographs and replayed videos of biometric modalities, such as iris, fingerprint, and face, are common attacks used to fool recognition systems into granting access as the genuine user.
Graph convolution network (GCN) attracts intensive research interest with broad applications.
To improve the performance on those hard samples for general tasks, we propose a novel Distribution Distillation Loss to narrow the performance gap between easy and hard samples, which is simple, effective, and generic across various types of facial variations.
FAN can leverage both paired and unpaired data as we disentangle the features into identity and non-identity components and adapt the distribution of the identity features, which breaks the limit of current face super-resolution methods.
Our results show that object detection can help improve the accuracy of some skin disease classes.
We address the problem of bias in automated face recognition and demographic attribute estimation algorithms, where errors are lower on certain cohorts belonging to specific demographic groups.
Instead of simply using multi-task learning to simultaneously detect manipulated images and predict the manipulated mask (regions), we propose to utilize an attention mechanism to process and improve the feature maps for the classification task.
The LSTM integrates pose features over time as a dynamic gait feature while canonical features are averaged as a static gait feature.
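The two aggregation schemes in this sentence can be sketched in a toy form: a recurrent cell integrates per-frame pose features into a dynamic gait feature, while canonical features are simply averaged over time into a static one. The tiny LSTM cell below uses random weights and illustrative dimensions purely to show the data flow; it is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, T = 8, 4, 6                     # feature dim, hidden dim, frames

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W = rng.standard_normal((4 * H, D + H)) * 0.1   # stacked gate weights (i, f, o, g)

def lstm_dynamic_feature(pose_seq):
    h, c = np.zeros(H), np.zeros(H)
    for x in pose_seq:                # integrate pose features over time
        z = W @ np.concatenate([x, h])
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
        g = np.tanh(z[3*H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h                          # dynamic gait feature: final hidden state

pose_seq = rng.standard_normal((T, D))
canonical_seq = rng.standard_normal((T, D))
dynamic = lstm_dynamic_feature(pose_seq)
static = canonical_seq.mean(axis=0)   # static gait feature: temporal average
gait_feature = np.concatenate([dynamic, static])
print(gait_feature.shape)  # (12,)
```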
Understanding the world in 3D is a critical component of urban autonomous driving.
Ranked #15 on Vehicle Pose Estimation on KITTI Cars Hard
Most of the existing gait recognition methods take silhouettes or articulated body models as the gait features.
By improving the nonlinear 3D morphable model in both learning objective and network architecture, we present a model which is superior in capturing a higher level of detail than the linear model or its preceding nonlinear counterparts.
We also show that the standard Mean Squared Error (MSE) loss function can promote depth mixing, and thus propose instead to use cross-entropy loss for DC.
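The depth-mixing argument above has a simple numerical illustration: at an ambiguous pixel whose neighborhood contains surfaces at 2 m and 10 m, the MSE-optimal prediction is the mean, a depth lying on neither surface, whereas a cross-entropy model over discretized depth bins keeps the modes separate and its argmax stays on a real surface. The values below are made up for illustration.

```python
import numpy as np

depth_samples = np.array([2.0, 2.0, 2.0, 10.0, 10.0])   # bimodal ground truth

# MSE-optimal point estimate: the mean, which "mixes" the two surfaces.
mse_prediction = depth_samples.mean()
print(mse_prediction)  # 5.2 — lies between the surfaces

# Cross-entropy over discretized depth bins preserves both modes.
bins = np.array([2.0, 6.0, 10.0])
counts = np.array([(depth_samples == b).sum() for b in bins])
probs = counts / counts.sum()             # CE-optimal predicted distribution
ce_prediction = bins[np.argmax(probs)]
print(ce_prediction)  # 2.0 — snaps to an actual surface
```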
In recent years, heatmap regression based models have shown their effectiveness in face alignment and pose estimation.
To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of in-the-wild face images, without collecting 3D face scans.
In this work, motivated by the noise modeling and denoising algorithms, we identify a new problem of face de-spoofing, for the purpose of anti-spoofing: inversely decomposing a spoof face into a spoof noise and a live face, and then utilizing the spoof noise for classification.
As a classic statistical model of 3D facial shape and texture, the 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting and image synthesis.
Ranked #1 on Face Alignment on AFLW2000
In this paper, we propose to tackle these three challenges in a new alignment framework termed 3D Dense Face Alignment (3DDFA), in which a dense 3D Morphable Model (3DMM) is fitted to the image via Cascaded Convolutional Neural Networks.
Ranked #3 on Face Alignment on AFLW
This paper proposes an encoder-decoder network to disentangle shape features during 3D face reconstruction from single 2D images, such that the tasks of reconstructing accurate 3D face shapes and learning discriminative shape features for face recognition can be accomplished simultaneously.
In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.
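One plausible reading of such a center-based transfer can be sketched in a few lines: take a regular subject's sample offsets from its class center and re-anchor them at an under-represented subject's center, synthesizing diverse features for it. The shapes and the exact transfer rule below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "regular" subject with many diverse feature samples around its center.
regular_center = np.array([0.0, 0.0])
regular_samples = regular_center + rng.standard_normal((50, 2))

# An under-represented subject: known center, too few samples.
under_center = np.array([5.0, 5.0])

# Transfer intra-class variation: keep each offset, swap the center.
augmented = under_center + (regular_samples - regular_center)

print(augmented.mean(axis=0))  # close to [5, 5]: centered on the target subject
```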
Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels.
We present a novel deep end-to-end trainable Face Super-Resolution Network (FSRNet), which makes full use of geometric priors, i.e., facial landmark heatmaps and parsing maps, to super-resolve very low-resolution (LR) face images without requiring well-aligned inputs.
We present visual-analytics methods to reveal and analyze this hierarchy of similar classes in relation with CNN-internal data.
This paper presents an automated monocular-camera-based computer vision system for autonomously backing up a vehicle toward a trailer, by continuously estimating the 3D trailer coupler position and feeding it to the vehicle control system until the tow hitch is aligned with the trailer's coupler.
Extensive experiments show that the proposed method can achieve the state-of-the-art accuracy in both face alignment and 3D face reconstruction, and benefit face recognition owing to its reconstructed PEN 3D face.
We apply MemNet to three image restoration tasks, i.e., image denoising, super-resolution, and JPEG deblocking.
Specifically, residual learning is adopted, both in global and local manners, to mitigate the difficulty of training very deep networks; recursive learning is used to control the model parameters while increasing the depth.
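The two ideas in that sentence can be shown in a toy fully-connected form: residual learning means each step adds a correction to its input, and recursive learning means the same weight matrix is reused at every step, so effective depth grows without growing the parameter count. Weights below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 8
W = rng.standard_normal((D, D)) * 0.05     # ONE shared parameter matrix

def recursive_residual(x, steps):
    for _ in range(steps):                  # depth = steps, params stay D*D
        x = x + np.tanh(W @ x)              # local residual connection
    return x

x0 = rng.standard_normal(D)
deep = recursive_residual(x0, steps=16)     # 16 "layers" deep, one W
print(deep.shape)  # (8,)
```

Because `W` is shared, unrolling more steps deepens the network for free, and the residual form keeps each step a small, easy-to-train perturbation of the identity.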
To exploit the valuable information in the corrupted data, we propose to impute the missing data by leveraging the relatedness among different modalities.
When placed properly, the additional supervision helps guide features in shared layers to become more sophisticated and helpful for the downstream pedestrian detector.
Ranked #18 on Pedestrian Detection on Caltech
First, the encoder-decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative, which can be used for face image synthesis and pose-invariant face recognition.
Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments.
RNN-based approaches have achieved outstanding performance on action recognition with skeleton inputs.
Ranked #1 on Skeleton Based Action Recognition on SBU
First, we propose a multi-task Convolutional Neural Network (CNN) for face recognition where identity classification is the main task and pose, illumination, and expression estimations are the side tasks.
Large-pose face alignment is a very challenging problem in computer vision, which is used as a prerequisite for many important vision tasks, e.g., face recognition and 3D face reconstruction.
Given a collection of "in-the-wild" face images captured under a variety of unknown pose, expression, and illumination conditions, this paper presents a method for reconstructing a 3D face surface model of an individual along with albedo information.
Global motion compensation (GMC) removes the impact of camera motion and creates a video in which the background appears static over the progression of time.
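The core of such compensation can be sketched as fitting a single global 2D affine transform to (background) point correspondences by least squares and then applying its inverse so the background stays put. This is an illustrative toy: real GMC pipelines add robust estimation (e.g., RANSAC) to reject correspondences on moving foreground objects.

```python
import numpy as np

def fit_affine(src, dst):
    """Solve dst ≈ A @ src + t for A (2x2) and t (2,) via least squares."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])             # n x 3 design matrix
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # 3 x 2 solution
    return params[:2].T, params[2]                    # A, t

# Synthetic camera motion: slight zoom (1.1x) plus a shift of (3, -1).
A_true, t_true = 1.1 * np.eye(2), np.array([3.0, -1.0])
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
dst = src @ A_true.T + t_true

A, t = fit_affine(src, dst)
compensated = (dst - t) @ np.linalg.inv(A).T          # undo the camera motion
print(np.allclose(compensated, src))  # True — background restored
```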
Face alignment, which fits a face model to an image and extracts the semantic meanings of facial pixels, has been an important topic in the computer vision community.
Ranked #3 on 3D Face Reconstruction on Florence
Second, by leveraging emerging face alignment techniques and our novel normal field-based Laplace editing, a combination of landmark constraints and photometric stereo-based normals drives our surface reconstruction.
First, leaf segmentation and alignment are applied on the last frame of a plant video to find a number of well-aligned leaf candidates.
Overlap is one of the characteristics of social networks, in which a person may belong to more than one social group.