The adversarial attack literature contains a myriad of algorithms for crafting perturbations which yield pathological behavior in neural networks.
Face super-resolution is a challenging and highly ill-posed problem: a single low-resolution (LR) face image may correspond to multiple high-resolution (HR) ones, so the hallucination process can cause a dramatic identity change in the final super-resolved results.
Manipulated videos, especially those where the identity of an individual has been modified using deep neural networks, are becoming an increasingly relevant threat in the modern day.
Lastly, we show that our end-to-end thermal-to-visible face verification system provides strong performance on the MILAB-VTF(B) dataset.
We show the efficacy of PASS in reducing gender and skin-tone information in descriptors from state-of-the-art face recognition networks such as ArcFace.
Boosting is a method for finding a highly accurate hypothesis by linearly combining many "weak" hypotheses, each of which may be only moderately accurate.
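As a concrete illustration of this linear combination, here is a minimal AdaBoost-style sketch (the classic instance of boosting); the `weak_learner` interface and all names below are illustrative, not from the text:

```python
import numpy as np

def adaboost(X, y, weak_learner, rounds=10):
    """Minimal AdaBoost-style sketch: linearly combine weak hypotheses.

    `weak_learner(X, y, w)` is a hypothetical callable that fits under
    sample weights `w` and returns h with h(X) -> labels in {-1, +1}.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                  # uniform sample weights
    ensemble = []                            # (alpha, h) pairs
    for _ in range(rounds):
        h = weak_learner(X, y, w)            # fit a weak hypothesis
        pred = h(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # hypothesis weight
        w *= np.exp(-alpha * y * pred)           # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, h))
    # the strong hypothesis is the sign of the weighted vote
    return lambda Z: np.sign(sum(a * h(Z) for a, h in ensemble))
```

Each weak hypothesis only has to beat chance on the reweighted data; the exponential reweighting forces later rounds to focus on the examples earlier rounds got wrong.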
Our work builds on hierarchical video prediction models, which disentangle the video generation process into two stages: predicting a high-level representation, such as pose sequence, and then learning a pose-to-pixels translation model for pixel generation.
In contrast, the Hidden Trigger Backdoor Attack achieves poisoning without placing a trigger into the training data at all.
Urban material recognition in remote sensing imagery is a highly relevant, yet extremely challenging problem due to the difficulty of obtaining human annotations, especially on low resolution satellite images.
We tackle object category discovery, which is the problem of discovering and localizing novel objects in a large unlabeled dataset.
Watermarking is a commonly used strategy to protect creators' rights to digital images, videos and audio.
A radiograph visualizes the internal anatomy of a patient through the use of X-ray, which projects 3D information onto a 2D plane.
Our model also outperforms the baseline by 1.14% on Mimetics, a dataset of out-of-context videos, while using only pose heatmaps.
To remedy this issue, robust formulations of OT with unbalanced marginal constraints have previously been proposed.
To tackle this issue, we take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks.
Therefore, we present a novel Adversarial Gender De-biasing algorithm (AGENDA) to reduce the gender information present in face descriptors obtained from previously trained face recognition networks.
Track 3 addressed city-scale multi-target multi-camera vehicle tracking.
In recent years, the research community has approached the problem of vehicle re-identification (re-id) with attention-based models, specifically focusing on regions of a vehicle containing discriminative information.
The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.
Recognizing Families In the Wild (RFIW) is an annual large-scale, multi-track automatic kinship recognition evaluation that supports various visual kin-based problems at scales far larger than ever before.
We propose a new algorithm to incorporate class conditional information into the critic of GANs via a multi-class generalization of the commonly used Hinge loss that is compatible with both supervised and semi-supervised settings.
Additionally, we show how InvGAN can be used to implement reparameterization white-box attacks on projection-based defense mechanisms.
In the final fully connected layer of the networks, we found the order of expressivity for facial attributes to be Age > Sex > Yaw.
To mitigate the degradation due to turbulence, which includes deformation and blur, we propose a generative single-frame restoration algorithm that disentangles the turbulence-induced blur and deformation and reconstructs a restored image.
Prior approaches utilize adversarial training based on cross entropy between the source and target domain distributions to learn a shared feature mapping that minimizes the domain gap.
We propose a marginal super-resolution (MSR) approach based on 2D convolutional neural networks (CNNs) for interpolating an anisotropic brain magnetic resonance scan along the highly under-sampled direction, which is assumed to be axial without loss of generality.
The linkage between the sinogram and image domains is a novel Radon inversion layer that allows gradients to back-propagate from the image domain to the sinogram domain during training.
Recent developments in machine learning and signal processing have resulted in many new techniques that are able to effectively capture the intrinsic yet complex properties of hyperspectral imagery.
In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER).
In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets, which are represented by nodes in a graph.
We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner.
Given a set of 3D point correspondences, we build a deep neural network to address the following two challenges: (i) classification of the point correspondences into inliers/outliers, and (ii) regression of the motion parameters that align the scans into a common reference frame.
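For challenge (ii), the classical closed-form solution for rigid alignment given known inlier correspondences is the Kabsch/Procrustes algorithm; the sketch below is that reference computation, which the learned regressor approximates, not the paper's network itself:

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid alignment (Kabsch): find R, t minimizing
    ||P @ R.T + t - Q||_F for row-stacked 3D correspondences P -> Q.
    Classical reference computation, not the paper's learned regressor."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

In practice this is run inside a robust loop (e.g. on the predicted inlier set), since a single outlier correspondence can corrupt the cross-covariance.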
Using the proposed normalized Wasserstein measure leads to significant performance gains for mixture distributions with imbalanced mixture proportions compared to the vanilla Wasserstein distance.
In this work, we consider challenging scenarios for unconstrained video-based face recognition from multiple-shot videos and surveillance videos with low-quality frames.
Training models that generalize to new domains at test time is a problem of fundamental importance in machine learning.
The deep regionlets framework consists of a region selection network and a deep regionlet learning module.
While upcoming algorithms continue to achieve improved performance, a majority of face recognition systems remain susceptible to failure under disguise variations, one of the most challenging covariates of face recognition.
In this paper, we present a modular system for spatio-temporal action detection in untrimmed security videos.
Incremental learning (IL) is an important task aimed at increasing the capability of a trained model, in terms of the number of classes recognizable by the model.
Building on the success of deep learning, two modern approaches to learn a probability model from the data are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs).
We provide evaluation results of the proposed face detector on challenging unconstrained face detection datasets.
In this paper, we comprehensively study two covariate related problems for unconstrained face verification: first, how covariates affect the performance of deep neural networks on the large-scale unconstrained face verification problem; second, how to utilize covariates to improve verification performance.
Interestingly, we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster-RCNN only drops by 5% on the PASCAL VOC dataset.
These intermediate domains form a smooth path and bridge the gap between the source and target domains.
We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes which are not observed during training.
We show that integrating this simple step in the training pipeline significantly improves the performance of face verification and recognition systems.
In this paper, we introduce the Face Magnifier Network (Face-MagNet), a face detector based on the Faster-RCNN framework which enables the flow of discriminative information of small scale faces to the classifier without any skip or residual connections.
Texture is a fundamental characteristic of many types of images, and texture representation is one of the essential and challenging problems in computer vision and pattern recognition which has attracted extensive research attention.
We achieve this by fusing two generators: one for unconditional image generation, and the other for conditional image generation, where the two partly share a common latent space thereby disentangling the generation.
Taking several facial segments and the full face as input, the proposed method uses a data-driven approach to determine which attributes are localized in which facial segments.
In particular, we show that learning features in a closed and bounded space improves the robustness of the network.
In this work, we focus on adapting the representations learned by segmentation networks across synthetic and real domains.
To address these limitations, we propose an Expression Generative Adversarial Network (ExprGAN) for photo-realistic facial expression editing with controllable expression intensity.
First, a weakly-supervised face region localization network is designed to automatically detect regions (or parts) specific to attributes.
Recognition of low resolution face images is a challenging problem in many practical face recognition systems.
The learned metrics can improve multimodal classification accuracy and experimental results on four datasets show that the proposed algorithm outperforms existing learning algorithms based on multiple metrics as well as other approaches tested on these datasets.
In this paper, we present an efficient approach to perform adversarial training by perturbing intermediate layer activations and study the use of such perturbations as a regularizer during training.
While the research community appears to have developed a consensus on the methods of acquiring annotated data, design and training of CNNs, many questions still remain to be answered.
To this end, we propose Soft-NMS, an algorithm which decays the detection scores of all other objects as a continuous function of their overlap with the maximum-scoring detection M. Hence, no object is eliminated in this process.
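The score-decay step can be sketched with the Gaussian variant of Soft-NMS; the choice of Gaussian kernel and the `sigma`/threshold values here are illustrative defaults, not prescribed by the text:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS sketch: instead of deleting boxes that overlap
    the current maximum M, decay their scores continuously with overlap."""
    scores = scores.astype(float).copy()
    keep = []
    idx = np.arange(len(scores))
    while idx.size:
        m = idx[np.argmax(scores[idx])]              # current max-scoring box M
        keep.append(int(m))
        idx = idx[idx != m]
        ov = iou(boxes[m], boxes[idx])
        scores[idx] *= np.exp(-(ov ** 2) / sigma)    # continuous score decay
        idx = idx[scores[idx] > score_thresh]        # prune only near-zero scores
    return keep
```

A linear decay (multiplying by `1 - ov` above a threshold) is the other common variant; both avoid the hard suppression of classical NMS.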
The three detectors following this approach, namely Facial Segment-based Face Detector (FSFD), SegFace and DeepSegFace, discussed in this paper, perform binary classification on each proposal based on features learned from facial segments.
Domain Adaptation is an actively researched problem in Computer Vision.
Different from existing approaches to modeling these relationships, we propose learnable transform functions that capture the relationships between keypoints at the feature level.
In recent years, the performance of face verification systems has significantly improved using deep convolutional neural networks (DCNNs).
In this paper, we propose an unsupervised face clustering algorithm called "Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local structure of deep representations.
In this paper, we show that without using any 3D information, KEPLER outperforms state of the art methods for alignment on challenging datasets such as AFW and AFLW.
Thus, in this work, we propose a deep heterogeneous feature fusion network that exploits the complementary information present in features generated by different deep convolutional neural networks (DCNNs) for template-based face recognition. Here, a template refers to a set of still face images or video frames from different sources, which introduces more blur, pose, illumination and other variations than traditional face datasets.
To prevent the majority labels from dominating the result of MCar, we generalize MCar to a weighted MCar (WMCar) that handles label imbalance.
The third method works by first converting the continuous orientation estimation task into a set of discrete orientation estimation tasks and then converting the discrete orientation outputs back to the continuous orientation using a mean-shift algorithm.
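The discrete-to-continuous conversion can be sketched as mean-shift on the circle over the discrete orientation bins; the Gaussian kernel and bandwidth below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def circular_mean_shift(bin_angles, bin_scores, init, bandwidth=0.5, iters=50):
    """Sketch of the discrete-to-continuous step: mean-shift on the
    circle, weighting each discrete orientation bin by its score and
    by a Gaussian kernel on angular distance. The kernel and
    `bandwidth` are illustrative choices."""
    theta = init
    for _ in range(iters):
        # wrap angular differences into (-pi, pi]
        d = np.angle(np.exp(1j * (bin_angles - theta)))
        w = bin_scores * np.exp(-(d ** 2) / (2 * bandwidth ** 2))
        theta = theta + np.sum(w * d) / np.sum(w)   # shift toward weighted mean
    return np.angle(np.exp(1j * theta))             # wrap the final estimate
```

The `np.angle(np.exp(1j * x))` idiom wraps angles into (-pi, pi], which keeps the averaging well-defined across the 0/2*pi boundary.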
One promising technique to handle the challenge of partial faces is to design face detectors based on facial segments.
Recent progress in face detection (including keypoint detection) and recognition is mainly being driven by (i) deeper convolutional neural network architectures and (ii) larger datasets.
The proposed method employs a multi-task learning framework that regularizes the shared parameters of CNN and builds a synergy among different domains and tasks.
In this paper, automated user verification techniques for smartphones are investigated.
In this paper, we present FaceNet2ExpNet, a novel idea to train an expression recognition network based on static images.
Then, using this representation, we model human actions as curves in this Lie group.
Large-scale supervised classification algorithms, especially those based on deep convolutional neural networks (DCNNs), require vast amounts of training data to achieve state-of-the-art performance.
Over the last five years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems.
We present a Deep Convolutional Neural Network (DCNN) architecture for the task of continuous authentication on mobile devices.
Attributes, or semantic features, have gained popularity in the past few years in domains ranging from activity recognition in video to face verification.
Despite significant progress made over the past twenty-five years, unconstrained face verification remains a challenging problem.
In this paper, a part-based technique for real time detection of users' faces on mobile devices is proposed.
We present an algorithm for simultaneous face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNN).
In this work, we present an unconstrained face verification algorithm and evaluate it on the recently released IJB-A dataset that aims to push the boundaries of face verification methods.
Recently, it was shown that embedding such manifolds into a Random Projection Spaces (RPS), rather than RKHS or tangent space, leads to higher classification and clustering performance.
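A generic Gaussian random projection (in the Johnson-Lindenstrauss style) conveys the basic idea of an RPS embedding; the manifold-specific construction in the work itself may differ, and all names here are illustrative:

```python
import numpy as np

def random_projection(X, dim, seed=0):
    """Gaussian random projection sketch: embed row-stacked features
    into `dim` dimensions via a random linear map, which approximately
    preserves pairwise distances with high probability. Generic
    illustration only, not the manifold-specific RPS construction."""
    rng = np.random.default_rng(seed)
    # scale by 1/sqrt(dim) so expected squared norms are preserved
    R = rng.standard_normal((X.shape[1], dim)) / np.sqrt(dim)
    return X @ R
```

Because the map is linear and data-independent, it is cheap to apply and the same projection can be reused across train and test sets.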
In this paper, we present a brief history of developments in computer vision and artificial neural networks over the last forty years for the problem of image-based recognition.
Periodic inspections are necessary to keep railroad tracks in state of good repair and prevent train accidents.
In this paper, we present an algorithm for unconstrained face verification based on deep convolutional features and evaluate it on the newly released IARPA Janus Benchmark A (IJB-A) dataset.
Many existing recognition algorithms combine different modalities based on training accuracy but do not consider the possibility of noise at test time.
In real-world action recognition problems, low-level features cannot adequately characterize the rich spatial-temporal structures in action videos.
We experimentally demonstrate that the proposed MKL approach, which we refer to as MKL-RT, can be successfully used to select features for discriminative dimensionality reduction and cross-modal retrieval.
Recently introduced cost-effective depth sensors coupled with the real-time skeleton estimation algorithm of Shotton et al. have generated a renewed interest in skeleton-based human action recognition.
We provide two novel adaptive-rate compressive sensing (CS) strategies for sparse, time-varying signals using side information.
We apply the regression forest employing our node splitting to head pose estimation (Euclidean target space) and car direction estimation (circular target space) and demonstrate that the proposed method significantly outperforms state-of-the-art methods (38.5% and 22.5% error reduction, respectively).
This approach has three advantages: first, the extracted sparse representation for a subject is consistent across domains and enables pose and illumination insensitive face recognition.
We unify the class distribution and appearance information into an objective function for learning a sparse dictionary of action attributes.
We propose a novel dictionary-based learning method for ambiguously labeled multiclass classification, where each training sample has multiple labels and only one of them is the correct label.
In this paper, we address the issue of kernel selection for the classification of features that lie on Riemannian manifolds using the kernel learning approach.
Domain adaptation addresses the problem where data instances of a source domain have different distributions from that of a target domain, which occurs frequently in many real life scenarios.
Compressive sensing (CS) is a new approach for the acquisition and recovery of sparse signals and images that enables sampling rates significantly below the classical Nyquist rate.
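One standard recovery routine in this setting is Orthogonal Matching Pursuit; the abstract does not commit to a particular algorithm, so the following is only an illustrative sketch of sparse recovery from sub-Nyquist measurements:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit sketch: recover a k-sparse x from
    measurements y = A @ x, where A has far fewer rows than columns.
    One standard CS recovery routine, shown for illustration."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):                               # assumes k >= 1
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated atom
        support.append(j)
        As = A[:, support]
        coef, *_ = np.linalg.lstsq(As, y, rcond=None)  # re-fit on support
        residual = y - As @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

The greedy atom selection plus least-squares re-fit is what distinguishes OMP from plain matching pursuit, and it recovers the exact sparse vector when the sensing matrix is sufficiently incoherent.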
In this paper, we consider the 'Precis' problem of sampling K representative yet diverse data points from a large dataset.
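A common baseline for "representative yet diverse" selection is greedy farthest-point (k-center) sampling, sketched below; this is a generic baseline for the problem, not necessarily the proposed Precis method:

```python
import numpy as np

def k_center_greedy(X, k, first=0):
    """Greedy farthest-point sketch of representative-yet-diverse
    sampling: repeatedly pick the point farthest from the current
    selection. A standard k-center baseline, shown for illustration."""
    selected = [first]
    # distance of every point to its nearest selected point so far
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                  # most "novel" point
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Each iteration costs one pass over the data, so the whole selection is O(nk); the result provably 2-approximates the optimal k-center cover.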
We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.