Second, we apply NCINet to identify the causal relations between image representations of different pairs of attributes with known and unknown causal relations between the labels.
Furthermore, while a plurality of 3D reconstructions is plausible in the occluded regions, existing approaches are limited to generating only a single solution.
Adversarial representation learning aims to learn data representations for a target task while removing unwanted sensitive information at the same time.
However, we hypothesize that such spatially rigid processing is suboptimal for simultaneously restoring the degraded pixels as well as reconstructing the clean regions of the image.
In this paper, we propose an efficient NAS algorithm for generating task-specific models that are competitive under multiple competing objectives.
Ranked #6 on Neural Architecture Search on ImageNet
At the same time, the architecture search and transfer is orders of magnitude more efficient than existing NAS methods.
Ranked #1 on Neural Architecture Search on FGVC Aircraft
To overcome this limitation, we present MUXConv, a layer that is designed to increase the flow of information by progressively multiplexing channel and spatial information in the network, while mitigating computational complexity.
Ranked #4 on Neural Architecture Search on CIFAR-100
We present a method to search for a probe (or query) image representation against a large gallery in the encrypted domain.
While existing approaches have achieved competitive performance in image classification, they are not well suited to problems where the computational budget is limited for two reasons: (1) the obtained architectures are either solely optimized for classification performance, or only for one deployment scenario; (2) the search process requires vast computational resources in most approaches.
Ranked #1 on Pneumonia Detection on ChestX-ray14
Numerical experiments on UCI, Extended Yale B and CIFAR-100 datasets indicate that, (a) practically, our solution is ideal for "imparting" provable invariance to any biased pre-trained data representation, and (b) empirically, the trade-off between utility and invariance provided by our solution is comparable to iterative minimax optimization of existing deep neural network based approaches.
Image recognition systems have demonstrated tremendous progress over the past few decades thanks, in part, to our ability of learning compact and robust representations of images.
In this paper, we explore the practicality of using a fully homomorphic encryption based framework to secure a database of face templates.
This paper addresses the following questions pertaining to the intrinsic dimensionality of any given image representation: (i) estimate its intrinsic dimensionality, (ii) develop a deep neural network based non-linear mapping, dubbed DeepMDS, that transforms the ambient representation to the minimal intrinsic space, and (iii) validate the veracity of the mapping through image matching in the intrinsic space.
To efficiently search for optimal groupings conditioned on the input data, we propose a reinforcement learning search strategy using recurrent networks to learn the optimal group assignments for each network layer.
Numerical experiments on unconstrained faces (IJB-C) provides a capacity upper bound of $2. 7\times10^4$ for FaceNet and $8. 4\times10^4$ for SphereFace representation at a false acceptance rate (FAR) of 1%.
To account for partial occlusions we introduce, Robust Constrained Local Models, that comprises of a deformable shape and local landmark appearance model and reasons over binary occlusion labels.
A recent advance called the WGAN based on Wasserstein distance can improve on the KL and JS-divergence based GANs, and alleviate the gradient vanishing, instability, and mode collapse issues that are common in the GAN training.
We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data.
We introduce the concept of a Visual Compiler that generates a scene specific pedestrian detector and pose estimator without any pedestrian observations.
Concretely, our approach uses two convolutional neural networks: (1) a gesture network that uses pre-defined motion information to detect the hand region; and (2) an appearance network that learns a person specific model of the hand region based on the output of the gesture network.
Our results also yield a surprising result, that our method using purely synthetic data is able to outperform models trained on real scene-specific data when data is limited.
However, existing CF designs do not account for the fact that the multiplication of two DFTs in the frequency domain corresponds to a circular correlation in the time/spatial domain.
A typical object alignment system consists of a landmark appearance model which is used to obtain an initial shape and a shape model which refines this initial shape by correcting the initialization errors.