To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model.
With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models.
We use the framework to optimize data synthesis and demonstrate significant improvement on handwriting recognition over a model trained on real data only.
In this paper, we tackle the training-inference mismatch encountered during unsupervised learning of controllable generative sequence models.
When applied to datasets where one or more tasks can have noisy annotations, the proposed method learns to prioritize learning from clean labels for a given task, e.g., reducing surface estimation errors by up to 60%.
Our policy adapts the augmentation parameters based on the training loss of the data samples.
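Such a loss-adaptive policy can be sketched as follows. This is a minimal illustration only: the thresholds, the linear schedule, and the convention that low-loss (easy) samples receive stronger augmentation are assumptions, not the paper's actual policy.

```python
def adapt_augmentation(loss, low=0.5, high=2.0, max_strength=1.0):
    """Map a sample's training loss to an augmentation strength.

    Hypothetical policy: easy samples (loss <= low) get full-strength
    augmentation, hard samples (loss >= high) get none, with a linear
    ramp in between.
    """
    if loss <= low:
        return max_strength
    if loss >= high:
        return 0.0
    # Linear interpolation between the two loss thresholds.
    return max_strength * (high - loss) / (high - low)
```

A sample with loss 1.25 (midway between the thresholds) would receive half-strength augmentation under this sketch.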
The DNN, in prior methods, is trained independently of the HMM parameters to minimize the cross-entropy loss between the predicted and the ground-truth state probabilities.
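The per-frame objective used by those prior methods is standard cross-entropy between the ground-truth state distribution and the DNN's predicted state posteriors; a minimal sketch:

```python
import math

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross-entropy between a ground-truth HMM state distribution and
    the DNN's predicted state posteriors for a single frame.

    `eps` guards against log(0); with a one-hot target this reduces to
    the negative log-probability of the correct state.
    """
    return -sum(t * math.log(q + eps) for t, q in zip(p_true, p_pred))
```

For a one-hot target, e.g. state 1 with predicted posterior 0.7, the loss is simply -log(0.7).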
Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100).
We propose extracurricular learning, a novel knowledge distillation method that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set, including uncertain samples.
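The distribution-matching step (3) can be sketched with the standard distillation recipe of comparing temperature-softened teacher and student outputs under a KL divergence. The temperature and the use of KL here are assumptions drawn from common knowledge-distillation practice, not necessarily this method's exact formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student output
    distributions -- the matching objective in distribution-based KD."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The KL term is zero when student and teacher agree exactly and positive otherwise, so minimizing it pulls the student's distribution toward the teacher's.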
We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required.
We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed least squares quantization algorithms.
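The least-squares view of quantization admits a simple closed form: given integer codes q for weights w, the scale minimizing ||w - s*q||^2 is s = ⟨w, q⟩ / ⟨q, q⟩. The sketch below illustrates that closed form; it is a generic least-squares fit, not necessarily the paper's full algorithm (which may alternate between updating codes and scale).

```python
def least_squares_scale(weights, codes):
    """Closed-form scale minimizing sum((w - s*q)^2) over s,
    for fixed integer codes q: s = <w, q> / <q, q>."""
    num = sum(w * q for w, q in zip(weights, codes))
    den = sum(q * q for q in codes)
    return num / den

def quantize(weights, scale, qmin=-128, qmax=127):
    """Round weights to the nearest integer code at the given scale,
    clamped to the signed 8-bit range."""
    return [max(qmin, min(qmax, round(w / scale))) for w in weights]
```

In an alternating scheme one would re-derive the codes with `quantize` and then refit the scale with `least_squares_scale` until convergence.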
To the best of our knowledge, our work is the first curriculum learning method to show gains on large scale image classification and detection tasks.
Many recent works on 3D object detection have focused on designing neural network architectures that can consume point cloud data.
In this work, we propose and evaluate the stochastic preconditioned nonlinear conjugate gradient algorithm for large scale DNN training tasks.
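The core nonlinear conjugate gradient update can be sketched as below with the Fletcher-Reeves beta and a fixed step size. This is a deterministic, unpreconditioned simplification for illustration; the proposed algorithm is stochastic (mini-batch gradients) and preconditioned, and would use a line search or adaptive step in practice.

```python
def ncg_minimize(grad_fn, x0, steps=50, lr=0.1):
    """Fletcher-Reeves nonlinear conjugate gradient with a fixed step.

    grad_fn: maps a point (list of floats) to its gradient.
    Each step moves along a search direction that mixes the new
    negative gradient with the previous direction via beta.
    """
    x = list(x0)
    g = grad_fn(x)
    d = [-gi for gi in g]                      # initial steepest descent
    for _ in range(steps):
        x = [xi + lr * di for xi, di in zip(x, d)]
        g_new = grad_fn(x)
        # Fletcher-Reeves: ratio of squared gradient norms.
        beta = sum(gi * gi for gi in g_new) / max(sum(gi * gi for gi in g), 1e-12)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return x
```

On the quadratic f(x, y) = x^2 + y^2 this converges to the origin from any start.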
Improving the robustness of neural networks against these attacks is important, especially for security-critical applications.
Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.
With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations.
We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images.
In contrast to the existing approaches that use discrete Conditional Random Field (CRF) models, we propose to use a Gaussian CRF model for the task of semantic segmentation.
We present a multi-stream bi-directional recurrent neural network for fine-grained action detection.
In our deep network architecture the global and local constraints that define a face can be efficiently modeled and learned end-to-end using training data.
Face alignment is particularly challenging when there are large variations in pose (in-plane and out-of-plane rotations) and facial expression.
We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model.
We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving.
We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function.
1) For the edge layer, we use a nonparametric approach by constructing a dictionary of patches from a given image, and synthesize edge regions in a higher-resolution version of the image.
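The nonparametric edge-layer step can be sketched as a patch dictionary plus nearest-neighbor lookup. The patch size and the squared-L2 matching criterion are assumptions for illustration; the actual method may use a different patch representation and search.

```python
def extract_patches(img, size=2):
    """Collect all size x size patches (flattened, row-major) from a
    small grayscale image given as a list of rows -- the dictionary."""
    h, w = len(img), len(img[0])
    patches = []
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            patches.append([img[i + di][j + dj]
                            for di in range(size) for dj in range(size)])
    return patches

def nearest_patch(query, dictionary):
    """Return the dictionary patch closest to `query` in squared L2
    distance -- the lookup used to synthesize edge regions."""
    return min(dictionary,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p, query)))
```

Synthesis then replaces each edge-region patch of the upscaled image with its nearest dictionary match.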