Initially, we define the difficulty of a training image using transfer learning from a competitive "teacher" network trained on the ImageNet database, showing improvements in learning speed and final performance for both small and competitive networks on the CIFAR-10 and CIFAR-100 datasets.
An extensive empirical evaluation with modern deep models shows our method's utility on multiple datasets, neural network architectures, and training schemes, both when training from scratch and when using pre-trained networks in transfer learning.
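A minimal sketch of such a teacher-based difficulty score, assuming a pre-trained `teacher` model and a data `loader` (hypothetical names; the exact scoring recipe may differ from the paper's): the harder an example, the lower the teacher's confidence in its true class.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def difficulty_scores(teacher, loader, device="cpu"):
    # One difficulty score per training example: 1 minus the teacher's
    # softmax confidence in the true class (illustrative definition).
    teacher.eval().to(device)
    scores = []
    for images, labels in loader:
        labels = labels.to(device)
        probs = F.softmax(teacher(images.to(device)), dim=1)
        true_conf = probs.gather(1, labels.view(-1, 1)).squeeze(1)
        scores.append(1.0 - true_conf.cpu())
    return torch.cat(scores)
```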
The downside of such powerful learners is the danger of overfitting the training set, leading to poor generalization, which is usually avoided by regularization and "early stopping" of the training.
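For concreteness, the "early stopping" heuristic mentioned here can be sketched as follows (a generic recipe, not tied to any particular experiment; `train_step` and `eval_loss` are hypothetical callables):

```python
def train_with_early_stopping(train_step, eval_loss, max_epochs=100, patience=5):
    # Halt training once the validation loss stops improving for `patience` epochs.
    best_loss, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()            # one epoch of training
        val_loss = eval_loss()  # loss on a held-out validation set
        if val_loss < best_loss:
            best_loss, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break           # stop before the model overfits further
    return best_loss
```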
In the domain of semi-supervised learning (SSL), the conventional approach involves training a learner with a limited amount of labeled data alongside a substantial volume of unlabeled data, both drawn from the same underlying distribution.
This introduces a problem when training in the presence of noisy labels, as the noisy examples cannot be distinguished from clean examples by the end of training.
Deep active learning aims to reduce the annotation cost of training deep models, which are notoriously data-hungry.
Investigating active learning, we focus on the relation between the number of labeled examples (the budget size) and suitable querying strategies.
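One widely used querying strategy in this setting is uncertainty sampling; a minimal sketch (illustrative only; which budget favors which strategy is precisely the question under study):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def query_by_uncertainty(model, unlabeled_pool, budget):
    # Score each unlabeled example by predictive entropy and query the
    # `budget` most uncertain ones. Assumes the pool fits in one forward pass.
    probs = F.softmax(model(unlabeled_pool), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.topk(budget).indices  # indices of examples to label
```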
Empirically, we show how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently at earlier stages of learning.
Our model can be employed during training only and then pruned for prediction, resulting in an architecture equivalent to the base model.
We also notably improve the results in the extreme cases of 1, 2 and 3 labels per class, and show that features learned by our model are more meaningful for separating the data.
In the small-data regime, where only a small sample of labeled images is available for training with no access to additional unlabeled data, our results surpass state-of-the-art GAN models trained on the same amount of data.
GLICO learns a mapping from the training examples to a latent space and a generator that generates images from vectors in the latent space.
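A minimal sketch of this latent-optimization scheme, in the spirit of GLO-style training (the reconstruction loss and names are illustrative assumptions, not GLICO's exact objective):

```python
import torch
import torch.nn as nn

class LatentCodes(nn.Module):
    # One learnable latent vector per training example.
    def __init__(self, num_examples, latent_dim):
        super().__init__()
        self.z = nn.Parameter(torch.randn(num_examples, latent_dim))

def train_step(generator, codes, images, indices, optimizer):
    # Jointly optimize the generator and the per-example codes so that
    # generator(z_i) reconstructs training image x_i.
    optimizer.zero_grad()
    recon = generator(codes.z[indices])
    loss = nn.functional.mse_loss(recon, images)  # illustrative loss choice
    loss.backward()
    optimizer.step()
    return loss.item()
```

Both the generator parameters and the latent codes are updated by the same optimizer, e.g. `torch.optim.Adam(list(generator.parameters()) + list(codes.parameters()))`.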
Simultaneously, the same deep network is trained to solve an additional self-supervised task of predicting image rotations.
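This auxiliary task can be sketched as a standard RotNet-style rotation-prediction loss (the four-way rotation head is an assumption about the model):

```python
import torch
import torch.nn.functional as F

def rotation_task_loss(model, images):
    # Create four rotated copies (0, 90, 180, 270 degrees) of each image
    # and train the model's 4-way head to predict which rotation was applied.
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    targets = torch.arange(4).repeat_interleave(len(images))
    logits = model(rotated)
    return F.cross_entropy(logits, targets)
```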
We further show that this pattern of results reflects the way neural networks learn benchmark datasets.
We also prove that when the ideal difficulty score is fixed, the convergence rate is monotonically increasing with respect to the loss of the current hypothesis at each point.
This and additional results point to the conclusion that performance gains as reported in previous work may be an artifact of random sense assignment, which is equivalent to sub-sampling and multiple estimation of word vector representations.
In order to provide a better fit to the target data distribution when the dataset includes many different classes, we propose a variant of the basic GAN model, called Gaussian Mixture GAN (GM-GAN), where the probability distribution over the latent space is a mixture of Gaussians.
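Sampling from such a mixture prior can be sketched as follows (uniform mixing weights and a fixed per-component scale are illustrative assumptions):

```python
import torch

def sample_gmm_latent(batch_size, means, std=0.2):
    # means: (num_components, latent_dim) tensor of Gaussian centers.
    # Pick a component uniformly per sample, then add noise around its mean.
    comp = torch.randint(len(means), (batch_size,))
    return means[comp] + std * torch.randn(batch_size, means.size(1))
```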
We provide a theoretical investigation of curriculum learning in the context of stochastic gradient descent when optimizing the convex linear regression loss.
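To fix notation for this setting (standard definitions, assumed here rather than quoted from the paper), the loss and the corresponding SGD step can be written as:

```latex
% Convex linear regression loss and its stochastic gradient descent update
L(\mathbf{w}) \;=\; \tfrac{1}{2}\,\mathbb{E}_{(\mathbf{x},y)}\!\left[(\mathbf{w}^{\top}\mathbf{x} - y)^{2}\right],
\qquad
\mathbf{w}_{t+1} \;=\; \mathbf{w}_{t} - \eta\,(\mathbf{w}_{t}^{\top}\mathbf{x}_{t} - y_{t})\,\mathbf{x}_{t}
```

where curriculum learning amounts to controlling the order, or the sampling distribution, of the pairs (x_t, y_t).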
The reliable measurement of confidence in a classifier's predictions is crucial for many applications and is therefore an important part of classifier design.
Studies in visual perceptual learning investigate the way human performance improves with practice, in the context of relatively simple (and therefore more manageable) visual tasks.
Deep learning has become the method of choice in many application domains of machine learning in recent years, especially for multi-class classification tasks.
Our method is based on the initial assignment of confidence values, which measure the affinity between a new test point and each known class.
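One way such affinity-based confidence values can be realized is via distances to class prototypes in an embedding space (an illustrative assumption, not necessarily the paper's exact affinity measure):

```python
import torch

def class_affinity_confidence(embed, x, class_prototypes):
    # Confidence per known class from the (negative) distance between the
    # test embedding and each class prototype; softmax maps affinities to
    # confidence values that sum to one.
    z = embed(x)                              # (batch, dim)
    dists = torch.cdist(z, class_prototypes)  # (batch, num_classes)
    return torch.softmax(-dists, dim=1)
```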
Since most data analysis and statistical methods do not gracefully handle missing values, the first step in the analysis requires the imputation of missing values.
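The simplest instance of this first step is column-mean imputation (a baseline sketch; the methods under discussion are typically more sophisticated):

```python
import numpy as np

def mean_impute(X):
    # Replace each missing entry (NaN) with the observed mean of its column.
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    missing = np.isnan(X)
    X[missing] = np.take(col_means, np.nonzero(missing)[1])
    return X
```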
The reliable measurement of the speed of moving vehicles is considered key to traffic law enforcement in most countries and is seen by many as an important tool to reduce the number of traffic accidents and fatalities.
When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model.
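A common way to enforce such a low-rank constraint is to parameterize the matrix in factored form; a minimal sketch:

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    # Represent an (out x in) weight matrix as U @ V with U: (out x r),
    # V: (r x in). For r << min(out, in) this stores r*(out + in)
    # parameters instead of out*in, and speeds up the matrix product.
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank**0.5)
        self.V = nn.Parameter(torch.randn(rank, in_features) / in_features**0.5)

    def forward(self, x):
        return x @ self.V.T @ self.U.T  # never materializes the full matrix
```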
We define a formal framework for the representation and processing of incongruent events: starting from the notion of a label hierarchy, we show how a partial order on labels can be deduced from such hierarchies.
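Under the assumption that the hierarchy is a forest given by parent links (an illustrative encoding, not the paper's formalism), the induced partial order can be sketched as:

```python
def ancestors(label, parent):
    # parent: dict mapping each label to its parent (roots map to None).
    out = set()
    while parent.get(label) is not None:
        label = parent[label]
        out.add(label)
    return out

def precedes(a, b, parent):
    # Partial order induced by the hierarchy:
    # a <= b iff a equals b or a is an ancestor of b.
    return a == b or a in ancestors(b, parent)
```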