Tuning hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation.
The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains.
Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space.
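One widely used statistical distance for this purpose is the Maximum Mean Discrepancy (MMD). The sketch below is illustrative only (it is not tied to any specific method above): it estimates the squared MMD with an RBF kernel between two sets of latent features, and shows that it is near zero for matching distributions and large for shifted ones. The variable names (`src`, `tgt_near`, `tgt_far`) and the choice of `gamma` are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances, then RBF kernel values.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(source, target, gamma=1.0):
    # Biased estimate of the squared Maximum Mean Discrepancy (MMD).
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2 * k_st

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 8))       # source-domain features
tgt_near = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution as source
tgt_far = rng.normal(2.0, 1.0, size=(200, 8))   # shifted distribution

print(mmd2(src, tgt_near))  # small
print(mmd2(src, tgt_far))   # clearly larger
```

In a domain-adaptation pipeline, such a term would be added to the task loss and minimized over the feature extractor so that source and target latent features become indistinguishable.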
As a generic tool, the improvement introduced by ASR-Norm is agnostic to the choice of ADA methods.
However, the quality of uncertainty estimation is highly dependent on the dropout probabilities.
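To make this dependence concrete, here is a minimal Monte Carlo dropout sketch (illustrative only; the weights and input are random stand-ins for a trained network): dropout is kept active at inference, the prediction is sampled many times, and the spread of those samples serves as the uncertainty estimate. Changing the dropout probability `p` directly changes the reported uncertainty.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for the weights and input of a tiny linear "network".
W = rng.normal(size=(16, 1))
x = rng.normal(size=(1, 16))

def mc_dropout_predict(x, W, p, n_samples=1000):
    # Inverted dropout kept active at inference: each unit survives with
    # probability 1 - p and is rescaled, then predictions are sampled.
    preds = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) > p
        preds.append(((x * mask) / (1 - p)) @ W)
    preds = np.array(preds).ravel()
    return preds.mean(), preds.std()

for p in (0.1, 0.5):
    mean, std = mc_dropout_predict(x, W, p)
    print(f"p={p}: mean={mean:.3f}, std={std:.3f}")
```

The standard deviation grows with `p`, so the same model reports very different uncertainties depending on how the dropout probabilities are set, which is exactly the sensitivity the sentence above describes.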
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability.
Models based on the Transformer architecture have achieved better accuracy than those based on competing architectures across a wide range of tasks.
To stabilize this method, we adapt a policy gradient estimator to the contextual generation of categorical sequences; the estimator evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
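One standard way to exploit correlated rollouts for variance control is a leave-one-out baseline: several rollouts are sampled from the same context, and each rollout's reward is centered by the mean reward of the others. The sketch below is an illustrative REINFORCE-style example on a single categorical decision, not the exact estimator described above; the reward values and policy are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_grad(logits, actions, rewards, use_baseline):
    # Score-function (REINFORCE) gradient estimate averaged over K
    # correlated rollouts sampled under the same context/policy.
    probs = softmax(logits)
    grads = np.zeros_like(logits)
    K = len(actions)
    for a, r in zip(actions, rewards):
        if use_baseline:
            # Leave-one-out baseline: mean reward of the other K-1 rollouts.
            b = (rewards.sum() - r) / (K - 1)
        else:
            b = 0.0
        score = -probs
        score[a] += 1.0          # d log pi(a) / d logits
        grads += (r - b) * score
    return grads / K

logits = np.zeros(4)                       # uniform categorical policy
true_reward = np.array([1.0, 2.0, 3.0, 4.0])

def estimator_variance(use_baseline, K=8, trials=2000):
    ests = []
    for _ in range(trials):
        actions = rng.choice(4, size=K, p=softmax(logits))
        ests.append(reinforce_grad(logits, actions, true_reward[actions],
                                   use_baseline))
    return np.var(np.array(ests), axis=0).mean()

print(estimator_variance(False))  # plain REINFORCE
print(estimator_variance(True))   # leave-one-out baseline: lower variance
```

Because all K rollouts share the same context, their rewards form a cheap, correlated control variate for one another, which is what drives the variance reduction.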
In this paper, we provide a framework with provable guarantees for selecting hyperparameters in a number of distinct models.
Despite substantial progress on classic point-to-point search, few studies address point-to-hyperplane search, which has strong practical value for scaling up many applications such as active learning with SVMs.
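For context, point-to-hyperplane search asks for the database points closest to a query hyperplane w·x + b = 0 under the distance |w·x + b| / ||w|| (in active learning with SVMs, these are the least-certain points near the decision boundary). The brute-force sketch below is illustrative only; the data, the hyperplane, and the helper name `nearest_to_hyperplane` are assumptions, and scalable methods would replace this linear scan with an index.

```python
import numpy as np

rng = np.random.default_rng(1)

# A database of points and a query hyperplane w·x + b = 0
# (e.g. an SVM decision boundary during active learning).
points = rng.normal(size=(1000, 5))
w = rng.normal(size=5)
b = 0.3

def nearest_to_hyperplane(points, w, b, k=3):
    # Point-to-hyperplane distance: |w·x + b| / ||w||.
    dists = np.abs(points @ w + b) / np.linalg.norm(w)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

idx, d = nearest_to_hyperplane(points, w, b)
print(idx, d)   # indices and distances of the 3 points nearest the boundary
```

This O(n) scan is exact but is precisely what point-to-hyperplane indexing methods aim to avoid at scale.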