Tuning hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation.
Listwise ranking losses have been widely studied in recommender systems.
To improve the sample efficiency of policy-gradient-based reinforcement learning algorithms, we propose implicit distributional actor-critic (IDAC), which consists of a distributional critic, built on two deep generator networks (DGNs), and a semi-implicit actor (SIA), powered by a flexible policy distribution.
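As a rough illustration of the semi-implicit idea, the sketch below draws actions from a Gaussian whose mean and scale depend on both the state and an auxiliary noise variable, so the marginal policy (after averaging over the noise) is no longer Gaussian. The network weights, dimensions, and sampling routine here are hypothetical toy stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NOISE_DIM, ACT_DIM = 3, 2, 1

# Toy linear weights standing in for the actor's deep network (hypothetical).
W_mu = rng.standard_normal((ACT_DIM, STATE_DIM + NOISE_DIM)) * 0.1
W_sig = rng.standard_normal((ACT_DIM, STATE_DIM + NOISE_DIM)) * 0.1

def sia_sample(state):
    """Draw one action from a semi-implicit policy:
    a ~ N(mu(s, xi), sigma(s, xi)^2) with auxiliary noise xi ~ N(0, I).
    Marginalizing over xi yields a flexible, non-Gaussian policy."""
    xi = rng.standard_normal(NOISE_DIM)   # auxiliary noise injected into the net
    h = np.concatenate([state, xi])
    mu = W_mu @ h                         # mean depends on state AND noise
    sigma = np.exp(W_sig @ h)             # positive scale
    return mu + sigma * rng.standard_normal(ACT_DIM)

state = np.zeros(STATE_DIM)
actions = np.array([sia_sample(state) for _ in range(5000)])
```

Because each sample uses a fresh noise draw, the collected actions follow the implicit marginal distribution rather than any single conditional Gaussian.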
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but the number of possible actions grows exponentially with the action-space dimension, making it challenging to apply existing on-policy-gradient-based deep RL algorithms efficiently.
In this work, we investigate semi-supervised learning (SSL) for image classification using adversarial training.
In this paper, we provide a framework with provable guarantees for selecting hyperparameters in a number of distinct models.
To address the challenge of backpropagating the gradient through categorical variables, we propose the augment-REINFORCE-swap-merge (ARSM) gradient estimator that is unbiased and has low variance.
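For context, the sketch below implements the plain REINFORCE (score-function) estimator for a categorical variable, which is the unbiased but high-variance baseline that ARSM's augment, swap, and merge steps are designed to improve; it is not the ARSM estimator itself, and the toy objective and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(phi):
    e = np.exp(phi - phi.max())
    return e / e.sum()

def reinforce_grad(phi, f, n_samples=20000):
    """Monte Carlo estimate of d/dphi E_{z ~ Cat(softmax(phi))}[f(z)]
    via the score-function identity E[f(z) * grad log p_phi(z)].
    Unbiased but high-variance; variance reduction is ARSM's target."""
    p = softmax(phi)
    K = len(phi)
    grads = np.zeros(K)
    for _ in range(n_samples):
        z = rng.choice(K, p=p)
        score = -p.copy()
        score[z] += 1.0          # grad of log p_phi(z) w.r.t. the logits phi
        grads += f(z) * score
    return grads / n_samples

phi = np.array([0.2, -0.5, 0.1])     # toy logits (assumed values)
f = lambda z: float(z == 2)          # toy objective: indicator of class 2
g = reinforce_grad(phi, f)
```

For this choice of f the expectation is just the class-2 probability, so the estimate can be checked against the exact softmax gradient.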