Publicly traded companies are periodically required to report fundamentals: financial data such as revenue, earnings, and debt that provide insight into the company's financial health.
While earlier works cast these manipulations as undesirable gaming, recent works have adopted a more nuanced causal framing in which manipulations can improve outcomes of interest, and designing coherent mechanisms requires accounting for both predictive accuracy and improvement of the outcome.
As algorithmic risk assessment instruments (RAIs) are increasingly adopted to assist decision makers, their predictive performance and potential to promote inequity have come under scrutiny.
In this paper, we examine the sources of sample complexity in deep reinforcement learning (DRL), asking how much derives from the requirement of learning useful representations of environment states and how much is due to the sample complexity of learning a policy.
Domain adaptation addresses the common problem in which the target distribution generating our test data drifts from the source (training) distribution.
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.