Predictor-based algorithms have achieved remarkable performance in the Neural Architecture Search (NAS) tasks.
Differentiable Neural Architecture Search is one of the most popular Neural Architecture Search (NAS) methods for its search efficiency and simplicity, accomplished by jointly optimizing the model weight and architecture parameters in a weight-sharing supernet via gradient-based algorithms.
In this work, we propose a deep reinforcement learning based solution for order dispatching and we conduct large scale online A/B tests on DiDi's ride-dispatching platform to show that the proposed method achieves significant improvement on both total driver income and user experience related metrics.
In this paper we propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks.
no code implementations • 29 Mar 2021 • Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe
We present the design of a centralized benchmark for Reinforcement Learning which can help measure Sample Efficiency and Generalization in Reinforcement Learning by doing end to end evaluation of the training and rollout phases of thousands of user submitted code bases in a scalable way.
We present a new practical framework based on deep reinforcement learning and decision-time planning for real-world vehicle repositioning on ride-hailing (a type of mobility-on-demand, MoD) platforms.
This paper proposes a novel differentiable architecture search method by formulating it into a distribution learning problem.
Order dispatching and driver repositioning (also known as fleet management) in the face of spatially and temporally varying supply and demand are central to a ride-sharing platform marketplace.
How to optimally dispatch orders to vehicles and how to trade off between immediate and future returns are fundamental questions for a typical ride-hailing platform.
We consider classification tasks in the regime of scarce labeled training data in high dimensional feature space, where specific expert knowledge is also available.
Recently several methods were proposed for sparse optimization which make careful use of second-order information [10, 28, 16, 3] to improve local convergence rates.
We propose a novel general algorithm LHAC that efficiently uses second-order information to train a class of large-scale l1-regularized problems.