Search Results for author: Pingzhong Tang

Found 14 papers, 2 papers with code

SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition

no code implementations 17 Nov 2021 Hangyu Mao, Chao Wang, Xiaotian Hao, Yihuan Mao, Yiming Lu, Chengjie WU, Jianye Hao, Dong Li, Pingzhong Tang

The MineRL competition is designed for the development of reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards.

Imitation Learning reinforcement-learning

Safe Opponent-Exploitation Subgame Refinement

no code implementations 29 Sep 2021 Mingyang Liu, Chengjie WU, Qihan Liu, Yansen Jing, Jun Yang, Pingzhong Tang, Chongjie Zhang

Search algorithms have been playing a vital role in the success of superhuman AI in both perfect information and imperfect information games.

DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis

1 code implementation 3 Oct 2020 Chuheng Zhang, Yuanqi Li, Xi Chen, Yifei Jin, Pingzhong Tang, Jian Li

Modern machine learning models (such as deep neural networks and boosting decision tree models) have become increasingly popular in financial market prediction, due to their superior capacity to extract complex non-linear patterns.

Deterministic Value-Policy Gradients

no code implementations 9 Sep 2019 Qingpeng Cai, Ling Pan, Pingzhong Tang

Based on this theoretical guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which the number of rollout steps of the analytical gradients through the learned model trades off the variance of the value gradients against the model bias.

Continuous Control reinforcement-learning

Field-aware Calibration: A Simple and Empirically Strong Method for Reliable Probabilistic Predictions

no code implementations 26 May 2019 Feiyang Pan, Xiang Ao, Pingzhong Tang, Min Lu, Dapeng Liu, Lei Xiao, Qing He

It is often observed that the probabilistic predictions given by a machine learning model can disagree with averaged actual outcomes on specific subsets of data, which is also known as the issue of miscalibration.

Click-Through Rate Prediction

Incremental training of multi-generative adversarial networks

no code implementations ICLR 2019 Qi Tan, Pingzhong Tang, Ke Xu, Weiran Shen, Song Zuo

Generative neural networks map a standard, possibly distribution to a complex high-dimensional distribution, which represents the real world data set.

Warm Up Cold-start Advertisements: Improving CTR Predictions via Learning to Learn ID Embeddings

1 code implementation 25 Apr 2019 Feiyang Pan, Shuokai Li, Xiang Ao, Pingzhong Tang, Qing He

We propose Meta-Embedding, a meta-learning-based approach that learns to generate desirable initial embeddings for new ad IDs.

Click-Through Rate Prediction Meta-Learning

Policy Optimization with Model-based Explorations

no code implementations 18 Nov 2018 Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hua-Lin He, Qing He, Pingzhong Tang

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games.

Atari Games Decision Making

Deterministic Policy Gradients With General State Transitions

no code implementations 10 Jul 2018 Qingpeng Cai, Ling Pan, Pingzhong Tang

Such a setting generalizes the widely-studied stochastic state transition setting, namely the setting of deterministic policy gradient (DPG).

Continuous Control

Automated Mechanism Design via Neural Networks

no code implementations 9 May 2018 Weiran Shen, Pingzhong Tang, Song Zuo

We then apply our framework to a number of multi-item revenue optimal design settings, for a few of which the theoretically optimal mechanisms are unknown.

A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems

no code implementations 13 Feb 2018 Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang

Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.


Policy Gradients for Contextual Recommendations

no code implementations 12 Feb 2018 Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, Qing He

We evaluate PGCR on toy datasets as well as a real-world dataset of personalized music recommendations.

Decision Making Multi-Armed Bandits

Reinforcement Mechanism Design for e-commerce

no code implementations 25 Aug 2017 Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang

We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform.

Optimal Vehicle Dispatching Schemes via Dynamic Pricing

no code implementations 6 Jul 2017 Mengjing Chen, Weiran Shen, Pingzhong Tang, Song Zuo

To this end, we use a so-called "ironing" technique to convert the problem into an equivalent convex optimization one via a clean Markov decision process (MDP) formulation, where the states are the driver distributions and the decision variables are the prices for each pair of locations.
