REINFORCE

REINFORCE is a Monte Carlo variant of the policy gradient algorithm in reinforcement learning. The agent samples a complete episode with its current policy and uses the observed returns to update the policy parameter $\theta$. Because each update is computed from full trajectories generated by the policy being improved, REINFORCE is an on-policy algorithm. In the gradient below, $G_{t}$ is the return collected from time step $t$ until the end of the episode.

$$ \nabla_{\theta}J\left(\theta\right) = \mathbb{E}_{\pi}\left[G_{t}\nabla_{\theta}\ln\pi_{\theta}\left(A_{t}\mid{S_{t}}\right)\right]$$

Image Credit: Tingwu Wang
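
To make the gradient expression above concrete, here is a minimal sketch of REINFORCE with a tabular softmax policy, written in plain Python. Everything about the setup is an assumption for illustration and does not come from the text: the four-state corridor task, the reward of 1 for reaching the last state, the discount factor `GAMMA`, the step size `ALPHA`, and all function names. For a softmax policy, the gradient of $\ln\pi_{\theta}(a \mid s)$ with respect to the logit of action $a'$ is $\mathbb{1}[a' = a] - \pi_{\theta}(a' \mid s)$, which is what the update uses.

```python
import math
import random

# Hypothetical toy task: a corridor with states 0..3, where state 3 is the goal.
N_STATES = 4
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed, not given in the text)
ALPHA = 0.1        # step size (assumed)
MAX_STEPS = 20     # episodes are cut off after this many steps


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action):
    """Corridor dynamics: reward 1.0 only on the transition into the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def run_episode(theta, rng):
    """Sample one trajectory of (state, action, reward) with the current policy."""
    state, trajectory = 0, []
    for _ in range(MAX_STEPS):
        probs = softmax(theta[state])
        action = rng.choices(ACTIONS, weights=probs)[0]
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every time step."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_update(theta, trajectory, alpha, gamma):
    """Accumulate sum_t G_t * grad log pi(A_t | S_t), then take one ascent step."""
    returns = discounted_returns([r for _, _, r in trajectory], gamma)
    grad = [[0.0 for _ in ACTIONS] for _ in theta]
    for (state, action, _), g in zip(trajectory, returns):
        probs = softmax(theta[state])
        for a in ACTIONS:
            indicator = 1.0 if a == action else 0.0
            grad[state][a] += g * (indicator - probs[a])
    for s, row in enumerate(grad):
        for a in ACTIONS:
            theta[s][a] += alpha * row[a]


def train(episodes=5000, seed=0):
    """Run REINFORCE from a uniform policy and return the learned logits."""
    rng = random.Random(seed)
    theta = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        trajectory = run_episode(theta, rng)
        reinforce_update(theta, trajectory, ALPHA, GAMMA)
    return theta
```

Calling `theta = train()` and then inspecting `softmax(theta[s])` for each non-terminal state `s` shows how the policy has shifted. Note that the sketch uses no baseline, so the estimate is unbiased but can have high variance; subtracting a baseline from $G_{t}$ is a common refinement that this example leaves out.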

Latest Papers

PAPER DATE
Variance-Reduced Off-Policy Memory-Efficient Policy Search
Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu
2020-09-14
Improving Language Generation with Sentence Coherence Objective
Ruixiao Sun, Jie Yang, Mehrdad Yousefzadeh
2020-09-07
ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning
Sheng-Chun Kao, Geonhwa Jeong, Tushar Krishna
2020-09-04
An operator view of policy gradient methods
Dibya Ghosh, Marlos C. Machado, Nicolas Le Roux
2020-06-19
Model-based Adversarial Meta-Reinforcement Learning
Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma
2020-06-16
The Importance of Prior Knowledge in Precise Multimodal Prediction
Sergio Casas, Cole Gulino, Simon Suo, Raquel Urtasun
2020-06-04
Angle-based Search Space Shrinking for Neural Architecture Search
Yiming Hu, Yuding Liang, Zichao Guo, Ruosi Wan, Xiangyu Zhang, Yichen Wei, Qingyi Gu, Jian Sun
2020-04-28
Attention Routing: track-assignment detailed routing using attention-based reinforcement learning
Haiguang Liao, Qingyi Dong, Xuliang Dong, Wentai Zhang, Wangyang Zhang, Weiyi Qi, Elias Fallon, Levent Burak Kara
2020-04-20
Guided Dialog Policy Learning without Adversarial Learning in the Loop
Ziming Li, Sungjin Lee, Baolin Peng, Jinchao Li, Shahin Shayandeh, Jianfeng Gao
2020-04-07
A Better Variant of Self-Critical Sequence Training
Ruotian Luo
2020-03-22
A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning
Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh
2020-03-01
Estimating Gradients for Discrete Random Variables by Sampling without Replacement
Wouter Kool, Herke van Hoof, Max Welling
2020-02-14
Black-Box Optimization with Local Generative Surrogates
Sergey Shirobokov, Vladislav Belavin, Michael Kagan, Andrey Ustyuzhanin, Atılım Güneş Baydin
2020-02-11
Unsupervised Program Synthesis for Images using Tree-Structured LSTM
Chenghui Zhou, Chun-Liang Li, Barnabas Poczos
2020-01-27
Learning to drive via Apprenticeship Learning and Deep Reinforcement Learning
Wenhui Huang, Francesco Braghin, Zhuo Wang
2020-01-12
CaptainGAN: Navigate Through Embedding Space For Better Text Generation
Anonymous
2020-01-01
Robust Federated Learning Through Representation Matching and Adaptive Hyper-parameters
Hesham Mostafa
2019-12-30
UNAS: Differentiable Architecture Search Meets Reinforcement Learning
Arash Vahdat, Arun Mallya, Ming-Yu Liu, Jan Kautz
2019-12-16
Neural Predictor for Neural Architecture Search
Wei Wen, Hanxiao Liu, Hai Li, Yiran Chen, Gabriel Bender, Pieter-Jan Kindermans
2019-12-02
Scene Graph based Image Retrieval -- A case study on the CLEVR Dataset
Sahana Ramnath, Amrita Saha, Soumen Chakrabarti, Mitesh M. Khapra
2019-11-03
Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder
Jialin Wu, Raymond J. Mooney
2019-10-31
All-Action Policy Gradient Methods: A Numerical Integration Approach
Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon
2019-10-21
Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator
James A. Preiss, Sébastien M. R. Arnold, Chen-Yu Wei, Marius Kloft
2019-10-02
Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture
Pawel Ladosz, Eseoghene Ben-Iwhiwhu, Jeffery Dick, Yang Hu, Nicholas Ketz, Soheil Kolouri, Jeffrey L. Krichmar, Praveen Pilly, Andrea Soltoggio
2019-09-21
ER-AE: Differentially-private Text Generation for Authorship Anonymization
Haohan Bo, Steven H. H. Ding, Benjamin C. M. Fung, Farkhund Iqbal
2019-07-20
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning
Zhihao Fan, Zhongyu Wei, Siyuan Wang, Xuanjing Huang
2019-07-01
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow
2019-06-14
Exploiting Uncertainty of Loss Landscape for Stochastic Optimization
Vineeth S. Bhaskara, Sneha Desai
2019-05-30
Recurrent Existence Determination Through Policy Optimization
Baoxiang Wang
2019-05-29
Interpretable Neural Predictions with Differentiable Binary Variables
Jasmijn Bastings, Wilker Aziz, Ivan Titov
2019-05-20
AM-LFS: AutoML for Loss Function Search
Chuming Li, Yuan Xin, Chen Lin, Minghao Guo, Wei Wu, Wanli Ouyang, Junjie Yan
2019-05-17
ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables
Mingzhang Yin, Yuguang Yue, Mingyuan Zhou
2019-05-04
Beyond Games: Bringing Exploration to Robots in Real-world
Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta
2019-05-01
Posterior-regularized REINFORCE for Instance Selection in Distant Supervision
Qi Zhang, Siliang Tang, Xiang Ren, Fei Wu, Shiliang Pu, Yueting Zhuang
2019-04-17
Power Allocation in Multi-User Cellular Networks: Deep Reinforcement Learning Approaches
Fan Meng, Peng Chen, Lenan Wu, Julian Cheng
2019-01-22
Top-K Off-Policy Correction for a REINFORCE Recommender System
Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed Chi
2018-12-06
Generation of Synthetic Electronic Medical Record Text
Jiaqi Guan, Runzhe Li, Sheng Yu, Xuegong Zhang
2018-12-06
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Han Cai, Ligeng Zhu, Song Han
2018-12-02
Learning to Exploit Stability for 3D Scene Parsing
Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu
2018-12-01
Translating Natural Language to SQL using Pointer-Generator Networks and How Decoding Order Matters
Denis Lukovnikov, Nilesh Chakraborty, Jens Lehmann, Asja Fischer
2018-11-13
Macquarie University at BioASQ 6b: Deep learning and deep reinforcement learning for query-based summarisation
Diego Mollá
2018-11-01
A Fourier View of REINFORCE
Adeel Pervez
2018-08-12
ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks
Mingzhang Yin, Mingyuan Zhou
2018-07-30
Learning Globally Optimized Object Detector via Policy Gradient
Yongming Rao, Dahua Lin, Jiwen Lu, Jie Zhou
2018-06-01
Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow
Tuan Anh Le, Adam R. Kosiorek, N. Siddharth, Yee Whye Teh, Frank Wood
2018-05-26
ReGAN: RE[LAX|BAR|INFORCE] based Sequence Generation using GANs
Aparna Balagopalan, Satya Gorti, Mathieu Ravaut, Raeid Saqur
2018-05-08
Adversarial Training for Community Question Answer Selection Based on Multi-scale Matching
Xiao Yang, Madian Khabsa, Miaosen Wang, Wei Wang, Ahmed Awadallah, Daniel Kifer, C. Lee Giles
2018-04-22
CoT: Cooperative Training for Generative Modeling of Discrete Data
Sidi Lu, Lantao Yu, Siyuan Feng, Yaoming Zhu, Weinan Zhang, Yong Yu
2018-04-11
Attention, Learn to Solve Routing Problems!
Wouter Kool, Herke van Hoof, Max Welling
2018-03-22
Action-dependent Control Variates for Policy Optimization via Stein Identity
Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu
2018-01-01
Adversarial Policy Gradient for Alternating Markov Games
Chao Gao, Martin Mueller, Ryan Hayward
2018-01-01
Learning to Organize Knowledge with N-Gram Machines
Fan Yang, Jiazhong Nie, William W. Cohen, Ni Lao
2018-01-01
Technical Report for E2E NLG Challenge
Heng Gong
2017-12-19
Differentiable lower bound for expected BLEU score
Vlad Zhukov, Eugene Golikov, Maksim Kretov
2017-12-13
Action-depedent Control Variates for Policy Optimization via Stein's Identity
Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu
2017-10-30
Energy-efficient Amortized Inference with Cascaded Deep Classifiers
Jiaqi Guan, Yang Liu, Qiang Liu, Jian Peng
2017-10-10
Rapid Probabilistic Interest Learning from Domain-Specific Pairwise Image Comparisons
Michael Burke, Siyabonga Mbonambi, Purity Molala, Raesetje Sefala
2017-06-19
Learning Hard Alignments with Variational Inference
Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly
2017-05-16
Inferring and Executing Programs for Visual Reasoning
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
2017-05-10
Stein Variational Policy Gradient
Yang Liu, Prajit Ramachandran, Qiang Liu, Jian Peng
2017-04-07
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein
2017-03-21
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Short Version)
Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
2016-12-04
Self-critical Sequence Training for Image Captioning
Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel
2016-12-02
Threshold Learning for Optimal Decision Making
Nathan F. Lepora
2016-12-01
Improving Policy Gradient by Exploring Under-appreciated Rewards
Ofir Nachum, Mohammad Norouzi, Dale Schuurmans
2016-11-28
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
2016-10-31
Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks
Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala
2016-09-10
End-to-end Learning of Action Detection from Frame Glimpses in Videos
Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei
2015-11-22
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Yoshua Bengio, Nicholas Léonard, Aaron Courville
2013-08-15
Analysis and Improvement of Policy Gradient Estimation
Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama
2011-12-01
