Entropy Regularization

Introduced by Mnih et al. in Asynchronous Methods for Deep Reinforcement Learning

Entropy Regularization is a type of regularization used in reinforcement learning. In on-policy policy gradient methods such as A3C, the mutual reinforcement between actor and critic tends to produce a policy $\pi\left(a\mid{s}\right)$ that is highly peaked on a few actions or action sequences, since it is easier for the actor and critic to overoptimise to a small portion of the environment. To reduce this problem, entropy regularization adds an entropy term to the loss to promote action diversity:

$$H(X) = -\sum_{x}\pi\left(x\right)\log\left(\pi\left(x\right)\right)$$
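
As a rough illustration of how the entropy term enters the loss, the following NumPy sketch computes the entropy of a softmax policy over discrete actions and subtracts a weighted entropy bonus from a basic policy gradient loss. The function names, the coefficient `beta`, and its default value of 0.01 are assumptions chosen for illustration and are not taken from the source.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def policy_entropy(probs, eps=1e-12):
    """H = -sum_x pi(x) log pi(x), computed per state (row)."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def regularized_policy_loss(logits, actions, advantages, beta=0.01):
    """Policy gradient loss minus an entropy bonus.

    `beta` is a hypothetical weighting coefficient. Because the loss is
    minimised, subtracting beta * entropy rewards more diverse policies.
    """
    probs = softmax(logits)
    chosen = probs[np.arange(len(actions)), actions]
    pg_loss = -np.mean(np.log(chosen + 1e-12) * advantages)
    entropy = np.mean(policy_entropy(probs))
    return pg_loss - beta * entropy, entropy


# A peaked policy and a uniform policy over four actions.
peaked_logits = np.array([[10.0, 0.0, 0.0, 0.0]])
uniform_logits = np.zeros((1, 4))

h_peaked = policy_entropy(softmax(peaked_logits))[0]
h_uniform = policy_entropy(softmax(uniform_logits))[0]
print("entropy (peaked): ", h_peaked)
print("entropy (uniform):", h_uniform)
```

The uniform policy attains the maximum entropy for four actions, $\log 4$, while the peaked policy has entropy close to zero, so the bonus lowers the loss more for the uniform policy.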


Source: Asynchronous Methods for Deep Reinforcement Learning

