Search Results for author: Sina Ghiassian

Found 15 papers, 5 papers with code

On the Importance of Uncertainty in Decision-Making with Large Language Models

no code implementations • 3 Apr 2024 • Nicolò Felicioni, Lucas Maystre, Sina Ghiassian, Kamil Ciosek

We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy.

Decision Making • Multi-Armed Bandits • +1
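
The Thompson Sampling policy mentioned above can be illustrated with a very small sketch: keep a posterior over each arm's reward, sample once from every posterior, and act on the best sample. The block below is a generic Bernoulli-bandit toy (arm probabilities and constants are made up for illustration), not the paper's LLM bandit setup.

```python
import numpy as np

# Minimal Thompson Sampling for a Bernoulli bandit: one Beta posterior per arm,
# sample from each posterior, pull the arm with the largest sampled value.
rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]          # hypothetical arm reward probabilities
alpha = np.ones(len(true_means))      # Beta posterior "successes" per arm
beta = np.ones(len(true_means))       # Beta posterior "failures" per arm

for t in range(1000):
    samples = rng.beta(alpha, beta)   # one draw per arm from its posterior
    arm = int(np.argmax(samples))     # act greedily w.r.t. the sampled values
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward              # conjugate posterior update
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))         # posterior mean estimate per arm
```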

In-context Exploration-Exploitation for Reinforcement Learning

no code implementations • 11 Mar 2024 • Zhenwen Dai, Federico Tomasi, Sina Ghiassian

In-context learning is a promising approach for online policy learning of offline reinforcement learning (RL) methods, which can be achieved at inference time without gradient optimization.

Bayesian Inference • Bayesian Optimization • +3

Auxiliary task discovery through generate-and-test

no code implementations • 25 Oct 2022 • Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard Sutton, Jun Luo, Adam White

In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning.

Meta-Learning • Representation Learning

Importance Sampling Placement in Off-Policy Temporal-Difference Methods

no code implementations • 18 Mar 2022 • Eric Graves, Sina Ghiassian

A central challenge to applying many off-policy reinforcement learning algorithms to real world problems is the variance introduced by importance sampling.
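
To make the variance issue concrete, here is a minimal sketch of one off-policy linear TD(0) update in which the per-step importance sampling ratio scales the whole TD error; the paper studies alternative placements of this ratio, and this generic textbook-style form is only one of them.

```python
import numpy as np

# One off-policy linear TD(0) update with a per-step importance sampling ratio
# rho = pi(a|s) / b(a|s). Here rho scales the whole update; other placements
# of the ratio are possible, which is what the paper above investigates.
def td0_update(w, x, x_next, r, gamma, alpha, pi_prob, b_prob):
    rho = pi_prob / b_prob                    # importance sampling ratio
    delta = r + gamma * w @ x_next - w @ x    # TD error under current weights
    return w + alpha * rho * delta * x        # ratio-scaled semi-gradient step

w = np.zeros(4)
x, x_next = np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])
w = td0_update(w, x, x_next, r=1.0, gamma=0.99, alpha=0.1, pi_prob=1.0, b_prob=0.25)
```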

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

no code implementations • 10 Sep 2021 • Sina Ghiassian, Richard S. Sutton

In the Rooms task, the product of importance sampling ratios can be as large as $2^{14}$ and can sometimes be two.
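
The $2^{14}$ figure arises because per-step ratios multiply along a trajectory, so even moderate per-step ratios compound quickly. The worked illustration below uses a per-step ratio of 2 purely for the arithmetic; it is not a claim about the task's actual policies.

```latex
% Per-step importance sampling ratios and their product over a trajectory:
\[
  \rho_t = \frac{\pi(A_t \mid S_t)}{b(A_t \mid S_t)}, \qquad
  \rho_{0:T} = \prod_{t=0}^{T} \rho_t .
\]
% Illustration only: if \rho_t = 2 on each of 14 consecutive steps,
% then \rho_{0:13} = 2^{14} = 16384.
```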

An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

2 code implementations • 2 Jun 2021 • Sina Ghiassian, Richard S. Sutton

In the middle tier, the five Gradient-TD algorithms and Off-policy TD($\lambda$) were more sensitive to the bootstrapping parameter.
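
For readers who want to see where the bootstrapping parameter enters, below is a minimal sketch of one off-policy TD($\lambda$) step with an accumulating eligibility trace; $\lambda$ controls the trace decay. This is a generic linear-function-approximation sketch, not the paper's experimental code.

```python
import numpy as np

# One off-policy TD(lambda) step with an accumulating eligibility trace z.
# The bootstrapping parameter lambda sets how quickly the trace decays,
# which is the sensitivity studied in the paper above.
def off_policy_td_lambda_step(w, z, x, x_next, r, gamma, lam, alpha, rho):
    delta = r + gamma * w @ x_next - w @ x   # TD error
    z = rho * (gamma * lam * z + x)          # trace decays with gamma * lambda
    w = w + alpha * delta * z                # trace-weighted update
    return w, z

w, z = np.zeros(8), np.zeros(8)
```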

Does the Adam Optimizer Exacerbate Catastrophic Forgetting?

1 code implementation • 15 Feb 2021 • Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton

Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon.

reinforcement-learning • Reinforcement Learning (RL)

From Eye-blinks to State Construction: Diagnostic Benchmarks for Online Representation Learning

1 code implementation • 9 Nov 2020 • Banafsheh Rafiee, Zaheer Abbas, Sina Ghiassian, Raksha Kumaraswamy, Richard Sutton, Elliot Ludvig, Adam White

We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning.

Continual Learning • Representation Learning

Gradient Temporal-Difference Learning with Regularized Corrections

1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.

Q-Learning
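
As background on the Gradient-TD family referenced above, the sketch below shows a plain on-policy TDC-style update, which maintains a secondary weight vector h for the gradient correction. The paper's TDRC method additionally regularizes the h update, and off-policy versions add importance sampling ratios; both refinements are omitted from this generic sketch.

```python
import numpy as np

# On-policy TDC: the primary weights w take a corrected semi-gradient step,
# while the secondary weights h track the expected TD error per feature.
def tdc_step(w, h, x, x_next, r, gamma, alpha, beta):
    delta = r + gamma * w @ x_next - w @ x                    # TD error
    w = w + alpha * (delta * x - gamma * (h @ x) * x_next)    # corrected step
    h = h + beta * (delta - h @ x) * x                        # secondary step
    return w, h

w, h = np.zeros(8), np.zeros(8)
```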

Improving Performance in Reinforcement Learning by Breaking Generalization in Neural Networks

no code implementations • 16 Mar 2020 • Sina Ghiassian, Banafsheh Rafiee, Yat Long Lo, Adam White

Unfortunately, the performance of deep reinforcement learning systems is sensitive to hyper-parameter settings and architecture choices.

reinforcement-learning • Reinforcement Learning (RL)

Overcoming Catastrophic Interference in Online Reinforcement Learning with Dynamic Self-Organizing Maps

no code implementations • 29 Oct 2019 • Yat Long Lo, Sina Ghiassian

Yet, neural networks tend to forget what they learned in the past, especially when they learn online and fully incrementally, a setting in which the weights are updated after each sample is received and the sample is then discarded.

reinforcement-learning • Reinforcement Learning (RL)
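
The online, fully incremental setting described above (one update per incoming sample, no replay buffer) can be shown with a toy loop; everything below is synthetic and only illustrates the setting, not the paper's dynamic self-organizing map method.

```python
import numpy as np

# Fully incremental online learning: each sample drives exactly one update
# and is then discarded; nothing is stored for replay.
rng = np.random.default_rng(0)
w = np.zeros(3)
for _ in range(10_000):
    x = rng.normal(size=3)        # one incoming sample
    y = 2.0 * x[0] - x[1]         # hypothetical regression target
    err = y - w @ x
    w += 0.01 * err * x           # single update; the sample is then dropped

print(w)
```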

Should All Temporal Difference Learning Use Emphasis?

1 code implementation • 1 Mar 2019 • Xiang Gu, Sina Ghiassian, Richard S. Sutton

ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy training, but it is different from conventional TD learning even under on-policy training.
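
For reference, here is a minimal sketch of a linear ETD(0) step: a followon trace F accumulates discounted, ratio-weighted interest and multiplies every update, which is one way to see why ETD behaves differently from conventional TD(0) even when all ratios equal 1 (on-policy). This is a generic sketch under linear function approximation, not the paper's code.

```python
import numpy as np

# One linear ETD(0) step. The followon trace F carries emphasis forward;
# with interest fixed at 1 it generally differs from 1 even on-policy, so
# ETD's updates are weighted differently from conventional TD(0).
def etd0_step(w, F, rho_prev, rho, x, x_next, r, gamma, alpha, interest=1.0):
    F = gamma * rho_prev * F + interest      # followon trace
    delta = r + gamma * w @ x_next - w @ x   # TD error
    w = w + alpha * F * rho * delta * x      # emphasis-weighted update
    return w, F

w, F = np.zeros(8), 0.0
```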

Online Off-policy Prediction

no code implementations • 6 Nov 2018 • Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White

The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems, but it has remained an open algorithmic challenge for decades.

A First Empirical Study of Emphatic Temporal Difference Learning

no code implementations • 11 May 2017 • Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton

In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem.
