Search Results for author: Silviu Pitis

Found 11 papers, 8 papers with code

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

1 code implementation • 25 Sep 2023 • Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks.

Language Modelling valid

Paper
Code

Boosted Prompt Ensembles for Large Language Models

1 code implementation • 12 Apr 2023 • Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy Ba

Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of language model reasoning performance with no additional training.

GSM8K Language Modelling

Paper
Code

Large Language Models Are Human-Level Prompt Engineers

2 code implementations • 3 Nov 2022 • Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers.

Few-Shot Learning In-Context Learning +3

999

Paper
Code

MoCoDA: Model-based Counterfactual Data Augmentation

1 code implementation • 20 Oct 2022 • Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg

To this end, we show that (1) known local structure in the environment transitions is sufficient for an exponential reduction in the sample complexity of training a dynamics model, and (2) a locally factored dynamics model provably generalizes out-of-distribution to unseen states and actions.

counterfactual Data Augmentation +2

Paper
Code

Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

2 code implementations • ICML 2020 • Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba

What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks?

Multi-Goal Reinforcement Learning reinforcement-learning +1

103

Paper
Code

Counterfactual Data Augmentation using Locally Factored Dynamics

1 code implementation • NeurIPS 2020 • Silviu Pitis, Elliot Creager, Animesh Garg

Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses.

counterfactual Data Augmentation +5

103

Paper
Code

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

2 code implementations • ICLR 2020 • Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba

When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically--to prove convergence and optimality guarantees--and empirically--as an inductive bias.

Inductive Bias Metric Learning +3

Paper
Code

Objective Social Choice: Using Auxiliary Information to Improve Voting Outcomes

1 code implementation • 27 Jan 2020 • Silviu Pitis, Michael R. Zhang

Instead, we assume that votes are independent but not necessarily identically distributed and that our ensembling algorithm has access to certain auxiliary information related to the underlying model governing the noise in each vote.

Paper
Code

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

no code implementations • 9 Sep 2019 • Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps.

Q-Learning reinforcement-learning +1

Paper
Add Code

Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

no code implementations • 8 Feb 2019 • Silviu Pitis

Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor $\gamma < 1$, or in episodic settings, with $\gamma = 1$.

Decision Making reinforcement-learning +1

Paper
Add Code

Source Traces for Temporal Difference Learning

no code implementations • 8 Feb 2019 • Silviu Pitis

This paper motivates and develops source traces for temporal difference (TD) learning in the tabular setting.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.