Search Results for author: Silviu Pitis

Found 11 papers, 8 papers with code

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

1 code implementation • 25 Sep 2023 • Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks.

Language Modelling

Boosted Prompt Ensembles for Large Language Models

1 code implementation • 12 Apr 2023 • Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy Ba

Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of language model reasoning performance with no additional training.

GSM8K · Language Modelling
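The abstract above mentions self-consistency; below is a minimal sketch of that sampling-and-voting idea (not the paper's boosted ensemble procedure), assuming a caller-supplied `generate_answer` function that makes one stochastic chain-of-thought LM call and returns only the final answer string.

```python
from collections import Counter
from typing import Callable, List

def self_consistency(generate_answer: Callable[[str], str],
                     question: str,
                     n_samples: int = 10) -> str:
    """Sample several chain-of-thought completions and majority-vote the answers.

    `generate_answer` is a hypothetical stand-in for one stochastic LM call
    (temperature > 0) that returns only the final answer string.
    """
    answers: List[str] = [generate_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```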

Large Language Models Are Human-Level Prompt Engineers

2 code implementations • 3 Nov 2022 • Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers.

Few-Shot Learning · In-Context Learning +3

MoCoDA: Model-based Counterfactual Data Augmentation

1 code implementation • 20 Oct 2022 • Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg

To this end, we show that (1) known local structure in the environment transitions is sufficient for an exponential reduction in the sample complexity of training a dynamics model, and (2) a locally factored dynamics model provably generalizes out-of-distribution to unseen states and actions.

Counterfactual Data Augmentation +2
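As a rough illustration of the locally factored dynamics idea in the abstract above (a sketch under assumed structure, not the paper's architecture), each next-state component below is predicted only from a declared subset of the current state-action vector; `models` is assumed to hold one pre-fit sklearn-style regressor per component.

```python
import numpy as np

# Hypothetical local structure: indices of the state-action vector that each
# next-state component is allowed to depend on.
PARENTS = {
    0: [0, 1],      # next s[0] depends on s[0] and s[1]
    1: [1, 2, 3],   # next s[1] depends on s[1], s[2] and the action entry at index 3
}

def locally_factored_step(models, state_action: np.ndarray) -> np.ndarray:
    """Predict each next-state component from its local parent set only.

    `models[i]` is assumed to be a regressor with a `predict` method fit on
    the parent slice for component i; restricting each component's inputs to
    its parents is what makes this sketch "locally factored".
    """
    return np.array([
        models[i].predict(state_action[PARENTS[i]].reshape(1, -1))[0]
        for i in sorted(PARENTS)
    ])
```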

Counterfactual Data Augmentation using Locally Factored Dynamics

1 code implementation • NeurIPS 2020 • Silviu Pitis, Elliot Creager, Animesh Garg

Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses.

Counterfactual Data Augmentation +5

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

2 code implementations • ICLR 2020 • Silviu Pitis, Harris Chan, Kiarash Jamali, Jimmy Ba

When defining distances, the triangle inequality has proven to be a useful constraint, both theoretically--to prove convergence and optimality guarantees--and empirically--as an inductive bias.

Inductive Bias · Metric Learning +3
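One generic way to respect the triangle inequality by construction, simpler than the norm-parameterized architectures studied in the paper, is to define the distance as a norm of embedding differences; the sketch below assumes an arbitrary embedding function `f`.

```python
import numpy as np

def embedding_distance(f, x: np.ndarray, y: np.ndarray) -> float:
    """d(x, y) = ||f(x) - f(y)||_2 obeys the triangle inequality for any
    embedding f, because the norm itself does:
        ||f(x) - f(z)|| <= ||f(x) - f(y)|| + ||f(y) - f(z)||.
    Note this construction is always symmetric, which is one of its limitations.
    """
    return float(np.linalg.norm(f(x) - f(y)))
```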

Objective Social Choice: Using Auxiliary Information to Improve Voting Outcomes

1 code implementation • 27 Jan 2020 • Silviu Pitis, Michael R. Zhang

Instead, we assume that votes are independent but not necessarily identically distributed and that our ensembling algorithm has access to certain auxiliary information related to the underlying model governing the noise in each vote.
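As one concrete way to use such auxiliary information (an illustrative rule, not necessarily the paper's), independent but non-identically distributed scalar votes with known noise variances can be pooled by inverse-variance weighting:

```python
import numpy as np

def inverse_variance_vote(votes: np.ndarray, noise_vars: np.ndarray) -> float:
    """Pool scalar votes using per-voter noise variances as auxiliary information.

    Weighting each vote by 1 / variance gives the minimum-variance unbiased
    combination of independent, non-identically distributed estimates of a
    common quantity; this is only an illustrative baseline rule.
    """
    weights = 1.0 / np.asarray(noise_vars, dtype=float)
    return float(np.sum(weights * np.asarray(votes, dtype=float)) / np.sum(weights))
```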

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

no code implementations • 9 Sep 2019 • Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps.

Q-Learning · Reinforcement Learning +1
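A minimal tabular sketch of the fixed-horizon idea described in the abstract above: each horizon's value estimate bootstraps from the horizon below it rather than from itself (variable names here are illustrative, not taken from the paper).

```python
import numpy as np

def fixed_horizon_td_update(V: np.ndarray, s: int, r: float, s_next: int,
                            alpha: float = 0.1, gamma: float = 1.0) -> None:
    """One tabular fixed-horizon TD update after observing (s, r, s_next).

    V has shape (H + 1, n_states); V[h, s] estimates the (discounted) sum of
    rewards over the next h steps from state s, with V[0, :] held at zero.
    Each horizon bootstraps only from the horizon below it, so no estimate
    ever bootstraps from itself.
    """
    H = V.shape[0] - 1
    for h in range(1, H + 1):
        target = r + gamma * V[h - 1, s_next]
        V[h, s] += alpha * (target - V[h, s])
```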

Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach

no code implementations • 8 Feb 2019 • Silviu Pitis

Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuing settings, with fixed discount factor $\gamma < 1$, or in episodic settings, with $\gamma = 1$.

Decision Making · Reinforcement Learning +1
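For reference, these are the two standard objectives the abstract contrasts (textbook definitions, not notation taken from the paper):

```latex
\text{continuing, } \gamma < 1:\quad
V^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_0 = s\right]
\qquad
\text{episodic, } \gamma = 1:\quad
V^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r_{t+1} \,\middle|\, s_0 = s\right]
```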

Source Traces for Temporal Difference Learning

no code implementations • 8 Feb 2019 • Silviu Pitis

This paper motivates and develops source traces for temporal difference (TD) learning in the tabular setting.
