Search Results for author: Rohan Subramani

Found 2 papers, 1 paper with code

Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

1 code implementation • 13 Nov 2023 • Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang

As AI systems become more intelligent and their behavior becomes more challenging to assess, they may learn to game the flaws of human feedback instead of genuinely striving to follow instructions; however, this risk can be mitigated by controlling how LLMs generalize human feedback to situations where it is unreliable.

Instruction Following

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

no code implementations • 18 Oct 2023 • Rohan Subramani, Marcus Williams, Max Heitmann, Halfdan Holm, Charlie Griffin, Joar Skalse

It is well known that certain tasks cannot be expressed by means of an objective in the Markov rewards formalism, motivating the study of alternative objective-specification formalisms in RL such as Linear Temporal Logic and Multi-Objective Reinforcement Learning.

Multi-Objective Reinforcement Learning • reinforcement-learning
