Search Results for author: Siddharth Verma

Found 7 papers, 2 papers with code

Suppressing Pink Elephants with Direct Principle Feedback

no code implementations 12 Feb 2024 Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, Stella Biderman

Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model.

Language Modelling

OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

no code implementations 19 May 2023 Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab

Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations.

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

2 code implementations NAACL 2022 Siddharth Verma, Justin Fu, Mengjiao Yang, Sergey Levine

Conventionally, generation of natural language for dialogue agents may be viewed as a statistical learning problem: determine the patterns in human-provided data and generate appropriate responses with similar statistical properties.

Chatbot, Offline RL

Continual Learning of Control Primitives: Skill Discovery via Reset-Games

no code implementations NeurIPS 2020 Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine

First, in real-world settings, when an agent attempts a task and fails, the environment must somehow "reset" so that the agent can attempt the task again.

Continual Learning

Continual Learning of Control Primitives: Skill Discovery via Reset-Games

1 code implementation 10 Nov 2020 Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine

Reinforcement learning has the potential to automate the acquisition of behavior in complex settings, but in order for it to be successfully deployed, a number of practical challenges must be addressed.

Continual Learning

Fast Online "Next Best Offers" using Deep Learning

no code implementations 31 May 2019 Rekha Singhal, Gautam Shroff, Mukund Kumar, Sharod Roy, Sanket Kadarkar, Rupinder Virk, Siddharth Verma, Vartika Tiwari

In this paper, we present iPrescribe, a scalable low-latency architecture for recommending 'next-best-offers' in an online setting.

BIG-bench Machine Learning
