no code implementations • 12 Feb 2024 • Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, Stella Biderman
Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model.
no code implementations • 19 May 2023 • Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab
Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanations, and OPT-RE, finetuned with explanations.
no code implementations • 10 Dec 2022 • Siddharth Verma, Yuchen Lu, Rui Hou, Hanchao Yu, Nicolas Ballas, Madian Khabsa, Amjad Almahairi
Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretraining.
2 code implementations • NAACL 2022 • Siddharth Verma, Justin Fu, Mengjiao Yang, Sergey Levine
Conventionally, generation of natural language for dialogue agents may be viewed as a statistical learning problem: determine the patterns in human-provided data and generate appropriate responses with similar statistical properties.
no code implementations • NeurIPS 2020 • Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine
First, in real-world settings, when an agent attempts a task and fails, the environment must somehow "reset" so that the agent can attempt the task again.
1 code implementation • 10 Nov 2020 • Kelvin Xu, Siddharth Verma, Chelsea Finn, Sergey Levine
Reinforcement learning has the potential to automate the acquisition of behavior in complex settings, but in order for it to be successfully deployed, a number of practical challenges must be addressed.
no code implementations • 31 May 2019 • Rekha Singhal, Gautam Shroff, Mukund Kumar, Sharod Roy, Sanket Kadarkar, Rupinder Virk, Siddharth Verma, Vartika Tiwari
In this paper, we present iPrescribe, a scalable low-latency architecture for recommending 'next-best-offers' in an online setting.