Search Results for author: Sushant Prakash

Found 14 papers, 4 papers with code

RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs

no code implementations • 6 Sep 2024 • Jiaxing Wu, Lin Ning, Luyang Liu, Harrison Lee, Neo Wu, Chao Wang, Sushant Prakash, Shawn O'Banion, Bradley Green, Jun Xie

Existing pretrained LLMs may generate summaries that are concise but lack the necessary context for downstream tasks, hindering their utility in personalization systems.
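The title points at the training signal: a user summary is rewarded by how well it supports a downstream prediction task. A minimal REINFORCE-style sketch of that idea, where the `policy` and `downstream` objects and their methods are hypothetical stand-ins:

```python
# Hypothetical sketch of RL from prediction feedback: reward a generated user
# summary by how well a frozen downstream model predicts held-out user
# activity from it. `policy` and `downstream` are illustrative stand-ins.
import torch

def rlpf_step(policy, downstream, user_history, target_activity, optimizer):
    # Sample a summary from the current policy, keeping its log-probability.
    summary_ids, log_prob = policy.sample_with_log_prob(user_history)
    # Reward: prediction accuracy of a frozen model that only sees the summary.
    with torch.no_grad():
        pred = downstream.predict(summary_ids)
        reward = (pred == target_activity).float().mean()
    # REINFORCE: increase the log-probability of high-reward summaries.
    loss = -(reward * log_prob)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.item()
```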

User-LLM: Efficient LLM Contextualization with User Embeddings

no code implementations • 21 Feb 2024 • Lin Ning, Luyang Liu, Jiaxing Wu, Neo Wu, Devora Berlowitz, Sushant Prakash, Bradley Green, Shawn O'Banion, Jun Xie

We integrate these user embeddings with LLMs through cross-attention, enabling LLMs to dynamically adapt their responses based on the context of a user's past actions and preferences.
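A minimal sketch of that integration in PyTorch, where LLM hidden states attend over precomputed user embeddings; the dimensions and module layout are assumptions, not the paper's exact architecture:

```python
# Illustrative cross-attention block: LLM hidden states (queries) attend over
# a sequence of user-history embeddings (keys/values). Dimensions and layout
# are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class UserCrossAttention(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden, user_emb):
        # hidden:   (batch, seq_len, d_model) LLM hidden states
        # user_emb: (batch, n_events, d_model) encoded user-history embeddings
        ctx, _ = self.attn(query=hidden, key=user_emb, value=user_emb)
        return self.norm(hidden + ctx)  # residual + norm, transformer-style

# Usage: fuse a 16-event user history into a 32-token hidden-state sequence.
fused = UserCrossAttention()(torch.randn(2, 32, 768), torch.randn(2, 16, 768))
```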

Self-Supervised Learning

Towards Accurate Differential Diagnosis with Large Language Models

no code implementations • 30 Nov 2023 • Daniel McDuff, Mike Schaekermann, Tao Tu, Anil Palepu, Amy Wang, Jake Garrison, Karan Singhal, Yash Sharma, Shekoofeh Azizi, Kavita Kulkarni, Le Hou, Yong Cheng, Yun Liu, S Sara Mahdavi, Sushant Prakash, Anupam Pathak, Christopher Semturs, Shwetak Patel, Dale R Webster, Ewa Dominowska, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Jake Sunshine, Alan Karthikesalingam, Vivek Natarajan

Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03).
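McNemar's test compares paired binary outcomes, here whether the same case was solved under each study arm. A sketch of such a comparison with statsmodels, on made-up counts rather than the study's data:

```python
# McNemar's test on paired binary outcomes, e.g. whether the correct diagnosis
# appeared in a case's top-10 DDx list under two conditions. The counts below
# are invented for illustration; they are not the study's data.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

#              cond. B correct | cond. B incorrect
table = np.array([[50, 30],    # cond. A correct
                  [10, 54]])   # cond. A incorrect
result = mcnemar(table, exact=False, correction=True)
print(result.statistic, result.pvalue)  # chi-square statistic and p-value
```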

Diagnostic

Universal Self-Consistency for Large Language Model Generation

no code implementations • 29 Nov 2023 • Xinyun Chen, Renat Aksitov, Uri Alon, Jie Ren, Kefan Xiao, Pengcheng Yin, Sushant Prakash, Charles Sutton, Xuezhi Wang, Denny Zhou

Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on various challenging tasks, by utilizing multiple reasoning paths sampled from large language models (LLMs).
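For tasks with a well-defined final answer, self-consistency reduces to a majority vote over answers extracted from the sampled paths; a minimal sketch, with `sample_cot` and `extract_answer` as hypothetical stand-ins for the LLM sampler and answer parser:

```python
# Plain self-consistency: sample several chain-of-thought completions and
# majority-vote their final answers. `sample_cot` and `extract_answer` are
# hypothetical stand-ins for an LLM sampler and an answer parser.
from collections import Counter

def self_consistency(prompt, sample_cot, extract_answer, n=8):
    answers = [extract_answer(sample_cot(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

The "universal" variant in the title presumably targets the free-form generations (e.g., code) where such exact-match voting breaks down.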

Code Generation • Language Modeling +5

RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

no code implementations • 1 Sep 2023 • Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash

Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences, but gathering high-quality preference labels is expensive.
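The labeling step that replaces the human annotator can be as simple as asking an off-the-shelf LLM to judge a response pair; a hypothetical sketch, where the `llm` callable and the prompt wording are assumptions:

```python
# Hypothetical AI-feedback labeling step: an off-the-shelf LLM judges which of
# two candidate responses is better, standing in for a human annotator. The
# `llm` callable and the prompt wording are illustrative assumptions.
def ai_preference_label(llm, context, response_a, response_b):
    prompt = (
        f"Context:\n{context}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Which response is better? Answer with exactly 'A' or 'B'."
    )
    verdict = llm(prompt).strip().upper()
    return 0 if verdict.startswith("A") else 1  # index of the preferred response
```

The resulting (context, chosen, rejected) triples can then train a reward model and drive standard RLHF-style policy optimization.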

Dialogue Generation • reinforcement-learning

Federated Reconstruction: Partially Local Federated Learning

3 code implementations • NeurIPS 2021 • Karan Singhal, Hakim Sidahmed, Zachary Garrett, Shanshan Wu, Keith Rush, Sushant Prakash

We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.
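In partially local federated learning, some parameters (e.g., per-user embeddings for collaborative filtering) never leave the device; under the reconstruction scheme the title names, a client re-derives them from scratch each round before computing its update to the shared parameters. A hedged sketch of one client round, with all helper functions as illustrative assumptions:

```python
# Sketch of one client round in partially local federated learning: local
# parameters are reconstructed on-device from a "support" split and never
# uploaded; only the update to the global parameters is sent back.
# split, init_local_params, sgd_step, grad_local, and grad_global are all
# illustrative assumptions, not the paper's API.
import copy

def client_round(global_params, client_data, recon_steps=5, train_steps=5, lr=0.1):
    support, query = split(client_data)
    local_params = init_local_params()  # fresh local parameters each round

    # 1) Reconstruct local parameters on the support split, global ones frozen.
    for _ in range(recon_steps):
        local_params = sgd_step(
            local_params, grad_local(global_params, local_params, support), lr)

    # 2) Update global parameters on the query split, local ones frozen.
    updated = copy.deepcopy(global_params)
    for _ in range(train_steps):
        updated = sgd_step(updated, grad_global(updated, local_params, query), lr)

    return updated  # only the global update leaves the device
```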

Collaborative Filtering • Federated Learning +1

Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions

no code implementations • 2 Mar 2020 • Yitong Li, Dianqi Li, Sushant Prakash, Peng Wang

To improve the interpretability in the dual encoder models, we design a novel regularization loss to minimize the mutual information between unimportant words and desired labels, in addition to the original attention method, so that important words are emphasized while unimportant words are de-emphasized.
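Estimating mutual information directly is awkward; one simple proxy (a stand-in, not necessarily the paper's estimator) is to penalize the attention mass placed on words flagged as unimportant:

```python
# Illustrative proxy for the regularizer described above: penalize attention
# mass on words flagged as unimportant, so the encoder learns to rely on the
# important ones. A simple stand-in, not the paper's mutual-information loss.
import torch

def unimportant_attention_penalty(attn_weights, unimportant_mask):
    # attn_weights:     (batch, seq_len) attention over input words, rows sum to 1
    # unimportant_mask: (batch, seq_len) 1.0 where a word is deemed unimportant
    return (attn_weights * unimportant_mask).sum(dim=-1).mean()

# total = matching_loss + reg_weight * unimportant_attention_penalty(w, mask)
```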

Word Embeddings
