Search Results for author: Hassan Mansoor

Found 6 papers, 2 papers with code

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

no code implementations • 19 Mar 2024 • Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma

We propose a technique to transfer capabilities from LLMs to VLMs.

Ranked #1 on Chart Question Answering on ChartQA (using extra training data)

Chart Question Answering Optical Character Recognition (OCR)

Paper
Add Code

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

no code implementations • 15 Mar 2024 • Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Simral Chaudhary, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon

We investigate the setup of "Parameter Efficient Reinforcement Learning" (PERL), in which we perform reward model training and reinforcement learning using LoRA.

reinforcement-learning

Paper
Add Code

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

1 code implementation • 7 Feb 2024 • Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, Abhanshu Sharma

At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements.

Ranked #3 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)

Chart Question Answering Language Modelling +1

Paper
Code

LLMs cannot find reasoning errors, but can correct them!

1 code implementation • 14 Nov 2023 • Gladys Tyen, Hassan Mansoor, Victor Cărbune, Peter Chen, Tony Mak

While self-correction has shown promise in improving LLM outputs in terms of style and quality (e. g. Chen et al., 2023; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performances overall (Huang et al., 2023).

Paper
Code

The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization

no code implementations • 2 Nov 2023 • Sian Gooding, Hassan Mansoor

As a result, the task of text summarization has been identified as a good candidate for this process.

Text Generation Text Summarization

Paper
Add Code

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

no code implementations • 1 Sep 2023 • Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash

Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences.

Dialogue Generation reinforcement-learning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.