2 code implementations • 20 Nov 2023 • Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen
We test 16 state-of-the-art model configurations (including GPT-4-Turbo, Llama2, and Claude2, with vector stores and long-context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400).
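The "vector store" configurations above retrieve document chunks relevant to each question before prompting the model. A minimal sketch of that retrieval step, using a toy bag-of-words similarity in place of a real dense encoder (the `embed`, `cosine`, and `retrieve` names and the sample chunks are illustrative, not the paper's actual pipeline):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real vector store would use a dense text encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Revenue grew 12% year over year in fiscal 2022.",
    "The board approved a quarterly dividend of $0.25 per share.",
    "Operating expenses were flat compared to the prior year.",
]
context = retrieve("How much did revenue grow?", chunks, k=1)
prompt = "Answer using only this context:\n" + "\n".join(context) + \
         "\nQ: How much did revenue grow?"
```

The long-context configurations skip retrieval and place the full document in the prompt instead, trading retrieval error for context-window cost.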
Question Answering +2
no code implementations • 19 Jul 2022 • Pranab Islam, Shaan Khosla, Arthur Lok, Mudit Saxena
We explore an array of model bagging configurations for natural language understanding tasks with final ensemble sizes ranging from 300M parameters to 1.5B parameters and determine that our ensembling methods are at best roughly equivalent to single LM baselines.
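Model bagging trains ensemble members on bootstrap resamples of the data and averages their predictions. A minimal sketch of that procedure, with a hypothetical keyword classifier standing in for a fine-tuned language model (all function names and the toy data are illustrative, not the paper's setup):

```python
import random

def bootstrap_sample(data, rng):
    # Sample with replacement, same size as the original set.
    return [rng.choice(data) for _ in data]

def train_keyword_classifier(examples):
    """Hypothetical stand-in for fine-tuning one LM: memorize words seen in positives."""
    pos = {w for text, label in examples if label == 1 for w in text.split()}
    return lambda text: 1.0 if any(w in pos for w in text.split()) else 0.0

def bagged_predict(models, text):
    # Average member scores, then threshold: the core of model bagging.
    score = sum(m(text) for m in models) / len(models)
    return int(score >= 0.5)

rng = random.Random(0)
data = [("great movie", 1), ("terrible film", 0),
        ("loved it", 1), ("awful plot", 0)]
models = [train_keyword_classifier(bootstrap_sample(data, rng))
          for _ in range(5)]
```

In the LM setting each member is a separately fine-tuned model, so ensemble parameter count grows linearly with member count, which is why the comparison against a single equally sized baseline matters.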