16 Oct 2023 • Charlie George, Andreas Stuhlmüller
Hallucination plagues even frontier LLMs, but how bad is it really for summarizing academic papers?
4 Jan 2023 • Justin Reppert, Ben Rachbach, Charlie George, Luke Stebbing, Jungwon Byun, Maggie Appleton, Andreas Stuhlmüller
We apply iterated decomposition to three real-world tasks and improve the accuracy of LM programs over less compositional baselines: describing the placebo used in a randomized controlled trial (25% to 65%), evaluating participant adherence to a medical intervention (53% to 70%), and answering NLP questions on the Qasper dataset (38% to 69%).